[RFC,00/97] Basic GuC submission support in the i915

Message ID 20210506191451.77768-1-matthew.brost@intel.com (mailing list archive)

Message

Matthew Brost May 6, 2021, 7:13 p.m. UTC
Basic GuC submission support. This is the first bullet point in the
upstreaming plan covered in the following RFC [1].

At a very high level the GuC is a piece of firmware which sits between
the i915 and the GPU. It offloads some of the scheduling of contexts
from the i915 and programs the GPU to submit contexts. The i915
communicates with the GuC and the GuC communicates with the GPU.

GuC submission will be disabled by default on all current upstream
platforms behind a module parameter - enable_guc. A value of 3 will
enable submission and HuC loading via the GuC. GuC submission should
work on all gen11+ platforms assuming the GuC firmware is present.
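
For reference, enable_guc is a bitmask: setting i915.enable_guc=3 on the
kernel command line (or "options i915 enable_guc=3" in modprobe.d) selects
both GuC submission and HuC loading via the GuC. Below is a minimal sketch
of how such a bitmask is interpreted; the macro and helper names are
illustrative, not the exact i915 symbols.

/*
 * Illustrative only -- the real helpers live in the i915 uc code and may
 * use different names. Assumed layout: bit 0 = GuC submission, bit 1 =
 * HuC loading via the GuC, so enable_guc=3 enables both.
 */
#define GUC_SUBMISSION_BIT	(1 << 0)
#define HUC_LOAD_VIA_GUC_BIT	(1 << 1)

static inline bool wants_guc_submission(int enable_guc)
{
	return enable_guc & GUC_SUBMISSION_BIT;
}

static inline bool wants_huc_load_via_guc(int enable_guc)
{
	return enable_guc & HUC_LOAD_VIA_GUC_BIT;
}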

This is a huge series and it is completely unrealistic to merge all of
these patches at once. Fortunately I believe we can break down the
series into different merges:

1. Merge Chris Wilson's patches. These have already been reviewed
upstream and I fully agree with these patches as a precursor to GuC
submission.

2. Update to GuC 60.1.2. These are largely Michal's patches.

3. Turn on GuC/HuC auto mode by default.

4. Additional patches needed to support GuC submission. This is any
patch not covered by 1-3 in the first 34 patches. e.g. 'Engine relative
MMIO'

5. GuC submission support. Patches 35+. These don't all have to merge at
once, though, as we don't actually allow GuC submission until the last
patch of the series.

[1] https://patchwork.freedesktop.org/patch/432206/?series=89840&rev=1

Signed-off-by: Matthew Brost <matthew.brost@intel.com>

Chris Wilson (3):
  drm/i915/gt: Move engine setup out of set_default_submission
  drm/i915/gt: Move submission_method into intel_gt
  drm/i915/gt: Move CS interrupt handler to the backend

Daniele Ceraolo Spurio (6):
  drm/i915/guc: skip disabling CTBs before sanitizing the GuC
  drm/i915/guc: use probe_error log for CT enablement failure
  drm/i915/guc: enable only the user interrupt when using GuC submission
  drm/i915/uc: turn on GuC/HuC auto mode by default
  drm/i915/guc: Use guc_class instead of engine_class in fw interface
  drm/i915/guc: Unblock GuC submission on Gen11+

John Harrison (13):
  drm/i915/guc: Support per context scheduling policies
  drm/i915/guc: Update firmware to v60.1.2
  drm/i915: Engine relative MMIO
  drm/i915/guc: Module load failure test for CT buffer creation
  drm/i915: Track 'serial' counts for virtual engines
  drm/i915/guc: Provide mmio list to be saved/restored on engine reset
  drm/i915/guc: Don't complain about reset races
  drm/i915/guc: Enable GuC engine reset
  drm/i915/guc: Fix for error capture after full GPU reset with GuC
  drm/i915/guc: Hook GuC scheduling policies up
  drm/i915/guc: Connect reset modparam updates to GuC policy flags
  drm/i915/guc: Include scheduling policies in the debugfs state dump
  drm/i915/guc: Add golden context to GuC ADS

Matthew Brost (53):
  drm/i915: Introduce i915_sched_engine object
  drm/i915/guc: Improve error message for unsolicited CT response
  drm/i915/guc: Add non blocking CTB send function
  drm/i915/guc: Add stall timer to non blocking CTB send function
  drm/i915/guc: Optimize CTB writes and reads
  drm/i915/guc: Increase size of CTB buffers
  drm/i915/guc: Add new GuC interface defines and structures
  drm/i915/guc: Remove GuC stage descriptor, add lrc descriptor
  drm/i915/guc: Add lrc descriptor context lookup array
  drm/i915/guc: Implement GuC submission tasklet
  drm/i915/guc: Add bypass tasklet submission path to GuC
  drm/i915/guc: Implement GuC context operations for new inteface
  drm/i915/guc: Insert fence on context when deregistering
  drm/i915/guc: Defer context unpin until scheduling is disabled
  drm/i915/guc: Disable engine barriers with GuC during unpin
  drm/i915/guc: Extend deregistration fence to schedule disable
  drm/i915: Disable preempt busywait when using GuC scheduling
  drm/i915/guc: Ensure request ordering via completion fences
  drm/i915/guc: Disable semaphores when using GuC scheduling
  drm/i915/guc: Ensure G2H response has space in buffer
  drm/i915/guc: Update intel_gt_wait_for_idle to work with GuC
  drm/i915/guc: Update GuC debugfs to support new GuC
  drm/i915/guc: Add several request trace points
  drm/i915: Add intel_context tracing
  drm/i915/guc: GuC virtual engines
  drm/i915: Hold reference to intel_context over life of i915_request
  drm/i915/guc: Disable bonding extension with GuC submission
  drm/i915/guc: Direct all breadcrumbs for a class to single breadcrumbs
  drm/i915/guc: Reset implementation for new GuC interface
  drm/i915: Reset GPU immediately if submission is disabled
  drm/i915/guc: Add disable interrupts to guc sanitize
  drm/i915/guc: Suspend/resume implementation for new interface
  drm/i915/guc: Handle context reset notification
  drm/i915/guc: Handle engine reset failure notification
  drm/i915/guc: Enable the timer expired interrupt for GuC
  drm/i915/guc: Capture error state on context reset
  drm/i915/guc: Don't call ring_is_idle in GuC submission
  drm/i915/guc: Implement banned contexts for GuC submission
  drm/i915/guc: Allow flexible number of context ids
  drm/i915/guc: Connect the number of guc_ids to debugfs
  drm/i915/guc: Don't return -EAGAIN to user when guc_ids exhausted
  drm/i915/guc: Don't allow requests not ready to consume all guc_ids
  drm/i915/guc: Introduce guc_submit_engine object
  drm/i915/guc: Implement GuC priority management
  drm/i915/guc: Support request cancellation
  drm/i915/guc: Check return of __xa_store when registering a context
  drm/i915/guc: Non-static lrc descriptor registration buffer
  drm/i915/guc: Take GT PM ref when deregistering context
  drm/i915: Add GT PM delayed worker
  drm/i915/guc: Take engine PM when a context is pinned with GuC
    submission
  drm/i915/guc: Don't call switch_to_kernel_context with GuC submission
  drm/i915/guc: Selftest for GuC flow control
  drm/i915/guc: Update GuC documentation

Michal Wajdeczko (21):
  drm/i915/guc: Keep strict GuC ABI definitions
  drm/i915/guc: Stop using fence/status from CTB descriptor
  drm/i915: Promote ptrdiff() to i915_utils.h
  drm/i915/guc: Only rely on own CTB size
  drm/i915/guc: Don't repeat CTB layout calculations
  drm/i915/guc: Replace CTB array with explicit members
  drm/i915/guc: Update sizes of CTB buffers
  drm/i915/guc: Relax CTB response timeout
  drm/i915/guc: Start protecting access to CTB descriptors
  drm/i915/guc: Stop using mutex while sending CTB messages
  drm/i915/guc: Don't receive all G2H messages in irq handler
  drm/i915/guc: Always copy CT message to new allocation
  drm/i915/guc: Introduce unified HXG messages
  drm/i915/guc: Update MMIO based communication
  drm/i915/guc: Update CTB response status
  drm/i915/guc: Add flag for mark broken CTB
  drm/i915/guc: New definition of the CTB descriptor
  drm/i915/guc: New definition of the CTB registration action
  drm/i915/guc: New CTB based communication
  drm/i915/guc: Kill guc_clients.ct_pool
  drm/i915/guc: Early initialization of GuC send registers

Rodrigo Vivi (1):
  drm/i915/guc: Remove sample_forcewake h2g action

 drivers/gpu/drm/i915/Makefile                 |    2 +
 drivers/gpu/drm/i915/gem/i915_gem_context.c   |   39 +-
 drivers/gpu/drm/i915/gem/i915_gem_context.h   |    1 +
 drivers/gpu/drm/i915/gem/i915_gem_mman.c      |    3 +-
 drivers/gpu/drm/i915/gem/i915_gem_wait.c      |    4 +-
 drivers/gpu/drm/i915/gt/gen8_engine_cs.c      |    6 +-
 drivers/gpu/drm/i915/gt/intel_breadcrumbs.c   |   44 +-
 drivers/gpu/drm/i915/gt/intel_breadcrumbs.h   |   14 +-
 .../gpu/drm/i915/gt/intel_breadcrumbs_types.h |    7 +
 drivers/gpu/drm/i915/gt/intel_context.c       |   50 +-
 drivers/gpu/drm/i915/gt/intel_context.h       |   45 +-
 drivers/gpu/drm/i915/gt/intel_context_types.h |   76 +-
 drivers/gpu/drm/i915/gt/intel_engine.h        |   96 +-
 drivers/gpu/drm/i915/gt/intel_engine_cs.c     |  320 +-
 .../gpu/drm/i915/gt/intel_engine_heartbeat.c  |   75 +-
 .../gpu/drm/i915/gt/intel_engine_heartbeat.h  |    4 +
 drivers/gpu/drm/i915/gt/intel_engine_pm.c     |   14 +-
 drivers/gpu/drm/i915/gt/intel_engine_pm.h     |    5 +
 drivers/gpu/drm/i915/gt/intel_engine_types.h  |   71 +-
 drivers/gpu/drm/i915/gt/intel_engine_user.c   |    6 +-
 .../drm/i915/gt/intel_execlists_submission.c  |  693 +--
 .../drm/i915/gt/intel_execlists_submission.h  |   14 -
 drivers/gpu/drm/i915/gt/intel_gpu_commands.h  |    5 +
 drivers/gpu/drm/i915/gt/intel_gt.c            |   23 +
 drivers/gpu/drm/i915/gt/intel_gt.h            |    2 +
 drivers/gpu/drm/i915/gt/intel_gt_irq.c        |  100 +-
 drivers/gpu/drm/i915/gt/intel_gt_irq.h        |   23 +
 drivers/gpu/drm/i915/gt/intel_gt_pm.c         |   14 +-
 drivers/gpu/drm/i915/gt/intel_gt_pm.h         |   13 +
 .../drm/i915/gt/intel_gt_pm_delayed_work.c    |   35 +
 .../drm/i915/gt/intel_gt_pm_delayed_work.h    |   24 +
 drivers/gpu/drm/i915/gt/intel_gt_requests.c   |   23 +-
 drivers/gpu/drm/i915/gt/intel_gt_requests.h   |    7 +-
 drivers/gpu/drm/i915/gt/intel_gt_types.h      |   10 +
 drivers/gpu/drm/i915/gt/intel_lrc_reg.h       |    1 -
 drivers/gpu/drm/i915/gt/intel_reset.c         |   58 +-
 .../gpu/drm/i915/gt/intel_ring_submission.c   |   73 +-
 drivers/gpu/drm/i915/gt/intel_rps.c           |    6 +-
 drivers/gpu/drm/i915/gt/intel_workarounds.c   |   46 +-
 .../gpu/drm/i915/gt/intel_workarounds_types.h |    1 +
 drivers/gpu/drm/i915/gt/mock_engine.c         |   58 +-
 drivers/gpu/drm/i915/gt/selftest_context.c    |   10 +
 drivers/gpu/drm/i915/gt/selftest_execlists.c  |   58 +-
 drivers/gpu/drm/i915/gt/selftest_hangcheck.c  |    6 +-
 drivers/gpu/drm/i915/gt/selftest_lrc.c        |    6 +-
 drivers/gpu/drm/i915/gt/selftest_reset.c      |    2 +-
 .../drm/i915/gt/selftest_ring_submission.c    |    2 +-
 .../gpu/drm/i915/gt/uc/abi/guc_actions_abi.h  |  177 +
 .../gt/uc/abi/guc_communication_ctb_abi.h     |  192 +
 .../gt/uc/abi/guc_communication_mmio_abi.h    |   35 +
 .../gpu/drm/i915/gt/uc/abi/guc_errors_abi.h   |   13 +
 .../gpu/drm/i915/gt/uc/abi/guc_messages_abi.h |  247 +
 drivers/gpu/drm/i915/gt/uc/intel_guc.c        |  194 +-
 drivers/gpu/drm/i915/gt/uc/intel_guc.h        |  131 +-
 drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c    |  484 +-
 drivers/gpu/drm/i915/gt/uc/intel_guc_ads.h    |    3 +
 drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c     | 1088 +++--
 drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h     |   49 +-
 .../gpu/drm/i915/gt/uc/intel_guc_debugfs.c    |   56 +-
 drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h   |  377 +-
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 4037 +++++++++++++++--
 .../gpu/drm/i915/gt/uc/intel_guc_submission.h |   20 +-
 .../i915/gt/uc/intel_guc_submission_types.h   |   55 +
 drivers/gpu/drm/i915/gt/uc/intel_uc.c         |  116 +-
 drivers/gpu/drm/i915/gt/uc/intel_uc.h         |   11 +
 drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c      |   25 +-
 .../i915/gt/uc/selftest_guc_flow_control.c    |  589 +++
 drivers/gpu/drm/i915/i915_active.c            |    3 +
 drivers/gpu/drm/i915/i915_debugfs.c           |    8 +-
 drivers/gpu/drm/i915/i915_debugfs_params.c    |   31 +
 drivers/gpu/drm/i915/i915_drv.h               |    2 +-
 drivers/gpu/drm/i915/i915_gem_evict.c         |    1 +
 drivers/gpu/drm/i915/i915_gpu_error.c         |   28 +-
 drivers/gpu/drm/i915/i915_irq.c               |   10 +-
 drivers/gpu/drm/i915/i915_params.h            |    2 +-
 drivers/gpu/drm/i915/i915_perf.c              |   16 +-
 drivers/gpu/drm/i915/i915_reg.h               |    2 +
 drivers/gpu/drm/i915/i915_request.c           |  218 +-
 drivers/gpu/drm/i915/i915_request.h           |   37 +-
 drivers/gpu/drm/i915/i915_scheduler.c         |  188 +-
 drivers/gpu/drm/i915/i915_scheduler.h         |   74 +-
 drivers/gpu/drm/i915/i915_scheduler_types.h   |   74 +
 drivers/gpu/drm/i915/i915_trace.h             |  219 +-
 drivers/gpu/drm/i915/i915_utils.h             |    5 +
 drivers/gpu/drm/i915/i915_vma.h               |    5 -
 drivers/gpu/drm/i915/intel_wakeref.c          |    5 +
 drivers/gpu/drm/i915/intel_wakeref.h          |    1 +
 .../drm/i915/selftests/i915_live_selftests.h  |    1 +
 .../gpu/drm/i915/selftests/igt_live_test.c    |    2 +-
 .../i915/selftests/intel_scheduler_helpers.c  |  101 +
 .../i915/selftests/intel_scheduler_helpers.h  |   37 +
 .../gpu/drm/i915/selftests/mock_gem_device.c  |    3 +-
 include/uapi/drm/i915_drm.h                   |    9 +
 93 files changed, 8954 insertions(+), 2222 deletions(-)
 create mode 100644 drivers/gpu/drm/i915/gt/intel_gt_pm_delayed_work.c
 create mode 100644 drivers/gpu/drm/i915/gt/intel_gt_pm_delayed_work.h
 create mode 100644 drivers/gpu/drm/i915/gt/uc/abi/guc_actions_abi.h
 create mode 100644 drivers/gpu/drm/i915/gt/uc/abi/guc_communication_ctb_abi.h
 create mode 100644 drivers/gpu/drm/i915/gt/uc/abi/guc_communication_mmio_abi.h
 create mode 100644 drivers/gpu/drm/i915/gt/uc/abi/guc_errors_abi.h
 create mode 100644 drivers/gpu/drm/i915/gt/uc/abi/guc_messages_abi.h
 create mode 100644 drivers/gpu/drm/i915/gt/uc/intel_guc_submission_types.h
 create mode 100644 drivers/gpu/drm/i915/gt/uc/selftest_guc_flow_control.c
 create mode 100644 drivers/gpu/drm/i915/selftests/intel_scheduler_helpers.c
 create mode 100644 drivers/gpu/drm/i915/selftests/intel_scheduler_helpers.h

Comments

Martin Peres May 9, 2021, 5:12 p.m. UTC | #1
Hi,

On 06/05/2021 22:13, Matthew Brost wrote:
> Basic GuC submission support. This is the first bullet point in the
> upstreaming plan covered in the following RFC [1].
> 
> At a very high level the GuC is a piece of firmware which sits between
> the i915 and the GPU. It offloads some of the scheduling of contexts
> from the i915 and programs the GPU to submit contexts. The i915
> communicates with the GuC and the GuC communicates with the GPU.

May I ask what GuC command submission will do that execlist won't/can't 
do? And what would be the impact on users? Even forgetting the troubled 
history of the GuC (instability, performance regressions, poor level of user 
support, 6+ years of trying to upstream it...), adding this much code 
and doubling the amount of validation needed should come with a 
rationale making it feel worth it... and I am not seeing one here. Would you 
mind providing the rationale behind this work?

> 
> GuC submission will be disabled by default on all current upstream
> platforms behind a module parameter - enable_guc. A value of 3 will
> enable submission and HuC loading via the GuC. GuC submission should
> work on all gen11+ platforms assuming the GuC firmware is present.

What is the plan here when it comes to keeping support for execlist? I 
am afraid that landing GuC support in Linux is the first step towards 
killing the execlist, which would force users to use proprietary 
firmwares that even most Intel engineers have little influence over. 
Indeed, if "drm/i915/guc: Disable semaphores when using GuC scheduling" 
which states "Disable semaphores when using GuC scheduling as semaphores 
are broken in the current GuC firmware." is anything to go by, it means 
that even Intel developers seem to prefer working around the GuC 
firmware, rather than fixing it.

In the same vein, I have another concern related to the impact of GuC on 
Linux's stable releases. Let's say that in 3 years, a new application 
triggers a bug in command submission inside the firmware. Given that the 
Linux community cannot patch the GuC firmware, how likely is it that 
Intel would release a new GuC version? That would not be necessarily 
such a big problem if newer versions of the GuC could easily be 
backported to this potentially-decade-old Linux version, but given that 
the GuC seems to have ABI-breaking changes on a monthly cadence (we are 
at major version 60 *already*? :o), I would say that it is 
highly-unlikely that it would not require potentially-extensive changes 
to i915 to make it work, making the fix almost impossible to land in the 
stable tree... Do you have a plan to mitigate this problem?

Patches like "drm/i915/guc: Disable bonding extension with GuC 
submission" also make me twitch, as this means the two command 
submission paths will not be functionally equivalent, and enabling GuC 
could thus introduce a user-visible regression (one app used to work, 
then stopped working). Could you add in the commit's message a proof 
that this would not end up being a user regression (in which case, why 
have this codepath to begin with?).

Finally, could you explain why IGT tests need to be modified to work with the 
GuC [1], and how much of the code in this series is covered by 
existing/upcoming tests? I would expect a very solid set of tests to 
minimize the maintenance burden, and enable users to reproduce potential 
issues found in this new codepath (too many users run with enable_guc=3, 
as can be seen on Google [2]).

Looking forward to reading up about your plan, and the commitments Intel 
would put in place to make this feature something users should be 
looking forward to rather than fear.

Thanks,
Martin

[2] https://www.google.com/search?q=enable_guc%3D3

> [snip: merge plan, patch list and diffstat quoted verbatim from the cover letter]
Jason Ekstrand May 9, 2021, 11:11 p.m. UTC | #2
On May 9, 2021 12:12:36 Martin Peres <martin.peres@free.fr> wrote:

> Hi,
>
> On 06/05/2021 22:13, Matthew Brost wrote:
>> Basic GuC submission support. This is the first bullet point in the
>> upstreaming plan covered in the following RFC [1].
>>
>> At a very high level the GuC is a piece of firmware which sits between
>> the i915 and the GPU. It offloads some of the scheduling of contexts
>> from the i915 and programs the GPU to submit contexts. The i915
>> communicates with the GuC and the GuC communicates with the GPU.
>
> May I ask what will GuC command submission do that execlist won't/can't
> do? And what would be the impact on users? Even forgetting the troubled
> history of GuC (instability, performance regression, poor level of user
> support, 6+ years of trying to upstream it...), adding this much code
> and doubling the amount of validation needed should come with a
> rationale making it feel worth it... and I am not seeing here. Would you
> mind providing the rationale behind this work?
>
>>
>> GuC submission will be disabled by default on all current upstream
>> platforms behind a module parameter - enable_guc. A value of 3 will
>> enable submission and HuC loading via the GuC. GuC submission should
>> work on all gen11+ platforms assuming the GuC firmware is present.
>
> What is the plan here when it comes to keeping support for execlist? I
> am afraid that landing GuC support in Linux is the first step towards
> killing the execlist, which would force users to use proprietary
> firmwares that even most Intel engineers have little influence over.
> Indeed, if "drm/i915/guc: Disable semaphores when using GuC scheduling"
> which states "Disable semaphores when using GuC scheduling as semaphores
> are broken in the current GuC firmware." is anything to go by, it means
> that even Intel developers seem to prefer working around the GuC
> firmware, rather than fixing it.

Yes, landing GuC support may be the first step in removing execlist 
support. The inevitable reality is that GPU scheduling is coming and likely 
to be the only path in the not-too-distant future. (See also the ongoing 
thread with AMD about fences.) I'm not going to pass judgement on whether 
or not this is a good thing. I'm just reading the winds and, in my view, 
this is where things are headed for good or ill.

In answer to the question above, the answer to "what do we gain from GuC?" 
may soon be, "you get to use your GPU."  We're not there yet and, again, 
I'm not necessarily advocating for it, but that is likely where things are 
headed.

A firmware-based submission model isn't a bad design IMO and, aside from 
the firmware freedom issues, I think there are actual advantages to the 
model. Immediately, it'll unlock a few features like parallel submission 
(more on that in a bit) and long-running compute because they're 
implemented in GuC and the work to implement them properly in the execlist 
scheduler is highly non-trivial. Longer term, it may (no guarantees) unlock 
some performance by getting the kernel out of the way.


> In the same vein, I have another concern related to the impact of GuC on
> Linux's stable releases. Let's say that in 3 years, a new application
> triggers a bug in command submission inside the firmware. Given that the
> Linux community cannot patch the GuC firmware, how likely is it that
> Intel would release a new GuC version? That would not be necessarily
> such a big problem if newer versions of the GuC could easily be
> backported to this potentially-decade-old Linux version, but given that
> the GuC seems to have ABI-breaking changes on a monthly cadence (we are
> at major version 60 *already*? :o), I would say that it is
> highly-unlikely that it would not require potentially-extensive changes
> to i915 to make it work, making the fix almost impossible to land in the
> stable tree... Do you have a plan to mitigate this problem?
>
> Patches like "drm/i915/guc: Disable bonding extension with GuC
> submission" also make me twitch, as this means the two command
> submission paths will not be functionally equivalent, and enabling GuC
> could thus introduce a user-visible regression (one app used to work,
> then stopped working). Could you add in the commit's message a proof
> that this would not end up being a user regression (in which case, why
> have this codepath to begin with?).

I'd like to address this one specifically as it's become something of a 
speciality of mine the past few weeks. The current bonded submission model 
is bad. It provides a plethora of ways for a client to back itself into a 
corner and doesn't actually provide the guarantees the media driver needs 
for its real-time high-resolution decode. It's bad enough we're seriously 
considering ripping it out, backwards compatibility or not. The good news 
is that very little that your average desktop user does depends on it: 
basically just real-time >4K video decode.

The new parallel submit API is much better and should be the path forward. 
(We should have landed parallel submit the first time around.) It isn't 
full of corners and does let us provide actual parallel execution 
guarantees. It also gives the scheduler the information it needs to 
reliably provide those guarantees.

If we need to support the parallel submit API with the execlist back-end, 
that's totally possible. The choice to only implement the parallel submit 
API with GuC is a pragmatic one. We're trying to get upstream back on its 
feet and get all the various up-and-coming bits of hardware enabled. 
Enabling the new API in the execlist back-end makes that pipeline longer.


> Finally, could you explain why IGT tests need to be modified to work the
> GuC [1], and how much of the code in this series is covered by
> existing/upcoming tests? I would expect a very solid set of tests to
> minimize the maintenance burden, and enable users to reproduce potential
> issues found in this new codepath (too many users run with enable_guc=3,
> as can be seen on Google[2]).

The IGT changes, as I understand them, are entirely around switching to the 
new parallel submit API. There shouldn't be a major effect on most users.

--Jason
Martin Peres May 10, 2021, 1:55 p.m. UTC | #3
On 10/05/2021 02:11, Jason Ekstrand wrote:
> On May 9, 2021 12:12:36 Martin Peres <martin.peres@free.fr> wrote:
> 
>> Hi,
>>
>> On 06/05/2021 22:13, Matthew Brost wrote:
>>> Basic GuC submission support. This is the first bullet point in the
>>> upstreaming plan covered in the following RFC [1].
>>>
>>> At a very high level the GuC is a piece of firmware which sits between
>>> the i915 and the GPU. It offloads some of the scheduling of contexts
>>> from the i915 and programs the GPU to submit contexts. The i915
>>> communicates with the GuC and the GuC communicates with the GPU.
>>
>> May I ask what will GuC command submission do that execlist won't/can't
>> do? And what would be the impact on users? Even forgetting the troubled
>> history of GuC (instability, performance regression, poor level of user
>> support, 6+ years of trying to upstream it...), adding this much code
>> and doubling the amount of validation needed should come with a
>> rationale making it feel worth it... and I am not seeing here. Would you
>> mind providing the rationale behind this work?
>>
>>>
>>> GuC submission will be disabled by default on all current upstream
>>> platforms behind a module parameter - enable_guc. A value of 3 will
>>> enable submission and HuC loading via the GuC. GuC submission should
>>> work on all gen11+ platforms assuming the GuC firmware is present.
>>
>> What is the plan here when it comes to keeping support for execlist? I
>> am afraid that landing GuC support in Linux is the first step towards
>> killing the execlist, which would force users to use proprietary
>> firmwares that even most Intel engineers have little influence over.
>> Indeed, if "drm/i915/guc: Disable semaphores when using GuC scheduling"
>> which states "Disable semaphores when using GuC scheduling as semaphores
>> are broken in the current GuC firmware." is anything to go by, it means
>> that even Intel developers seem to prefer working around the GuC
>> firmware, rather than fixing it.
> 
> Yes, landing GuC support may be the first step in removing execlist 
> support. The inevitable reality is that GPU scheduling is coming and 
> likely to be there only path in the not-too-distant future. (See also 
> the ongoing thread with AMD about fences.) I'm not going to pass 
> judgement on whether or not this is a good thing.  I'm just reading the 
> winds and, in my view, this is where things are headed for good or ill.
> 
> In answer to the question above, the answer to "what do we gain from 
> GuC?" may soon be, "you get to use your GPU."  We're not there yet and, 
> again, I'm not necessarily advocating for it, but that is likely where 
> things are headed.

This will be a sad day, especially since it seems fundamentally opposed 
to any long-term support, on top of taking away users' freedom to 
fix/tweak their systems when Intel won't.

> A firmware-based submission model isn't a bad design IMO and, aside from 
> the firmware freedom issues, I think there are actual advantages to the 
> model. Immediately, it'll unlock a few features like parallel submission 
> (more on that in a bit) and long-running compute because they're 
> implemented in GuC and the work to implement them properly in the 
> execlist scheduler is highly non-trivial. Longer term, it may (no 
> guarantees) unlock some performance by getting the kernel out of the way.

Oh, I definitely agree with firmware-based submission model not being a 
bad design. I was even cheering for it in 2015. Experience with it made 
me regret that deeply since :s

But with the DRM scheduler being responsible for most things, I fail to 
see what we could offload to the GuC except context switching (like 
every other manufacturer). The problem is, the GuC does way more than 
just switching registers in bulk, and if the number of revisions of the 
GuC is anything to go by, it is way too complex for me to feel 
comfortable with it.

> 
>> In the same vein, I have another concern related to the impact of GuC on
>> Linux's stable releases. Let's say that in 3 years, a new application
>> triggers a bug in command submission inside the firmware. Given that the
>> Linux community cannot patch the GuC firmware, how likely is it that
>> Intel would release a new GuC version? That would not be necessarily
>> such a big problem if newer versions of the GuC could easily be
>> backported to this potentially-decade-old Linux version, but given that
>> the GuC seems to have ABI-breaking changes on a monthly cadence (we are
>> at major version 60 *already*? :o), I would say that it is
>> highly-unlikely that it would not require potentially-extensive changes
>> to i915 to make it work, making the fix almost impossible to land in the
>> stable tree... Do you have a plan to mitigate this problem?
>>
>> Patches like "drm/i915/guc: Disable bonding extension with GuC
>> submission" also make me twitch, as this means the two command
>> submission paths will not be functionally equivalent, and enabling GuC
>> could thus introduce a user-visible regression (one app used to work,
>> then stopped working). Could you add in the commit's message a proof
>> that this would not end up being a user regression (in which case, why
>> have this codepath to begin with?).
> 
> I'd like to address this one specifically as it's become something of a 
> speciality of mine the past few weeks. The current bonded submission 
> model is bad. It provides a plethora of ways for a client to back itself 
> into a corner and doesn't actually provide the guarantees the media 
> driver needs for its real-time high-resolution decode. It's bad enough 
> we're seriously considering ripping it out, backwards compatibility or 
> not. The good news is that very little that your average desktop user 
> does depends on it: basically just real-time >4K video decode.
> 
> The new parallel submit API is much better and should be the path 
> forward. (We should have landed parallel submit the first time around.) 
> It isn't full of corners and does let us provides actual parallel 
> execution guarantees. It also gives the scheduler the information it 
> needs to reliably provide those guarantees. >
> If we need to support the parallel submit API with the execlist 
> back-end, that's totally possible. The choice to only implement the 
> parallel submit API with GuC is a pragmatic one. We're trying to get 
> upstream back on it's feet and get all the various up-and-coming bits of 
> hardware enabled. Enabling the new API in the execlist back-end makes 
> that pipeline longer.

I feel your pain, and wish you all the best in making GEM less complex
and more manageable.

So, if I understood correctly, the plan is just to regress 4K+ video 
decoding for people who do not enable GuC scheduling, or have not updated 
to a recent-enough media driver that supports this new interface? If it 
is indeed only for over-4K videos, then whatever. If it is 4K, it starts 
being a little bad, assuming graceful fallback to CPU-based decoding. 
What's the test plan for this patch then? The patch in its current form 
is definitely not making me confident.

> 
>> Finally, could you explain why IGT tests need to be modified to work the
>> GuC [1], and how much of the code in this series is covered by
>> existing/upcoming tests? I would expect a very solid set of tests to
>> minimize the maintenance burden, and enable users to reproduce potential
>> issues found in this new codepath (too many users run with enable_guc=3,
>> as can be seen on Google[2]).
> 
> The IGT changes, as I understand them, are entirely around switching to 
> the new parallel submit API. There shouldn't be a major effect to most 
> users.

Right, this part I followed, but failed to connect it to the GuC... 
because I couldn't see why it would be needed (execlist requiring a lot 
more work).

I sincerely wish for the GuC to stay away from upstream because of the 
above concerns (which are yet to be addressed), but if Intel were to 
push forward with the plan to drop execlist, I can foresee a world of 
trouble for users... That is of course unless the GuC were to be open 
sourced, with people outside of Intel able to sign their own builds or 
run unsigned. Failing that, let's hope the last 6 years were just a bad 
start, and the rapid climb in GuC major versions will magically 
stop! I hope execlists will remain at feature parity with the GuC where 
possible... but I deplore the increase in validation needs, which will only 
hurt users in the end.

Thanks for your honest answer,
Martin

> 
> --Jason
Jason Ekstrand May 10, 2021, 4:25 p.m. UTC | #4
On May 10, 2021 08:55:55 Martin Peres <martin.peres@free.fr> wrote:

> On 10/05/2021 02:11, Jason Ekstrand wrote:
>> On May 9, 2021 12:12:36 Martin Peres <martin.peres@free.fr> wrote:
>>
>>> Hi,
>>>
>>> On 06/05/2021 22:13, Matthew Brost wrote:
>>>> Basic GuC submission support. This is the first bullet point in the
>>>> upstreaming plan covered in the following RFC [1].
>>>>
>>>> At a very high level the GuC is a piece of firmware which sits between
>>>> the i915 and the GPU. It offloads some of the scheduling of contexts
>>>> from the i915 and programs the GPU to submit contexts. The i915
>>>> communicates with the GuC and the GuC communicates with the GPU.
>>>
>>> May I ask what will GuC command submission do that execlist won't/can't
>>> do? And what would be the impact on users? Even forgetting the troubled
>>> history of GuC (instability, performance regression, poor level of user
>>> support, 6+ years of trying to upstream it...), adding this much code
>>> and doubling the amount of validation needed should come with a
>>> rationale making it feel worth it... and I am not seeing here. Would you
>>> mind providing the rationale behind this work?
>>>
>>>>
>>>> GuC submission will be disabled by default on all current upstream
>>>> platforms behind a module parameter - enable_guc. A value of 3 will
>>>> enable submission and HuC loading via the GuC. GuC submission should
>>>> work on all gen11+ platforms assuming the GuC firmware is present.
>>>
>>> What is the plan here when it comes to keeping support for execlist? I
>>> am afraid that landing GuC support in Linux is the first step towards
>>> killing the execlist, which would force users to use proprietary
>>> firmwares that even most Intel engineers have little influence over.
>>> Indeed, if "drm/i915/guc: Disable semaphores when using GuC scheduling"
>>> which states "Disable semaphores when using GuC scheduling as semaphores
>>> are broken in the current GuC firmware." is anything to go by, it means
>>> that even Intel developers seem to prefer working around the GuC
>>> firmware, rather than fixing it.
>>
>> Yes, landing GuC support may be the first step in removing execlist
>> support. The inevitable reality is that GPU scheduling is coming and
>> likely to be there only path in the not-too-distant future. (See also
>> the ongoing thread with AMD about fences.) I'm not going to pass
>> judgement on whether or not this is a good thing.  I'm just reading the
>> winds and, in my view, this is where things are headed for good or ill.
>>
>> In answer to the question above, the answer to "what do we gain from
>> GuC?" may soon be, "you get to use your GPU."  We're not there yet and,
>> again, I'm not necessarily advocating for it, but that is likely where
>> things are headed.
>
> This will be a sad day, especially since it seems fundamentally opposed
> with any long-term support, on top of taking away user freedom to
> fix/tweak their system when Intel won't.
>
>> A firmware-based submission model isn't a bad design IMO and, aside from
>> the firmware freedom issues, I think there are actual advantages to the
>> model. Immediately, it'll unlock a few features like parallel submission
>> (more on that in a bit) and long-running compute because they're
>> implemented in GuC and the work to implement them properly in the
>> execlist scheduler is highly non-trivial. Longer term, it may (no
>> guarantees) unlock some performance by getting the kernel out of the way.
>
> Oh, I definitely agree with firmware-based submission model not being a
> bad design. I was even cheering for it in 2015. Experience with it made
> me regret that deeply since :s
>
> But with the DRM scheduler being responsible for most things, I fail to
> see what we could offload in the GuC except context switching (like
> every other manufacturer). The problem is, the GuC does way more than
> just switching registers in bulk, and if the number of revisions of the
> GuC is anything to go by, it is way too complex for me to feel
> comfortable with it.

It's more than just bulk register writes. When it comes to load-balancing 
multiple GPU users, firmware can theoretically preempt and switch faster, 
leading to more efficient time-slicing. All we really need the DRM 
scheduler for is handling implicit dma_fence dependencies between different 
applications.


>
>>> In the same vein, I have another concern related to the impact of GuC on
>>> Linux's stable releases. Let's say that in 3 years, a new application
>>> triggers a bug in command submission inside the firmware. Given that the
>>> Linux community cannot patch the GuC firmware, how likely is it that
>>> Intel would release a new GuC version? That would not be necessarily
>>> such a big problem if newer versions of the GuC could easily be
>>> backported to this potentially-decade-old Linux version, but given that
>>> the GuC seems to have ABI-breaking changes on a monthly cadence (we are
>>> at major version 60 *already*? :o), I would say that it is
>>> highly-unlikely that it would not require potentially-extensive changes
>>> to i915 to make it work, making the fix almost impossible to land in the
>>> stable tree... Do you have a plan to mitigate this problem?
>>>
>>> Patches like "drm/i915/guc: Disable bonding extension with GuC
>>> submission" also make me twitch, as this means the two command
>>> submission paths will not be functionally equivalent, and enabling GuC
>>> could thus introduce a user-visible regression (one app used to work,
>>> then stopped working). Could you add in the commit's message a proof
>>> that this would not end up being a user regression (in which case, why
>>> have this codepath to begin with?).
>>
>> I'd like to address this one specifically as it's become something of a
>> speciality of mine the past few weeks. The current bonded submission
>> model is bad. It provides a plethora of ways for a client to back itself
>> into a corner and doesn't actually provide the guarantees the media
>> driver needs for its real-time high-resolution decode. It's bad enough
>> we're seriously considering ripping it out, backwards compatibility or
>> not. The good news is that very little that your average desktop user
>> does depends on it: basically just real-time >4K video decode.
>>
>> The new parallel submit API is much better and should be the path
>> forward. (We should have landed parallel submit the first time around.)
>> It isn't full of corners and does let us provides actual parallel
>> execution guarantees. It also gives the scheduler the information it
>> needs to reliably provide those guarantees. >
>> If we need to support the parallel submit API with the execlist
>> back-end, that's totally possible. The choice to only implement the
>> parallel submit API with GuC is a pragmatic one. We're trying to get
>> upstream back on it's feet and get all the various up-and-coming bits of
>> hardware enabled. Enabling the new API in the execlist back-end makes
>> that pipeline longer.
>
> I feel your pain, and wish you all the best to get GEM less complex
> and more manageable.
>
> So, if I understood correctly, the plan is just to regress 4K+ video
> decoding for people who do not enable GuC scheduling, or did not also
> update to a recent-enough media driver that would support this new
> interface? If it is indeed only for over 4K videos, then whatever. If it
> is 4K, it starts being a little bad, assuming graceful fallback to
> CPU-based decoding. What's the test plan for this patch then? The patch
> in its current form is definitely not making me confident.

My understanding is that it's only >4k that's affected; we've got enough 
bandwidth on a single VCS for 4K. I'm not sure where the exact cut-off is 
(it may be a little higher than 4k) but real-time 4k should be fine and 
real-time 8k requires parallel submit. So we're really not cutting off many 
use-cases. Also, as I said above, the new API can be implemented with the 
execlist scheduler if needed. We've just pragmatically deprioritized it.

--Jason


Daniel Vetter May 10, 2021, 4:33 p.m. UTC | #5
On Mon, May 10, 2021 at 3:55 PM Martin Peres <martin.peres@free.fr> wrote:
>
> On 10/05/2021 02:11, Jason Ekstrand wrote:
> > On May 9, 2021 12:12:36 Martin Peres <martin.peres@free.fr> wrote:
> >
> >> Hi,
> >>
> >> On 06/05/2021 22:13, Matthew Brost wrote:
> >>> Basic GuC submission support. This is the first bullet point in the
> >>> upstreaming plan covered in the following RFC [1].
> >>>
> >>> At a very high level the GuC is a piece of firmware which sits between
> >>> the i915 and the GPU. It offloads some of the scheduling of contexts
> >>> from the i915 and programs the GPU to submit contexts. The i915
> >>> communicates with the GuC and the GuC communicates with the GPU.
> >>
> >> May I ask what will GuC command submission do that execlist won't/can't
> >> do? And what would be the impact on users? Even forgetting the troubled
> >> history of GuC (instability, performance regression, poor level of user
> >> support, 6+ years of trying to upstream it...), adding this much code
> >> and doubling the amount of validation needed should come with a
> >> rationale making it feel worth it... and I am not seeing here. Would you
> >> mind providing the rationale behind this work?
> >>
> >>>
> >>> GuC submission will be disabled by default on all current upstream
> >>> platforms behind a module parameter - enable_guc. A value of 3 will
> >>> enable submission and HuC loading via the GuC. GuC submission should
> >>> work on all gen11+ platforms assuming the GuC firmware is present.
> >>
> >> What is the plan here when it comes to keeping support for execlist? I
> >> am afraid that landing GuC support in Linux is the first step towards
> >> killing the execlist, which would force users to use proprietary
> >> firmwares that even most Intel engineers have little influence over.
> >> Indeed, if "drm/i915/guc: Disable semaphores when using GuC scheduling"
> >> which states "Disable semaphores when using GuC scheduling as semaphores
> >> are broken in the current GuC firmware." is anything to go by, it means
> >> that even Intel developers seem to prefer working around the GuC
> >> firmware, rather than fixing it.
> >
> > Yes, landing GuC support may be the first step in removing execlist
> > support. The inevitable reality is that GPU scheduling is coming and
> > likely to be there only path in the not-too-distant future. (See also
> > the ongoing thread with AMD about fences.) I'm not going to pass
> > judgement on whether or not this is a good thing.  I'm just reading the
> > winds and, in my view, this is where things are headed for good or ill.
> >
> > In answer to the question above, the answer to "what do we gain from
> > GuC?" may soon be, "you get to use your GPU."  We're not there yet and,
> > again, I'm not necessarily advocating for it, but that is likely where
> > things are headed.
>
> This will be a sad day, especially since it seems fundamentally opposed
> with any long-term support, on top of taking away user freedom to
> fix/tweak their system when Intel won't.
>
> > A firmware-based submission model isn't a bad design IMO and, aside from
> > the firmware freedom issues, I think there are actual advantages to the
> > model. Immediately, it'll unlock a few features like parallel submission
> > (more on that in a bit) and long-running compute because they're
> > implemented in GuC and the work to implement them properly in the
> > execlist scheduler is highly non-trivial. Longer term, it may (no
> > guarantees) unlock some performance by getting the kernel out of the way.
>
> Oh, I definitely agree with firmware-based submission model not being a
> bad design. I was even cheering for it in 2015. Experience with it made
> me regret that deeply since :s
>
> But with the DRM scheduler being responsible for most things, I fail to
> see what we could offload in the GuC except context switching (like
> every other manufacturer). The problem is, the GuC does way more than
> just switching registers in bulk, and if the number of revisions of the
> GuC is anything to go by, it is way too complex for me to feel
> comfortable with it.

We need to flesh out that part of the plan more, but we're not going
to use drm scheduler for everything. It's only to handle the dma-fence
legacy side of things, which means:
- timeout handling for batches that take too long
- dma_fence dependency sorting/handling
- boosting of context from display flips (currently missing, needs to
be ported from drm/i915)

The actual round-robin/preempt/priority handling is still left to the
backend, in this case the firmware. So there are large chunks of
code/functionality in which drm/scheduler won't be involved, and as
Jason says: the hw direction winds definitely blow towards all of this
being handled in hw.
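
For illustration, a minimal sketch of how such a firmware-backed
backend could plug into drm/scheduler - the drm_sched_backend_ops
callback names are the real drm/scheduler contract, everything guc_*
below is made up and not code from this series:

#include <linux/kernel.h>
#include <linux/dma-fence.h>
#include <drm/gpu_scheduler.h>

/* Hypothetical backend state, purely for illustration. */
struct guc_context;

struct guc_request {
	struct drm_sched_job base;
	struct dma_fence *hw_fence;	/* signalled by the firmware */
	struct guc_context *context;
};

#define to_guc_request(job) container_of(job, struct guc_request, base)

/* Hypothetical firmware helpers (doorbell write, context ban, refcount). */
void guc_submit_request(struct guc_request *rq);
void guc_context_ban(struct guc_context *ce);
void guc_request_put(struct guc_request *rq);

static struct dma_fence *guc_sched_run_job(struct drm_sched_job *job)
{
	struct guc_request *rq = to_guc_request(job);

	/*
	 * drm/scheduler has already sorted the dma_fence dependencies;
	 * the backend only hands the request to the firmware. Priority,
	 * time-slicing and preemption happen inside the firmware.
	 */
	guc_submit_request(rq);
	return dma_fence_get(rq->hw_fence);
}

static enum drm_gpu_sched_stat
guc_sched_timedout_job(struct drm_sched_job *job)
{
	/* dma-fence legacy side: a batch that runs too long gets banned */
	guc_context_ban(to_guc_request(job)->context);
	return DRM_GPU_SCHED_STAT_NOMINAL;
}

static void guc_sched_free_job(struct drm_sched_job *job)
{
	guc_request_put(to_guc_request(job));
}

static const struct drm_sched_backend_ops guc_sched_ops = {
	.run_job	= guc_sched_run_job,
	.timedout_job	= guc_sched_timedout_job,
	.free_job	= guc_sched_free_job,
};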

> >> In the same vein, I have another concern related to the impact of GuC on
> >> Linux's stable releases. Let's say that in 3 years, a new application
> >> triggers a bug in command submission inside the firmware. Given that the
> >> Linux community cannot patch the GuC firmware, how likely is it that
> >> Intel would release a new GuC version? That would not be necessarily
> >> such a big problem if newer versions of the GuC could easily be
> >> backported to this potentially-decade-old Linux version, but given that
> >> the GuC seems to have ABI-breaking changes on a monthly cadence (we are
> >> at major version 60 *already*? :o), I would say that it is
> >> highly-unlikely that it would not require potentially-extensive changes
> >> to i915 to make it work, making the fix almost impossible to land in the
> >> stable tree... Do you have a plan to mitigate this problem?
> >>
> >> Patches like "drm/i915/guc: Disable bonding extension with GuC
> >> submission" also make me twitch, as this means the two command
> >> submission paths will not be functionally equivalent, and enabling GuC
> >> could thus introduce a user-visible regression (one app used to work,
> >> then stopped working). Could you add in the commit's message a proof
> >> that this would not end up being a user regression (in which case, why
> >> have this codepath to begin with?).
> >
> > I'd like to address this one specifically as it's become something of a
> > speciality of mine the past few weeks. The current bonded submission
> > model is bad. It provides a plethora of ways for a client to back itself
> > into a corner and doesn't actually provide the guarantees the media
> > driver needs for its real-time high-resolution decode. It's bad enough
> > we're seriously considering ripping it out, backwards compatibility or
> > not. The good news is that very little that your average desktop user
> > does depends on it: basically just real-time >4K video decode.
> >
> > The new parallel submit API is much better and should be the path
> > forward. (We should have landed parallel submit the first time around.)
> > It isn't full of corners and does let us provides actual parallel
> > execution guarantees. It also gives the scheduler the information it
> > needs to reliably provide those guarantees. >
> > If we need to support the parallel submit API with the execlist
> > back-end, that's totally possible. The choice to only implement the
> > parallel submit API with GuC is a pragmatic one. We're trying to get
> > upstream back on it's feet and get all the various up-and-coming bits of
> > hardware enabled. Enabling the new API in the execlist back-end makes
> > that pipeline longer.
>
> I feel your pain, and wish you all the best to get GEM less complex
> and more manageable.
>
> So, if I understood correctly, the plan is just to regress 4K+ video
> decoding for people who do not enable GuC scheduling, or did not also
> update to a recent-enough media driver that would support this new
> interface? If it is indeed only for over 4K videos, then whatever. If it
> is 4K, it starts being a little bad, assuming graceful fallback to
> CPU-based decoding. What's the test plan for this patch then? The patch
> in its current form is definitely not making me confident.

Only if they don't scream loudly enough. If someone screams loudly
enough, we'll bite the bullet and enable the new interface on the
execlist backend.

Cheers, Daniel

> >> Finally, could you explain why IGT tests need to be modified to work the
> >> GuC [1], and how much of the code in this series is covered by
> >> existing/upcoming tests? I would expect a very solid set of tests to
> >> minimize the maintenance burden, and enable users to reproduce potential
> >> issues found in this new codepath (too many users run with enable_guc=3,
> >> as can be seen on Google[2]).
> >
> > The IGT changes, as I understand them, are entirely around switching to
> > the new parallel submit API. There shouldn't be a major effect to most
> > users.
>
> Right, this part I followed, but failed to connect it to the GuC...
> because I couldn't see why it would be needed (execlist requiring a lot
> more work).
>
> I sincerely wish for the GuC to stay away from upstream because of the
> above concerns (which are yet to be addressed), but if Intel were to
> push forward with the plan to drop execlist, I can foresee a world of
> trouble for users... That is of course unless the GuC were to be open
> sourced, with people outside of Intel able to sign their own builds or
> run unsigned. Failing that, let's hope the last 6 years were just a bad
> start, and the rapid climb in major version of the GuC will magically
> stop! I hope execlists will remain at feature parity with the GuC when
> possible... but deplore the increase in validation needs which will only
> hurt users in the end.
>
> Thanks for your honest answer,
> Martin
>
> >
> > --Jason
Francisco Jerez May 10, 2021, 6:30 p.m. UTC | #6
Daniel Vetter <daniel@ffwll.ch> writes:

> On Mon, May 10, 2021 at 3:55 PM Martin Peres <martin.peres@free.fr> wrote:
>>
>> On 10/05/2021 02:11, Jason Ekstrand wrote:
>> > On May 9, 2021 12:12:36 Martin Peres <martin.peres@free.fr> wrote:
>> >
>> >> Hi,
>> >>
>> >> On 06/05/2021 22:13, Matthew Brost wrote:
>> >>> Basic GuC submission support. This is the first bullet point in the
>> >>> upstreaming plan covered in the following RFC [1].
>> >>>
>> >>> At a very high level the GuC is a piece of firmware which sits between
>> >>> the i915 and the GPU. It offloads some of the scheduling of contexts
>> >>> from the i915 and programs the GPU to submit contexts. The i915
>> >>> communicates with the GuC and the GuC communicates with the GPU.
>> >>
>> >> May I ask what will GuC command submission do that execlist won't/can't
>> >> do? And what would be the impact on users? Even forgetting the troubled
>> >> history of GuC (instability, performance regression, poor level of user
>> >> support, 6+ years of trying to upstream it...), adding this much code
>> >> and doubling the amount of validation needed should come with a
>> >> rationale making it feel worth it... and I am not seeing here. Would you
>> >> mind providing the rationale behind this work?
>> >>
>> >>>
>> >>> GuC submission will be disabled by default on all current upstream
>> >>> platforms behind a module parameter - enable_guc. A value of 3 will
>> >>> enable submission and HuC loading via the GuC. GuC submission should
>> >>> work on all gen11+ platforms assuming the GuC firmware is present.
>> >>
>> >> What is the plan here when it comes to keeping support for execlist? I
>> >> am afraid that landing GuC support in Linux is the first step towards
>> >> killing the execlist, which would force users to use proprietary
>> >> firmwares that even most Intel engineers have little influence over.
>> >> Indeed, if "drm/i915/guc: Disable semaphores when using GuC scheduling"
>> >> which states "Disable semaphores when using GuC scheduling as semaphores
>> >> are broken in the current GuC firmware." is anything to go by, it means
>> >> that even Intel developers seem to prefer working around the GuC
>> >> firmware, rather than fixing it.
>> >
>> > Yes, landing GuC support may be the first step in removing execlist
>> > support. The inevitable reality is that GPU scheduling is coming and
>> > likely to be there only path in the not-too-distant future. (See also
>> > the ongoing thread with AMD about fences.) I'm not going to pass
>> > judgement on whether or not this is a good thing.  I'm just reading the
>> > winds and, in my view, this is where things are headed for good or ill.
>> >
>> > In answer to the question above, the answer to "what do we gain from
>> > GuC?" may soon be, "you get to use your GPU."  We're not there yet and,
>> > again, I'm not necessarily advocating for it, but that is likely where
>> > things are headed.
>>
>> This will be a sad day, especially since it seems fundamentally opposed
>> with any long-term support, on top of taking away user freedom to
>> fix/tweak their system when Intel won't.
>>
>> > A firmware-based submission model isn't a bad design IMO and, aside from
>> > the firmware freedom issues, I think there are actual advantages to the
>> > model. Immediately, it'll unlock a few features like parallel submission
>> > (more on that in a bit) and long-running compute because they're
>> > implemented in GuC and the work to implement them properly in the
>> > execlist scheduler is highly non-trivial. Longer term, it may (no
>> > guarantees) unlock some performance by getting the kernel out of the way.
>>
>> Oh, I definitely agree with firmware-based submission model not being a
>> bad design. I was even cheering for it in 2015. Experience with it made
>> me regret that deeply since :s
>>
>> But with the DRM scheduler being responsible for most things, I fail to
>> see what we could offload in the GuC except context switching (like
>> every other manufacturer). The problem is, the GuC does way more than
>> just switching registers in bulk, and if the number of revisions of the
>> GuC is anything to go by, it is way too complex for me to feel
>> comfortable with it.
>
> We need to flesh out that part of the plan more, but we're not going
> to use drm scheduler for everything. It's only to handle the dma-fence
> legacy side of things, which means:
> - timeout handling for batches that take too long
> - dma_fence dependency sorting/handling
> - boosting of context from display flips (currently missing, needs to
> be ported from drm/i915)
>
> The actual round-robin/preempt/priority handling is still left to the
> backend, in this case here the fw. So there's large chunks of
> code/functionality where drm/scheduler wont be involved in, and like
> Jason says: The hw direction winds definitely blow in the direction
> that this is all handled in hw.
>

I agree with Martin on this.  Given that using GuC currently involves
making your open-source graphics stack rely on a closed-source
cryptographically-protected blob in order to submit commands to the GPU,
and given that it is /still/ possible to use the GPU without it, I'd
expect some strong material justification for making the switch (like,
it improves performance of test-case X and Y by Z%, or, we're truly
sorry but we cannot program your GPU anymore with a purely open-source
software stack).  Any argument based on the apparent direction of the
wind doesn't sound like a material engineering reason to me, and runs
the risk of being self-fulfilling if it leads us to do the worse thing
for our users just because we have the vague feeling that it is the
general trend, even though we may have had the means to obtain a better
compromise for them.

>> >> In the same vein, I have another concern related to the impact of GuC on
>> >> Linux's stable releases. Let's say that in 3 years, a new application
>> >> triggers a bug in command submission inside the firmware. Given that the
>> >> Linux community cannot patch the GuC firmware, how likely is it that
>> >> Intel would release a new GuC version? That would not be necessarily
>> >> such a big problem if newer versions of the GuC could easily be
>> >> backported to this potentially-decade-old Linux version, but given that
>> >> the GuC seems to have ABI-breaking changes on a monthly cadence (we are
>> >> at major version 60 *already*? :o), I would say that it is
>> >> highly-unlikely that it would not require potentially-extensive changes
>> >> to i915 to make it work, making the fix almost impossible to land in the
>> >> stable tree... Do you have a plan to mitigate this problem?
>> >>
>> >> Patches like "drm/i915/guc: Disable bonding extension with GuC
>> >> submission" also make me twitch, as this means the two command
>> >> submission paths will not be functionally equivalent, and enabling GuC
>> >> could thus introduce a user-visible regression (one app used to work,
>> >> then stopped working). Could you add in the commit's message a proof
>> >> that this would not end up being a user regression (in which case, why
>> >> have this codepath to begin with?).
>> >
>> > I'd like to address this one specifically as it's become something of a
>> > speciality of mine the past few weeks. The current bonded submission
>> > model is bad. It provides a plethora of ways for a client to back itself
>> > into a corner and doesn't actually provide the guarantees the media
>> > driver needs for its real-time high-resolution decode. It's bad enough
>> > we're seriously considering ripping it out, backwards compatibility or
>> > not. The good news is that very little that your average desktop user
>> > does depends on it: basically just real-time >4K video decode.
>> >
>> > The new parallel submit API is much better and should be the path
>> > forward. (We should have landed parallel submit the first time around.)
>> > It isn't full of corners and does let us provides actual parallel
>> > execution guarantees. It also gives the scheduler the information it
>> > needs to reliably provide those guarantees. >
>> > If we need to support the parallel submit API with the execlist
>> > back-end, that's totally possible. The choice to only implement the
>> > parallel submit API with GuC is a pragmatic one. We're trying to get
>> > upstream back on it's feet and get all the various up-and-coming bits of
>> > hardware enabled. Enabling the new API in the execlist back-end makes
>> > that pipeline longer.
>>
>> I feel your pain, and wish you all the best to get GEM less complex
>> and more manageable.
>>
>> So, if I understood correctly, the plan is just to regress 4K+ video
>> decoding for people who do not enable GuC scheduling, or did not also
>> update to a recent-enough media driver that would support this new
>> interface? If it is indeed only for over 4K videos, then whatever. If it
>> is 4K, it starts being a little bad, assuming graceful fallback to
>> CPU-based decoding. What's the test plan for this patch then? The patch
>> in its current form is definitely not making me confident.
>
> Only if they don't scream loudly enough. If someone screams loud
> enough we'll bite the bullet and enable the new interface on execlist
> backend.
>
> Cheers, Daniel
>
>> >> Finally, could you explain why IGT tests need to be modified to work the
>> >> GuC [1], and how much of the code in this series is covered by
>> >> existing/upcoming tests? I would expect a very solid set of tests to
>> >> minimize the maintenance burden, and enable users to reproduce potential
>> >> issues found in this new codepath (too many users run with enable_guc=3,
>> >> as can be seen on Google[2]).
>> >
>> > The IGT changes, as I understand them, are entirely around switching to
>> > the new parallel submit API. There shouldn't be a major effect to most
>> > users.
>>
>> Right, this part I followed, but failed to connect it to the GuC...
>> because I couldn't see why it would be needed (execlist requiring a lot
>> more work).
>>
>> I sincerely wish for the GuC to stay away from upstream because of the
>> above concerns (which are yet to be addressed), but if Intel were to
>> push forward with the plan to drop execlist, I can foresee a world of
>> trouble for users... That is of course unless the GuC were to be open
>> sourced, with people outside of Intel able to sign their own builds or
>> run unsigned. Failing that, let's hope the last 6 years were just a bad
>> start, and the rapid climb in major version of the GuC will magically
>> stop! I hope execlists will remain at feature parity with the GuC when
>> possible... but deplore the increase in validation needs which will only
>> hurt users in the end.
>>
>> Thanks for your honest answer,
>> Martin
>>
>> >
>> > --Jason
>
>
>
> -- 
> Daniel Vetter
> Software Engineer, Intel Corporation
> http://blog.ffwll.ch
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/intel-gfx
Dixit, Ashutosh May 11, 2021, 2:58 a.m. UTC | #7
On Sun, 09 May 2021 16:11:43 -0700, Jason Ekstrand wrote:
>
> Yes, landing GuC support may be the first step in removing execlist
> support. The inevitable reality is that GPU scheduling is coming and
> likely to be there only path in the not-too-distant future.  (See also
> the ongoing thread with AMD about fences.) I'm not going to pass
> judgement on whether or not this is a good thing.  I'm just reading the
> winds and, in my view, this is where things are headed for good or ill.
>
> In answer to the question above, the answer to "what do we gain from
> GuC?" may soon be, "you get to use your GPU."  We're not there yet and,
> again, I'm not necessarily advocating for it, but that is likely where
> things are headed.
>
> A firmware-based submission model isn't a bad design IMO and, aside from
> the firmware freedom issues, I think there are actual advantages to the
> model. Immediately, it'll unlock a few features like parallel submission
> (more on that in a bit) and long-running compute because they're
> implemented in GuC and the work to implement them properly in the
> execlist scheduler is highly non-trivial.  Longer term, it may (no
> guarantees) unlock some performance by getting the kernel out of the way.

I believe another main reason for GuC is support for HW-based
virtualization like SRIOV. The only way to support SRIOV with execlists
would be to statically partition the GPU between VMs; any dynamic
partitioning needs something in HW.
Martin Peres May 11, 2021, 7:47 a.m. UTC | #8
On 11/05/2021 05:58, Dixit, Ashutosh wrote:
> On Sun, 09 May 2021 16:11:43 -0700, Jason Ekstrand wrote:
>>
>> Yes, landing GuC support may be the first step in removing execlist
>> support. The inevitable reality is that GPU scheduling is coming and
>> likely to be there only path in the not-too-distant future.  (See also
>> the ongoing thread with AMD about fences.) I'm not going to pass
>> judgement on whether or not this is a good thing.  I'm just reading the
>> winds and, in my view, this is where things are headed for good or ill.
>>
>> In answer to the question above, the answer to "what do we gain from
>> GuC?" may soon be, "you get to use your GPU."  We're not there yet and,
>> again, I'm not necessarily advocating for it, but that is likely where
>> things are headed.
>>
>> A firmware-based submission model isn't a bad design IMO and, aside from
>> the firmware freedom issues, I think there are actual advantages to the
>> model. Immediately, it'll unlock a few features like parallel submission
>> (more on that in a bit) and long-running compute because they're
>> implemented in GuC and the work to implement them properly in the
>> execlist scheduler is highly non-trivial.  Longer term, it may (no
>> guarantees) unlock some performance by getting the kernel out of the way.
> 
> I believe another main reason for GuC is support for HW based
> virtualization like SRIOV. The only way to support SRIOV with execlists
> would be to statically partition the GPU between VM's, any dynamic
> partitioning needs something in HW.
> 

FW-based command-submission is indeed a win for SRIOV and the current HW 
architecture.

Thanks for chiming in!
Martin
Martin Peres May 11, 2021, 8:01 a.m. UTC | #9
On 10/05/2021 19:25, Jason Ekstrand wrote:
> On May 10, 2021 08:55:55 Martin Peres <martin.peres@free.fr> wrote:
> 
>> On 10/05/2021 02:11, Jason Ekstrand wrote:
>>> On May 9, 2021 12:12:36 Martin Peres <martin.peres@free.fr> wrote:
>>>
>>>> Hi,
>>>>
>>>> On 06/05/2021 22:13, Matthew Brost wrote:
>>>>> Basic GuC submission support. This is the first bullet point in the
>>>>> upstreaming plan covered in the following RFC [1].
>>>>>
>>>>> At a very high level the GuC is a piece of firmware which sits between
>>>>> the i915 and the GPU. It offloads some of the scheduling of contexts
>>>>> from the i915 and programs the GPU to submit contexts. The i915
>>>>> communicates with the GuC and the GuC communicates with the GPU.
>>>>
>>>> May I ask what will GuC command submission do that execlist won't/can't
>>>> do? And what would be the impact on users? Even forgetting the troubled
>>>> history of GuC (instability, performance regression, poor level of user
>>>> support, 6+ years of trying to upstream it...), adding this much code
>>>> and doubling the amount of validation needed should come with a
>>>> rationale making it feel worth it... and I am not seeing here. Would you
>>>> mind providing the rationale behind this work?
>>>>
>>>>>
>>>>> GuC submission will be disabled by default on all current upstream
>>>>> platforms behind a module parameter - enable_guc. A value of 3 will
>>>>> enable submission and HuC loading via the GuC. GuC submission should
>>>>> work on all gen11+ platforms assuming the GuC firmware is present.
>>>>
>>>> What is the plan here when it comes to keeping support for execlist? I
>>>> am afraid that landing GuC support in Linux is the first step towards
>>>> killing the execlist, which would force users to use proprietary
>>>> firmwares that even most Intel engineers have little influence over.
>>>> Indeed, if "drm/i915/guc: Disable semaphores when using GuC scheduling"
>>>> which states "Disable semaphores when using GuC scheduling as semaphores
>>>> are broken in the current GuC firmware." is anything to go by, it means
>>>> that even Intel developers seem to prefer working around the GuC
>>>> firmware, rather than fixing it.
>>>
>>> Yes, landing GuC support may be the first step in removing execlist
>>> support. The inevitable reality is that GPU scheduling is coming and
>>> likely to be there only path in the not-too-distant future. (See also
>>> the ongoing thread with AMD about fences.) I'm not going to pass
>>> judgement on whether or not this is a good thing.  I'm just reading the
>>> winds and, in my view, this is where things are headed for good or ill.
>>>
>>> In answer to the question above, the answer to "what do we gain from
>>> GuC?" may soon be, "you get to use your GPU."  We're not there yet and,
>>> again, I'm not necessarily advocating for it, but that is likely where
>>> things are headed.
>>
>> This will be a sad day, especially since it seems fundamentally opposed
>> with any long-term support, on top of taking away user freedom to
>> fix/tweak their system when Intel won't.
>>
>>> A firmware-based submission model isn't a bad design IMO and, aside from
>>> the firmware freedom issues, I think there are actual advantages to the
>>> model. Immediately, it'll unlock a few features like parallel submission
>>> (more on that in a bit) and long-running compute because they're
>>> implemented in GuC and the work to implement them properly in the
>>> execlist scheduler is highly non-trivial. Longer term, it may (no
>>> guarantees) unlock some performance by getting the kernel out of the way.
>>
>> Oh, I definitely agree with firmware-based submission model not being a
>> bad design. I was even cheering for it in 2015. Experience with it made
>> me regret that deeply since :s
>>
>> But with the DRM scheduler being responsible for most things, I fail to
>> see what we could offload in the GuC except context switching (like
>> every other manufacturer). The problem is, the GuC does way more than
>> just switching registers in bulk, and if the number of revisions of the
>> GuC is anything to go by, it is way too complex for me to feel
>> comfortable with it.
> 
> It's more than just bulk register writes. When it comes to 
> load-balancing multiple GPU users, firmware can theoretically preempt 
> and switch faster leading to more efficient time-slicing. All we really 
> need the DRM scheduler for is handling implicit dma_fence dependencies 
> between different applications.

Right, this makes sense. However, if the GuC's interface were so simple,
I doubt it would be at major version 60 already :s

I don't disagree with FW-based command submission, as it has a lot of
benefits. I just don't like the route of going with a firmware that no
one other than Intel can work on, *and* one that doesn't seem to concern
itself with stable interfaces, leaving i915 to deal with every
generation using different interfaces (even assuming the firmware were
bug-free).

> 
> 
>>>> In the same vein, I have another concern related to the impact of GuC on
>>>> Linux's stable releases. Let's say that in 3 years, a new application
>>>> triggers a bug in command submission inside the firmware. Given that the
>>>> Linux community cannot patch the GuC firmware, how likely is it that
>>>> Intel would release a new GuC version? That would not be necessarily
>>>> such a big problem if newer versions of the GuC could easily be
>>>> backported to this potentially-decade-old Linux version, but given that
>>>> the GuC seems to have ABI-breaking changes on a monthly cadence (we are
>>>> at major version 60 *already*? :o), I would say that it is
>>>> highly-unlikely that it would not require potentially-extensive changes
>>>> to i915 to make it work, making the fix almost impossible to land in the
>>>> stable tree... Do you have a plan to mitigate this problem?
>>>>
>>>> Patches like "drm/i915/guc: Disable bonding extension with GuC
>>>> submission" also make me twitch, as this means the two command
>>>> submission paths will not be functionally equivalent, and enabling GuC
>>>> could thus introduce a user-visible regression (one app used to work,
>>>> then stopped working). Could you add in the commit's message a proof
>>>> that this would not end up being a user regression (in which case, why
>>>> have this codepath to begin with?).
>>>
>>> I'd like to address this one specifically as it's become something of a
>>> speciality of mine the past few weeks. The current bonded submission
>>> model is bad. It provides a plethora of ways for a client to back itself
>>> into a corner and doesn't actually provide the guarantees the media
>>> driver needs for its real-time high-resolution decode. It's bad enough
>>> we're seriously considering ripping it out, backwards compatibility or
>>> not. The good news is that very little that your average desktop user
>>> does depends on it: basically just real-time >4K video decode.
>>>
>>> The new parallel submit API is much better and should be the path
>>> forward. (We should have landed parallel submit the first time around.)
>>> It isn't full of corners and does let us provides actual parallel
>>> execution guarantees. It also gives the scheduler the information it
>>> needs to reliably provide those guarantees. >
>>> If we need to support the parallel submit API with the execlist
>>> back-end, that's totally possible. The choice to only implement the
>>> parallel submit API with GuC is a pragmatic one. We're trying to get
>>> upstream back on it's feet and get all the various up-and-coming bits of
>>> hardware enabled. Enabling the new API in the execlist back-end makes
>>> that pipeline longer.
>>
>> I feel your pain, and wish you all the best to get GEM less complex
>> and more manageable.
>>
>> So, if I understood correctly, the plan is just to regress 4K+ video
>> decoding for people who do not enable GuC scheduling, or did not also
>> update to a recent-enough media driver that would support this new
>> interface? If it is indeed only for over 4K videos, then whatever. If it
>> is 4K, it starts being a little bad, assuming graceful fallback to
>> CPU-based decoding. What's the test plan for this patch then? The patch
>> in its current form is definitely not making me confident.
> 
> My understanding is that it's only >4k that's affected; we've got enough 
> bandwidth on a single VCS for 4K. I'm not sure where the exact cut-off 
> is (it may be a little higher than 4k) but real-time 4k should be fine 
> and real-time 8k requires parallel submit. So we're really not cutting 
> off many use-cases. Also, as I said above, the new API can be 
> implemented with the execlist scheduler if needed. We've just 
> pragmatically deprioritized it.

Sounds like a niche-enough use case to me that I feel no user would 
complain about it.

Martin

> 
> --Jason
> 
> 
>>>> Finally, could you explain why IGT tests need to be modified to work the
>>>> GuC [1], and how much of the code in this series is covered by
>>>> existing/upcoming tests? I would expect a very solid set of tests to
>>>> minimize the maintenance burden, and enable users to reproduce potential
>>>> issues found in this new codepath (too many users run with enable_guc=3,
>>>> as can be seen on Google[2]).
>>>
>>> The IGT changes, as I understand them, are entirely around switching to
>>> the new parallel submit API. There shouldn't be a major effect to most
>>> users.
>>
>> Right, this part I followed, but failed to connect it to the GuC...
>> because I couldn't see why it would be needed (execlist requiring a lot
>> more work).
>>
>> I sincerely wish for the GuC to stay away from upstream because of the
>> above concerns (which are yet to be addressed), but if Intel were to
>> push forward with the plan to drop execlist, I can foresee a world of
>> trouble for users... That is of course unless the GuC were to be open
>> sourced, with people outside of Intel able to sign their own builds or
>> run unsigned. Failing that, let's hope the last 6 years were just a bad
>> start, and the rapid climb in major version of the GuC will magically
>> stop! I hope execlists will remain at feature parity with the GuC when
>> possible... but deplore the increase in validation needs which will only
>> hurt users in the end.
>>
>> Thanks for your honest answer,
>> Martin
>>
>>>
>>> --Jason
>
Martin Peres May 11, 2021, 8:06 a.m. UTC | #10
On 10/05/2021 19:33, Daniel Vetter wrote:
> On Mon, May 10, 2021 at 3:55 PM Martin Peres <martin.peres@free.fr> wrote:
>>
>> On 10/05/2021 02:11, Jason Ekstrand wrote:
>>> On May 9, 2021 12:12:36 Martin Peres <martin.peres@free.fr> wrote:
>>>
>>>> Hi,
>>>>
>>>> On 06/05/2021 22:13, Matthew Brost wrote:
>>>>> Basic GuC submission support. This is the first bullet point in the
>>>>> upstreaming plan covered in the following RFC [1].
>>>>>
>>>>> At a very high level the GuC is a piece of firmware which sits between
>>>>> the i915 and the GPU. It offloads some of the scheduling of contexts
>>>>> from the i915 and programs the GPU to submit contexts. The i915
>>>>> communicates with the GuC and the GuC communicates with the GPU.
>>>>
>>>> May I ask what will GuC command submission do that execlist won't/can't
>>>> do? And what would be the impact on users? Even forgetting the troubled
>>>> history of GuC (instability, performance regression, poor level of user
>>>> support, 6+ years of trying to upstream it...), adding this much code
>>>> and doubling the amount of validation needed should come with a
>>>> rationale making it feel worth it... and I am not seeing here. Would you
>>>> mind providing the rationale behind this work?
>>>>
>>>>>
>>>>> GuC submission will be disabled by default on all current upstream
>>>>> platforms behind a module parameter - enable_guc. A value of 3 will
>>>>> enable submission and HuC loading via the GuC. GuC submission should
>>>>> work on all gen11+ platforms assuming the GuC firmware is present.
>>>>
>>>> What is the plan here when it comes to keeping support for execlist? I
>>>> am afraid that landing GuC support in Linux is the first step towards
>>>> killing the execlist, which would force users to use proprietary
>>>> firmwares that even most Intel engineers have little influence over.
>>>> Indeed, if "drm/i915/guc: Disable semaphores when using GuC scheduling"
>>>> which states "Disable semaphores when using GuC scheduling as semaphores
>>>> are broken in the current GuC firmware." is anything to go by, it means
>>>> that even Intel developers seem to prefer working around the GuC
>>>> firmware, rather than fixing it.
>>>
>>> Yes, landing GuC support may be the first step in removing execlist
>>> support. The inevitable reality is that GPU scheduling is coming and
>>> likely to be there only path in the not-too-distant future. (See also
>>> the ongoing thread with AMD about fences.) I'm not going to pass
>>> judgement on whether or not this is a good thing.  I'm just reading the
>>> winds and, in my view, this is where things are headed for good or ill.
>>>
>>> In answer to the question above, the answer to "what do we gain from
>>> GuC?" may soon be, "you get to use your GPU."  We're not there yet and,
>>> again, I'm not necessarily advocating for it, but that is likely where
>>> things are headed.
>>
>> This will be a sad day, especially since it seems fundamentally opposed
>> with any long-term support, on top of taking away user freedom to
>> fix/tweak their system when Intel won't.
>>
>>> A firmware-based submission model isn't a bad design IMO and, aside from
>>> the firmware freedom issues, I think there are actual advantages to the
>>> model. Immediately, it'll unlock a few features like parallel submission
>>> (more on that in a bit) and long-running compute because they're
>>> implemented in GuC and the work to implement them properly in the
>>> execlist scheduler is highly non-trivial. Longer term, it may (no
>>> guarantees) unlock some performance by getting the kernel out of the way.
>>
>> Oh, I definitely agree with firmware-based submission model not being a
>> bad design. I was even cheering for it in 2015. Experience with it made
>> me regret that deeply since :s
>>
>> But with the DRM scheduler being responsible for most things, I fail to
>> see what we could offload in the GuC except context switching (like
>> every other manufacturer). The problem is, the GuC does way more than
>> just switching registers in bulk, and if the number of revisions of the
>> GuC is anything to go by, it is way too complex for me to feel
>> comfortable with it.
> 
> We need to flesh out that part of the plan more, but we're not going
> to use drm scheduler for everything. It's only to handle the dma-fence
> legacy side of things, which means:
> - timeout handling for batches that take too long
> - dma_fence dependency sorting/handling
> - boosting of context from display flips (currently missing, needs to
> be ported from drm/i915)
> 
> The actual round-robin/preempt/priority handling is still left to the
> backend, in this case here the fw. So there's large chunks of
> code/functionality where drm/scheduler wont be involved in, and like
> Jason says: The hw direction winds definitely blow in the direction
> that this is all handled in hw.

The plan makes sense for an SRIOV-enabled GPU, yes.

However, if the GuC is actually helping i915, then why not open source 
it and drop all the issues related to its stability? Wouldn't it be the 
perfect solution, as it would allow dropping execlist support for newer 
HW, and it would eliminate the concerns about maintenance of stable 
releases of Linux?

> 
>>>> In the same vein, I have another concern related to the impact of GuC on
>>>> Linux's stable releases. Let's say that in 3 years, a new application
>>>> triggers a bug in command submission inside the firmware. Given that the
>>>> Linux community cannot patch the GuC firmware, how likely is it that
>>>> Intel would release a new GuC version? That would not be necessarily
>>>> such a big problem if newer versions of the GuC could easily be
>>>> backported to this potentially-decade-old Linux version, but given that
>>>> the GuC seems to have ABI-breaking changes on a monthly cadence (we are
>>>> at major version 60 *already*? :o), I would say that it is
>>>> highly-unlikely that it would not require potentially-extensive changes
>>>> to i915 to make it work, making the fix almost impossible to land in the
>>>> stable tree... Do you have a plan to mitigate this problem?
>>>>
>>>> Patches like "drm/i915/guc: Disable bonding extension with GuC
>>>> submission" also make me twitch, as this means the two command
>>>> submission paths will not be functionally equivalent, and enabling GuC
>>>> could thus introduce a user-visible regression (one app used to work,
>>>> then stopped working). Could you add in the commit's message a proof
>>>> that this would not end up being a user regression (in which case, why
>>>> have this codepath to begin with?).
>>>
>>> I'd like to address this one specifically as it's become something of a
>>> speciality of mine the past few weeks. The current bonded submission
>>> model is bad. It provides a plethora of ways for a client to back itself
>>> into a corner and doesn't actually provide the guarantees the media
>>> driver needs for its real-time high-resolution decode. It's bad enough
>>> we're seriously considering ripping it out, backwards compatibility or
>>> not. The good news is that very little that your average desktop user
>>> does depends on it: basically just real-time >4K video decode.
>>>
>>> The new parallel submit API is much better and should be the path
>>> forward. (We should have landed parallel submit the first time around.)
>>> It isn't full of corners and does let us provides actual parallel
>>> execution guarantees. It also gives the scheduler the information it
>>> needs to reliably provide those guarantees. >
>>> If we need to support the parallel submit API with the execlist
>>> back-end, that's totally possible. The choice to only implement the
>>> parallel submit API with GuC is a pragmatic one. We're trying to get
>>> upstream back on it's feet and get all the various up-and-coming bits of
>>> hardware enabled. Enabling the new API in the execlist back-end makes
>>> that pipeline longer.
>>
>> I feel your pain, and wish you all the best to get GEM less complex
>> and more manageable.
>>
>> So, if I understood correctly, the plan is just to regress 4K+ video
>> decoding for people who do not enable GuC scheduling, or did not also
>> update to a recent-enough media driver that would support this new
>> interface? If it is indeed only for over 4K videos, then whatever. If it
>> is 4K, it starts being a little bad, assuming graceful fallback to
>> CPU-based decoding. What's the test plan for this patch then? The patch
>> in its current form is definitely not making me confident.
> 
> Only if they don't scream loudly enough. If someone screams loud
> enough we'll bite the bullet and enable the new interface on execlist
> backend.

Ack.

Martin

> 
> Cheers, Daniel
> 
>>>> Finally, could you explain why IGT tests need to be modified to work the
>>>> GuC [1], and how much of the code in this series is covered by
>>>> existing/upcoming tests? I would expect a very solid set of tests to
>>>> minimize the maintenance burden, and enable users to reproduce potential
>>>> issues found in this new codepath (too many users run with enable_guc=3,
>>>> as can be seen on Google[2]).
>>>
>>> The IGT changes, as I understand them, are entirely around switching to
>>> the new parallel submit API. There shouldn't be a major effect to most
>>> users.
>>
>> Right, this part I followed, but failed to connect it to the GuC...
>> because I couldn't see why it would be needed (execlist requiring a lot
>> more work).
>>
>> I sincerely wish for the GuC to stay away from upstream because of the
>> above concerns (which are yet to be addressed), but if Intel were to
>> push forward with the plan to drop execlist, I can foresee a world of
>> trouble for users... That is of course unless the GuC were to be open
>> sourced, with people outside of Intel able to sign their own builds or
>> run unsigned. Failing that, let's hope the last 6 years were just a bad
>> start, and the rapid climb in major version of the GuC will magically
>> stop! I hope execlists will remain at feature parity with the GuC when
>> possible... but deplore the increase in validation needs which will only
>> hurt users in the end.
>>
>> Thanks for your honest answer,
>> Martin
>>
>>>
>>> --Jason
> 
> 
>
Bloomfield, Jon May 11, 2021, 3:26 p.m. UTC | #11
> -----Original Message-----
> From: Martin Peres <martin.peres@free.fr>
> Sent: Tuesday, May 11, 2021 1:06 AM
> To: Daniel Vetter <daniel@ffwll.ch>
> Cc: Jason Ekstrand <jason@jlekstrand.net>; Brost, Matthew
> <matthew.brost@intel.com>; intel-gfx <intel-gfx@lists.freedesktop.org>;
> dri-devel <dri-devel@lists.freedesktop.org>; Ursulin, Tvrtko
> <tvrtko.ursulin@intel.com>; Ekstrand, Jason <jason.ekstrand@intel.com>;
> Ceraolo Spurio, Daniele <daniele.ceraolospurio@intel.com>; Bloomfield, Jon
> <jon.bloomfield@intel.com>; Vetter, Daniel <daniel.vetter@intel.com>;
> Harrison, John C <john.c.harrison@intel.com>
> Subject: Re: [RFC PATCH 00/97] Basic GuC submission support in the i915
> 
> On 10/05/2021 19:33, Daniel Vetter wrote:
> > On Mon, May 10, 2021 at 3:55 PM Martin Peres <martin.peres@free.fr>
> wrote:
> >>
> >> On 10/05/2021 02:11, Jason Ekstrand wrote:
> >>> On May 9, 2021 12:12:36 Martin Peres <martin.peres@free.fr> wrote:
> >>>
> >>>> Hi,
> >>>>
> >>>> On 06/05/2021 22:13, Matthew Brost wrote:
> >>>>> Basic GuC submission support. This is the first bullet point in the
> >>>>> upstreaming plan covered in the following RFC [1].
> >>>>>
> >>>>> At a very high level the GuC is a piece of firmware which sits between
> >>>>> the i915 and the GPU. It offloads some of the scheduling of contexts
> >>>>> from the i915 and programs the GPU to submit contexts. The i915
> >>>>> communicates with the GuC and the GuC communicates with the
> GPU.
> >>>>
> >>>> May I ask what will GuC command submission do that execlist
> won't/can't
> >>>> do? And what would be the impact on users? Even forgetting the
> troubled
> >>>> history of GuC (instability, performance regression, poor level of user
> >>>> support, 6+ years of trying to upstream it...), adding this much code
> >>>> and doubling the amount of validation needed should come with a
> >>>> rationale making it feel worth it... and I am not seeing here. Would you
> >>>> mind providing the rationale behind this work?
> >>>>
> >>>>>
> >>>>> GuC submission will be disabled by default on all current upstream
> >>>>> platforms behind a module parameter - enable_guc. A value of 3 will
> >>>>> enable submission and HuC loading via the GuC. GuC submission
> should
> >>>>> work on all gen11+ platforms assuming the GuC firmware is present.
> >>>>
> >>>> What is the plan here when it comes to keeping support for execlist? I
> >>>> am afraid that landing GuC support in Linux is the first step towards
> >>>> killing the execlist, which would force users to use proprietary
> >>>> firmwares that even most Intel engineers have little influence over.
> >>>> Indeed, if "drm/i915/guc: Disable semaphores when using GuC
> scheduling"
> >>>> which states "Disable semaphores when using GuC scheduling as
> semaphores
> >>>> are broken in the current GuC firmware." is anything to go by, it means
> >>>> that even Intel developers seem to prefer working around the GuC
> >>>> firmware, rather than fixing it.
> >>>
> >>> Yes, landing GuC support may be the first step in removing execlist
> >>> support. The inevitable reality is that GPU scheduling is coming and
> >>> likely to be there only path in the not-too-distant future. (See also
> >>> the ongoing thread with AMD about fences.) I'm not going to pass
> >>> judgement on whether or not this is a good thing.  I'm just reading the
> >>> winds and, in my view, this is where things are headed for good or ill.
> >>>
> >>> In answer to the question above, the answer to "what do we gain from
> >>> GuC?" may soon be, "you get to use your GPU."  We're not there yet
> and,
> >>> again, I'm not necessarily advocating for it, but that is likely where
> >>> things are headed.
> >>
> >> This will be a sad day, especially since it seems fundamentally opposed
> >> with any long-term support, on top of taking away user freedom to
> >> fix/tweak their system when Intel won't.
> >>
> >>> A firmware-based submission model isn't a bad design IMO and, aside
> from
> >>> the firmware freedom issues, I think there are actual advantages to the
> >>> model. Immediately, it'll unlock a few features like parallel submission
> >>> (more on that in a bit) and long-running compute because they're
> >>> implemented in GuC and the work to implement them properly in the
> >>> execlist scheduler is highly non-trivial. Longer term, it may (no
> >>> guarantees) unlock some performance by getting the kernel out of the
> way.
> >>
> >> Oh, I definitely agree with firmware-based submission model not being a
> >> bad design. I was even cheering for it in 2015. Experience with it made
> >> me regret that deeply since :s
> >>
> >> But with the DRM scheduler being responsible for most things, I fail to
> >> see what we could offload in the GuC except context switching (like
> >> every other manufacturer). The problem is, the GuC does way more than
> >> just switching registers in bulk, and if the number of revisions of the
> >> GuC is anything to go by, it is way too complex for me to feel
> >> comfortable with it.
> >
> > We need to flesh out that part of the plan more, but we're not going
> > to use drm scheduler for everything. It's only to handle the dma-fence
> > legacy side of things, which means:
> > - timeout handling for batches that take too long
> > - dma_fence dependency sorting/handling
> > - boosting of context from display flips (currently missing, needs to
> > be ported from drm/i915)
> >
> > The actual round-robin/preempt/priority handling is still left to the
> > backend, in this case here the fw. So there's large chunks of
> > code/functionality where drm/scheduler wont be involved in, and like
> > Jason says: The hw direction winds definitely blow in the direction
> > that this is all handled in hw.
> 
> The plan makes sense for a SRIOV-enable GPU, yes.
> 
> However, if the GuC is actually helping i915, then why not open source
> it and drop all the issues related to its stability? Wouldn't it be the
> perfect solution, as it would allow dropping execlist support for newer
> HW, and it would eliminate the concerns about maintenance of stable
> releases of Linux?

That the major version of the FW is high is not due to bugs - bugs don't trigger major version bumps anyway. Only interface changes increment the major version, and we do add features to keep it relevant to the evolving hardware and OS landscape. When only Windows used GuC there was no reason to minimize interface creep - GuC and KMD are released as an atomic bundle on Windows. With Linux, this is no longer the case, and has not been for some time.
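
As an illustration (hypothetical code, not i915 or GuC code - the names and layout are made up), the compatibility rule a driver would typically apply to such a versioning scheme looks like this:

#include <linux/types.h>

struct fw_interface_version {
	u8 major;	/* bumped only on host/firmware interface changes */
	u8 minor;	/* bumped for additive features */
};

static bool fw_interface_compatible(const struct fw_interface_version *want,
				    const struct fw_interface_version *found)
{
	/*
	 * A major bump means the interface changed, so the driver must
	 * match it exactly; newer minors only add features, so anything
	 * at or above the wanted minor is acceptable.
	 */
	return found->major == want->major && found->minor >= want->minor;
}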

We have been using GuC as the sole mechanism for submission on Windows since Gen8, and it has proven very reliable. This is in large part because it is simple, and designed from day 1 as a cohesive solution alongside the hardware.

Will there be bugs in the future? Of course. It's a new i915 backend. There are bugs in the execlist backend too, and the runlist backend, and the majority of real-world software ever written. But the i915 GuC backend is way simpler than execlist, much easier to understand, and therefore much easier to maintain. It's a net win for i915 and Linux.

Jon
Matthew Brost May 11, 2021, 4:39 p.m. UTC | #12
On Tue, May 11, 2021 at 08:26:59AM -0700, Bloomfield, Jon wrote:
> > -----Original Message-----
> > From: Martin Peres <martin.peres@free.fr>
> > Sent: Tuesday, May 11, 2021 1:06 AM
> > To: Daniel Vetter <daniel@ffwll.ch>
> > Cc: Jason Ekstrand <jason@jlekstrand.net>; Brost, Matthew
> > <matthew.brost@intel.com>; intel-gfx <intel-gfx@lists.freedesktop.org>;
> > dri-devel <dri-devel@lists.freedesktop.org>; Ursulin, Tvrtko
> > <tvrtko.ursulin@intel.com>; Ekstrand, Jason <jason.ekstrand@intel.com>;
> > Ceraolo Spurio, Daniele <daniele.ceraolospurio@intel.com>; Bloomfield, Jon
> > <jon.bloomfield@intel.com>; Vetter, Daniel <daniel.vetter@intel.com>;
> > Harrison, John C <john.c.harrison@intel.com>
> > Subject: Re: [RFC PATCH 00/97] Basic GuC submission support in the i915
> > 
> > On 10/05/2021 19:33, Daniel Vetter wrote:
> > > On Mon, May 10, 2021 at 3:55 PM Martin Peres <martin.peres@free.fr>
> > wrote:
> > >>
> > >> On 10/05/2021 02:11, Jason Ekstrand wrote:
> > >>> On May 9, 2021 12:12:36 Martin Peres <martin.peres@free.fr> wrote:
> > >>>
> > >>>> Hi,
> > >>>>
> > >>>> On 06/05/2021 22:13, Matthew Brost wrote:
> > >>>>> Basic GuC submission support. This is the first bullet point in the
> > >>>>> upstreaming plan covered in the following RFC [1].
> > >>>>>
> > >>>>> At a very high level the GuC is a piece of firmware which sits between
> > >>>>> the i915 and the GPU. It offloads some of the scheduling of contexts
> > >>>>> from the i915 and programs the GPU to submit contexts. The i915
> > >>>>> communicates with the GuC and the GuC communicates with the
> > GPU.
> > >>>>
> > >>>> May I ask what will GuC command submission do that execlist
> > won't/can't
> > >>>> do? And what would be the impact on users? Even forgetting the
> > troubled
> > >>>> history of GuC (instability, performance regression, poor level of user
> > >>>> support, 6+ years of trying to upstream it...), adding this much code
> > >>>> and doubling the amount of validation needed should come with a
> > >>>> rationale making it feel worth it... and I am not seeing here. Would you
> > >>>> mind providing the rationale behind this work?
> > >>>>
> > >>>>>
> > >>>>> GuC submission will be disabled by default on all current upstream
> > >>>>> platforms behind a module parameter - enable_guc. A value of 3 will
> > >>>>> enable submission and HuC loading via the GuC. GuC submission
> > should
> > >>>>> work on all gen11+ platforms assuming the GuC firmware is present.
> > >>>>
> > >>>> What is the plan here when it comes to keeping support for execlist? I
> > >>>> am afraid that landing GuC support in Linux is the first step towards
> > >>>> killing the execlist, which would force users to use proprietary
> > >>>> firmwares that even most Intel engineers have little influence over.
> > >>>> Indeed, if "drm/i915/guc: Disable semaphores when using GuC
> > scheduling"
> > >>>> which states "Disable semaphores when using GuC scheduling as
> > semaphores
> > >>>> are broken in the current GuC firmware." is anything to go by, it means
> > >>>> that even Intel developers seem to prefer working around the GuC
> > >>>> firmware, rather than fixing it.
> > >>>
> > >>> Yes, landing GuC support may be the first step in removing execlist
> > >>> support. The inevitable reality is that GPU scheduling is coming and
> > >>> likely to be there only path in the not-too-distant future. (See also
> > >>> the ongoing thread with AMD about fences.) I'm not going to pass
> > >>> judgement on whether or not this is a good thing.  I'm just reading the
> > >>> winds and, in my view, this is where things are headed for good or ill.
> > >>>
> > >>> In answer to the question above, the answer to "what do we gain from
> > >>> GuC?" may soon be, "you get to use your GPU."  We're not there yet
> > and,
> > >>> again, I'm not necessarily advocating for it, but that is likely where
> > >>> things are headed.
> > >>
> > >> This will be a sad day, especially since it seems fundamentally opposed
> > >> with any long-term support, on top of taking away user freedom to
> > >> fix/tweak their system when Intel won't.
> > >>
> > >>> A firmware-based submission model isn't a bad design IMO and, aside
> > from
> > >>> the firmware freedom issues, I think there are actual advantages to the
> > >>> model. Immediately, it'll unlock a few features like parallel submission
> > >>> (more on that in a bit) and long-running compute because they're
> > >>> implemented in GuC and the work to implement them properly in the
> > >>> execlist scheduler is highly non-trivial. Longer term, it may (no
> > >>> guarantees) unlock some performance by getting the kernel out of the
> > way.
> > >>
> > >> Oh, I definitely agree with firmware-based submission model not being a
> > >> bad design. I was even cheering for it in 2015. Experience with it made
> > >> me regret that deeply since :s
> > >>
> > >> But with the DRM scheduler being responsible for most things, I fail to
> > >> see what we could offload in the GuC except context switching (like
> > >> every other manufacturer). The problem is, the GuC does way more than
> > >> just switching registers in bulk, and if the number of revisions of the
> > >> GuC is anything to go by, it is way too complex for me to feel
> > >> comfortable with it.
> > >
> > > We need to flesh out that part of the plan more, but we're not going
> > > to use drm scheduler for everything. It's only to handle the dma-fence
> > > legacy side of things, which means:
> > > - timeout handling for batches that take too long
> > > - dma_fence dependency sorting/handling
> > > - boosting of context from display flips (currently missing, needs to
> > > be ported from drm/i915)
> > >
> > > The actual round-robin/preempt/priority handling is still left to the
> > > backend, in this case here the fw. So there's large chunks of
> > > code/functionality where drm/scheduler wont be involved in, and like
> > > Jason says: The hw direction winds definitely blow in the direction
> > > that this is all handled in hw.
> > 
> > The plan makes sense for a SRIOV-enable GPU, yes.
> > 
> > However, if the GuC is actually helping i915, then why not open source
> > it and drop all the issues related to its stability? Wouldn't it be the
> > perfect solution, as it would allow dropping execlist support for newer
> > HW, and it would eliminate the concerns about maintenance of stable
> > releases of Linux?
> 
> That the major version of the FW is high is not due to bugs - Bugs don't trigger major version bumps anyway. Only interface changes increment the major version, and we do add features, to keep it relevant to the evolving hardware and OS landscape. When only Windows used GuC there was no reason not to minimize interface creep - GuC and KMD are released as an atomic bundle on Windows. With Linux, this is no longer the case, and has not been for some time.
> 

Jon hit the nail on the head here - there hasn't been any reason not to bump
the GuC version / change the interface until there is code upstream using the
GuC. Once we push something, that totally changes. Once SRIOV lands we
literally can't change the interface without breaking the world. Our goal is
to get this right before something lands, hence the high version number.

Matt

> We have been using GuC as the sole mechanism for submission on Windows since Gen8, and it has proven very reliable. This is in large part because it is simple, and designed from day 1 as a cohesive solution alongside the hardware.
> 
> Will there be bugs in the future? Of course. It's a new i915 backend. There are bugs in the execlist backend too, and the runlist backend, and the majority of real-world software ever written. But the i915 GuC backend is way simpler than execlist, much easier to understand, and therefore much easier to maintain. It's a net win for i915 and Linux.
> 
> Jon
Martin Peres May 12, 2021, 6:26 a.m. UTC | #13
On 11/05/2021 19:39, Matthew Brost wrote:
> On Tue, May 11, 2021 at 08:26:59AM -0700, Bloomfield, Jon wrote:
>>> -----Original Message-----
>>> From: Martin Peres <martin.peres@free.fr>
>>> Sent: Tuesday, May 11, 2021 1:06 AM
>>> To: Daniel Vetter <daniel@ffwll.ch>
>>> Cc: Jason Ekstrand <jason@jlekstrand.net>; Brost, Matthew
>>> <matthew.brost@intel.com>; intel-gfx <intel-gfx@lists.freedesktop.org>;
>>> dri-devel <dri-devel@lists.freedesktop.org>; Ursulin, Tvrtko
>>> <tvrtko.ursulin@intel.com>; Ekstrand, Jason <jason.ekstrand@intel.com>;
>>> Ceraolo Spurio, Daniele <daniele.ceraolospurio@intel.com>; Bloomfield, Jon
>>> <jon.bloomfield@intel.com>; Vetter, Daniel <daniel.vetter@intel.com>;
>>> Harrison, John C <john.c.harrison@intel.com>
>>> Subject: Re: [RFC PATCH 00/97] Basic GuC submission support in the i915
>>>
>>> On 10/05/2021 19:33, Daniel Vetter wrote:
>>>> On Mon, May 10, 2021 at 3:55 PM Martin Peres <martin.peres@free.fr>
>>> wrote:
>>>>>
>>>>> On 10/05/2021 02:11, Jason Ekstrand wrote:
>>>>>> On May 9, 2021 12:12:36 Martin Peres <martin.peres@free.fr> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> On 06/05/2021 22:13, Matthew Brost wrote:
>>>>>>>> Basic GuC submission support. This is the first bullet point in the
>>>>>>>> upstreaming plan covered in the following RFC [1].
>>>>>>>>
>>>>>>>> At a very high level the GuC is a piece of firmware which sits between
>>>>>>>> the i915 and the GPU. It offloads some of the scheduling of contexts
>>>>>>>> from the i915 and programs the GPU to submit contexts. The i915
>>>>>>>> communicates with the GuC and the GuC communicates with the
>>> GPU.
>>>>>>>
>>>>>>> May I ask what will GuC command submission do that execlist
>>> won't/can't
>>>>>>> do? And what would be the impact on users? Even forgetting the
>>> troubled
>>>>>>> history of GuC (instability, performance regression, poor level of user
>>>>>>> support, 6+ years of trying to upstream it...), adding this much code
>>>>>>> and doubling the amount of validation needed should come with a
>>>>>>> rationale making it feel worth it... and I am not seeing here. Would you
>>>>>>> mind providing the rationale behind this work?
>>>>>>>
>>>>>>>>
>>>>>>>> GuC submission will be disabled by default on all current upstream
>>>>>>>> platforms behind a module parameter - enable_guc. A value of 3 will
>>>>>>>> enable submission and HuC loading via the GuC. GuC submission
>>> should
>>>>>>>> work on all gen11+ platforms assuming the GuC firmware is present.
>>>>>>>
>>>>>>> What is the plan here when it comes to keeping support for execlist? I
>>>>>>> am afraid that landing GuC support in Linux is the first step towards
>>>>>>> killing the execlist, which would force users to use proprietary
>>>>>>> firmwares that even most Intel engineers have little influence over.
>>>>>>> Indeed, if "drm/i915/guc: Disable semaphores when using GuC
>>> scheduling"
>>>>>>> which states "Disable semaphores when using GuC scheduling as
>>> semaphores
>>>>>>> are broken in the current GuC firmware." is anything to go by, it means
>>>>>>> that even Intel developers seem to prefer working around the GuC
>>>>>>> firmware, rather than fixing it.
>>>>>>
>>>>>> Yes, landing GuC support may be the first step in removing execlist
>>>>>> support. The inevitable reality is that GPU scheduling is coming and
>>>>>> likely to be the only path in the not-too-distant future. (See also
>>>>>> the ongoing thread with AMD about fences.) I'm not going to pass
>>>>>> judgement on whether or not this is a good thing.  I'm just reading the
>>>>>> winds and, in my view, this is where things are headed for good or ill.
>>>>>>
>>>>>> In answer to the question above, the answer to "what do we gain from
>>>>>> GuC?" may soon be, "you get to use your GPU."  We're not there yet
>>> and,
>>>>>> again, I'm not necessarily advocating for it, but that is likely where
>>>>>> things are headed.
>>>>>
>>>>> This will be a sad day, especially since it seems fundamentally opposed
>>>>> with any long-term support, on top of taking away user freedom to
>>>>> fix/tweak their system when Intel won't.
>>>>>
>>>>>> A firmware-based submission model isn't a bad design IMO and, aside
>>> from
>>>>>> the firmware freedom issues, I think there are actual advantages to the
>>>>>> model. Immediately, it'll unlock a few features like parallel submission
>>>>>> (more on that in a bit) and long-running compute because they're
>>>>>> implemented in GuC and the work to implement them properly in the
>>>>>> execlist scheduler is highly non-trivial. Longer term, it may (no
>>>>>> guarantees) unlock some performance by getting the kernel out of the
>>> way.
>>>>>
>>>>> Oh, I definitely agree with firmware-based submission model not being a
>>>>> bad design. I was even cheering for it in 2015. Experience with it made
>>>>> me regret that deeply since :s
>>>>>
>>>>> But with the DRM scheduler being responsible for most things, I fail to
>>>>> see what we could offload in the GuC except context switching (like
>>>>> every other manufacturer). The problem is, the GuC does way more than
>>>>> just switching registers in bulk, and if the number of revisions of the
>>>>> GuC is anything to go by, it is way too complex for me to feel
>>>>> comfortable with it.
>>>>
>>>> We need to flesh out that part of the plan more, but we're not going
>>>> to use drm scheduler for everything. It's only to handle the dma-fence
>>>> legacy side of things, which means:
>>>> - timeout handling for batches that take too long
>>>> - dma_fence dependency sorting/handling
>>>> - boosting of context from display flips (currently missing, needs to
>>>> be ported from drm/i915)
>>>>
>>>> The actual round-robin/preempt/priority handling is still left to the
>>>> backend, in this case here the fw. So there's large chunks of
>>>> code/functionality where drm/scheduler won't be involved, and like
>>>> Jason says: The hw direction winds definitely blow in the direction
>>>> that this is all handled in hw.
>>>
>>> The plan makes sense for an SRIOV-enabled GPU, yes.
>>>
>>> However, if the GuC is actually helping i915, then why not open source
>>> it and drop all the issues related to its stability? Wouldn't it be the
>>> perfect solution, as it would allow dropping execlist support for newer
>>> HW, and it would eliminate the concerns about maintenance of stable
>>> releases of Linux?
>>
>> That the major version of the FW is high is not due to bugs - Bugs don't trigger major version bumps anyway. 

Of course, where did I say they would?

> Only interface changes increment the major version, and we do add features, to keep it relevant to the evolving hardware and OS landscape. When only Windows used GuC there was no reason not to minimize interface creep - GuC and KMD are released as an atomic bundle on Windows. With Linux, this is no longer the case, and has not been for some time.

AFAIK, Intel has been shipping GuC to customers since gen9, and upstream 
has been supporting command submission (albeit in a broken form) for 
years... until Michal finally disabled it after I asked for it a bit 
over 2 years ago[1], when GuC was at major version 32.

So... not sure I would trust your word so blindly here.

[1] 
https://patchwork.freedesktop.org/patch/297997/?series=58760&rev=2#comment_559594
>>
> 
> Jon hit the nail on the head here - there hasn't been any reason not to bump
> the GuC version / change the interface until there is code upstream using the
> GuC. Once we push something, that totally changes. Once SRIOV lands we
> literally can't change the interface without breaking the world. Our goal is
> to get this right before something lands, hence the high version number.

Good to hear! But Intel will continue to change the interface as new 
generations are made, so what is the support model for older GPUs / 
kernels which will be stuck on older major revisions?

> 
> Matt
> 
>> We have been using GuC as the sole mechanism for submission on Windows since Gen8, and it has proven very reliable. This is in large part because it is simple, and designed from day 1 as a cohesive solution alongside the hardware.

Exactly, the GuC was designed with Windows' GPU model... which is not 
directly applicable to Linux. Also, Windows does not care as much about 
submission latency, whereas most Linux users still depend on glamor for 
2D acceleration which is pretty much the biggest stress test for command 
submission latency. Also, features not used by the Windows driver or 
used in a different way are/will get broken (see the semaphore patch 
that works around it).

>>
>> Will there be bugs in the future? Of course. It's a new i915 backend. There are bugs in the execlist backend too, and the runlist backend, and the majority of real-world software ever written. But the i915 GuC backend is way simpler than execlist, much easier to understand, and therefore much easier to maintain. It's a net win for i915 and Linux.

I am more than willing to accept the fact that the interface would be 
easier to work with, and I welcome anything that will simplify the 
driver... but not at the expense of regressing the user experience. One 
has to prove more than *just* code maintainability.

Feel free to iterate/land the code, but enabling GuC-based command 
submission is waaaaayyyy too early, no matter how much you want it. This 
patch will remain a NACK from me until I see more of the plan to support 
*users* who are willing to use a proprietary firmware, a performance 
analysis, the plan for users who will not want to use it, and an 
assessment of which GuC capabilities could be used for privilege 
escalation and what is done to mitigate that.

Thanks,
Martin

>>
>> Jon
Tvrtko Ursulin May 14, 2021, 11:11 a.m. UTC | #14
On 06/05/2021 20:13, Matthew Brost wrote:
> Basic GuC submission support. This is the first bullet point in the
> upstreaming plan covered in the following RFC [1].
> 
> At a very high level the GuC is a piece of firmware which sits between
> the i915 and the GPU. It offloads some of the scheduling of contexts
> from the i915 and programs the GPU to submit contexts. The i915
> communicates with the GuC and the GuC communicates with the GPU.
> 
> GuC submission will be disabled by default on all current upstream
> platforms behind a module parameter - enable_guc. A value of 3 will
> enable submission and HuC loading via the GuC. GuC submission should
> work on all gen11+ platforms assuming the GuC firmware is present.

Some thoughts mostly relating to future platforms where GuC will be the 
only option, and to some extent platforms where it will be possible to 
turn it on for one reason or another.

Debuggability - in the context of having an upstream way/tool for 
capturing and viewing GuC logs usable for attaching to bug reports.

Currently i915 can provide logs, traces via tracepoints and trace 
printk, and GPU error capture state, which often provide a sufficient 
trail of evidence to debug issues.

We need to make sure GuC is not a black box in this respect. By 
this I mean it does not hide a large portion of the execution flows from 
upstream observability.

This could mean a tool in IGT to access/capture GuC logs and update bug 
filing instructions.

Leading from here is probably the need for the GuC firmware team to 
cross the internal-upstream boundary and deal with such bug reports on 
upstream trackers. Upstream GuC is unlikely to work if we don't have 
such a plan and commitment.

Also leading from here is the need for GPU error capture to be on par 
from day one, which I believe is still not there in the firmware.

Another, although unrelated, missing feature on my wish list is firmware 
support for wiring up accurate engine busyness stats to i915 PMU. I 
believe this is also being worked on but I don't know when the 
expected delivery is.

If we are tracking a TODO list of items somewhere I think these ones 
should be definitely considered.

Regards,

Tvrtko
Jason Ekstrand May 14, 2021, 4:31 p.m. UTC | #15
Pulling a few threads together...

On Mon, May 10, 2021 at 1:39 PM Francisco Jerez <currojerez@riseup.net> wrote:
>
> I agree with Martin on this.  Given that using GuC currently involves
> making your open-source graphics stack rely on a closed-source
> cryptographically-protected blob in order to submit commands to the GPU,
> and given that it is /still/ possible to use the GPU without it, I'd
> expect some strong material justification for making the switch (like,
> it improves performance of test-case X and Y by Z%, or, we're truly
> sorry but we cannot program your GPU anymore with a purely open-source
> software stack).  Any argument based on the apparent direction of the
> wind doesn't sound like a material engineering reason to me, and runs
> the risk of being self-fulfilling if it leads us to do the worse thing
> for our users just because we have the vague feeling that it is the
> general trend, even though we may have had the means to obtain a better
> compromise for them.

I think it's important to distinguish between landing code to support
GuC submission and requiring it in order to use the GPU.  We've got
the execlist back-end and it's not going anywhere, at least not for
older hardware, and it will likely keep working as long as execlists
remain in the hardware.  What's being proposed here is a new back-end
which, yes, depends on firmware and can be used for more features.

I'm well aware of the slippery slope argument that's implicitly being
used here even if no one is actually saying it:  If we land GuC
support in i915 in any form then Intel HW engineers will say "See,
Linux supports GuC now; we can rip out execlists" and we'll end up in
the dystopia of closed-source firmware.  If upstream continues to push
back on GuC in any form then they'll be forced to keep execlists.
I'll freely admit that there is probably some truth to this.  However,
I really doubt that it's going to work long-term.  If the HW
architects are determined enough to rip it out, they will.

If GuC is really inevitable, then it's in our best interests to land
at least beta support earlier.  There are a lot of questions that
people have brought up around back-ports, dealing with stable kernels,
stability concerns, etc.  The best way to sort those out is to land
the code and start dealing with the issues.  We can't front-load
solving every possible issue or the code will never land.  But maybe
that's people's actual objective?


On Wed, May 12, 2021 at 1:26 AM Martin Peres <martin.peres@free.fr> wrote:
>
> On 11/05/2021 19:39, Matthew Brost wrote:
> > On Tue, May 11, 2021 at 08:26:59AM -0700, Bloomfield, Jon wrote:
> >>> On 10/05/2021 19:33, Daniel Vetter wrote:
> >>>> On Mon, May 10, 2021 at 3:55 PM Martin Peres <martin.peres@free.fr>
> >>> wrote:
> >>>
> >>> However, if the GuC is actually helping i915, then why not open source
> >>> it and drop all the issues related to its stability? Wouldn't it be the
> >>> perfect solution, as it would allow dropping execlist support for newer
> >>> HW, and it would eliminate the concerns about maintenance of stable
> >>> releases of Linux?

I would like to see that happen.  I know there was some chatter about
it for a while and then the discussions got killed.  I'm not sure what
happened, to be honest.  However, I don't think we can make any
guarantees or assumptions there, I'm afraid. :-(

> >> That the major version of the FW is high is not due to bugs - Bugs don't trigger major version bumps anyway.
>
> Of course, where did I say they would?

I think the concern here is that old kernels will require old major
GuC versions because interfaces won't be backwards-compatible and then
those kernels won't get bug fixes.  That's a legitimate concern.
Given the Linux usage model, I think it's fair to require either
backwards-compatibility with GuC interfaces and validation of that
backwards-compatibility or stable releases with bug fixes for a good
long while.  I honestly can't say whether or not we've really scoped
that.  Jon?

> >> We have been using GuC as the sole mechanism for submission on Windows since Gen8, and it has proven very reliable. This is in large part because it is simple, and designed from day 1 as a cohesive solution alongside the hardware.

There are going to be differences in the usage patterns that i915 and
Windows will hit when it comes to the subtle details of how we bang on
the GuC rings.  Those will likely lead to bugs on Linux that don't
show up on Windows so "it works on Windows" doesn't mean we're headed
for a bug-free future.  It means we have an existence proof that
firmware-based submission can be very reliable.  However, I don't
think anyone on this thread is really questioning that.

> Exactly, the GuC was designed with Windows' GPU model... which is not
> directly applicable to Linux. Also, Windows does not care as much about
> submission latency, whereas most Linux users still depend on glamor for
> 2D acceleration which is pretty much the biggest stress test for command
> submission latency. Also, features not used by the Windows driver or
> used in a different way are/will get broken (see the semaphore patch
> that works around it).

I'm not nearly as deep into benchmarking the delta as you are so I
won't contradict anything said directly.  However, I think it's worth
pointing out a few things:

There isn't really a Windows GPU model.  There's a different
submission model with Win10 vs. Win7 and Linux looks a lot more like
Win7.  I really want Linux to start looking like Win10 at which point
they'll be using roughly the same "GPU model".  There are other OS
differences that matter here such as Windows' substantially higher
interrupt handling latency which GuC theoretically works around.
However, I don't think it's fair to say that the way Linux wants to
program the GPU for command submission is substantially different from
Windows due to userspace software differences.

There are significant differences in terms of dma_fence handling and
implicit synchronization.  However, as has already been mentioned,
those will be handled by drm/scheduler with GuC as a back-end that
manages load-balancing.  And, yes, there will be Linux-specific bugs
(see above) but they're not because of a fundamentally different
model.

One other thing worth mentioning, which doesn't seem to fit anywhere:
If we really care about keeping execlists working for the upcoming
use-cases, it needs major work.  It's currently way too deeply tied
with i915_sw_fence so it can't handle long-running compute batches
without breaking dma-fence rules.  The way it handles bonded submit is
a bolt-on that doesn't actually provide the guarantees that userspace
needs.  It should also probably be re-architected to use drm/scheduler
for dma_fence and look a lot more like GuC on the inside.

The point of bringing this up is that I'm seeing a lot more execlist
love than I think it deserves. :-)  It may be free software but that
doesn't mean it's good software. :-P  To be clear, I don't mean to
unduly insult Chris or any of the other people who have worked on it.
It works and it's perfectly functional for supporting all the good ol'
use-cases us desktop Linux people are used to.  But the ways in which
it would have to change in order to handle the future are substantial.

--Jason
Jason Ekstrand May 14, 2021, 4:36 p.m. UTC | #16
On Fri, May 14, 2021 at 6:12 AM Tvrtko Ursulin
<tvrtko.ursulin@linux.intel.com> wrote:
>
> On 06/05/2021 20:13, Matthew Brost wrote:
> > Basic GuC submission support. This is the first bullet point in the
> > upstreaming plan covered in the following RFC [1].
> >
> > At a very high level the GuC is a piece of firmware which sits between
> > the i915 and the GPU. It offloads some of the scheduling of contexts
> > from the i915 and programs the GPU to submit contexts. The i915
> > communicates with the GuC and the GuC communicates with the GPU.
> >
> > GuC submission will be disabled by default on all current upstream
> > platforms behind a module parameter - enable_guc. A value of 3 will
> > enable submission and HuC loading via the GuC. GuC submission should
> > work on all gen11+ platforms assuming the GuC firmware is present.
>
> Some thoughts mostly relating to future platforms where GuC will be the
> only option, and to some extent platforms where it will be possible to
> turn it on for one reason or another.
>
> Debuggability - in the context of having an upstream way/tool for
> capturing and viewing GuC logs usable for attaching to bug reports.
>
> Currently i915 logs, can provide traces via tracepoints and trace
> printk, and GPU error capture state, which provides often sufficient
> trail of evidence to debug issues.
>
> We need to make sure GuC is not a black box in this respect. By
> this I mean it does not hide a large portion of the execution flows from
> upstream observability.

I agree here.  If GuC suddenly makes submission issues massively
harder to debug then that's a regression vs. execlists.  I don't know
what the solution there is but I think the concern is valid.

> This could mean a tool in IGT to access/capture GuC logs and update bug
> filing instructions.
>
> Leading from here is probably the need for the GuC firmware team to
> cross the internal-upstream boundary and deal with such bug reports on
> upstream trackers. Upstream GuC is unlikely to work if we don't have
> such plan and commitment.

I mostly agree here as well.  I'm not sure it'll actually happen but
I'd like anyone who writes code which impacts Linux to be active in
upstream bug trackers.

> Also leading from here is the need for GPU error capture to be on par
> from day one which is I believe still not there in the firmware.

This one has me genuinely concerned.  I've heard rumors that we don't
have competent error captures with GuC yet.  From the Mesa PoV, this
is a non-starter.  We can't be asked to develop graphics drivers with
no error capture.

The good news is that, based on my understanding, it shouldn't be
terrible to support.  We just need the GuC to grab all the registers
for us and shove them in a buffer somewhere before it resets the GPU
and all that data is lost.  I would hope the Windows people have
already done that and we just need to hook it up.  If not, there may
be some GuC engineering required here.

> Another, although unrelated, missing feature on my wish list is firmware
> support for wiring up accurate engine busyness stats to i915 PMU. I
> believe this is also being worked on but I don't know when is the
> expected delivery.
>
> If we are tracking a TODO list of items somewhere I think these ones
> should be definitely considered.

Yup, let's get it all in the ToDo and not flip GuC on by default in
the wild until it's all checked off.

--Jason
Matthew Brost May 14, 2021, 4:41 p.m. UTC | #17
On Fri, May 14, 2021 at 12:11:56PM +0100, Tvrtko Ursulin wrote:
> 
> On 06/05/2021 20:13, Matthew Brost wrote:
> > Basic GuC submission support. This is the first bullet point in the
> > upstreaming plan covered in the following RFC [1].
> > 
> > At a very high level the GuC is a piece of firmware which sits between
> > the i915 and the GPU. It offloads some of the scheduling of contexts
> > from the i915 and programs the GPU to submit contexts. The i915
> > communicates with the GuC and the GuC communicates with the GPU.
> > 
> > GuC submission will be disabled by default on all current upstream
> > platforms behind a module parameter - enable_guc. A value of 3 will
> > enable submission and HuC loading via the GuC. GuC submission should
> > work on all gen11+ platforms assuming the GuC firmware is present.
> 
> Some thoughts mostly relating to future platforms where GuC will be the only
> option, and to some extent platforms where it will be possible to turn it on
> for one reason or another.
> 
> Debuggability - in the context of having an upstream way/tool for capturing
> and viewing GuC logs usable for attaching to bug reports.
> 

Agree. We have discussed this internally as an upstream requirement for quite
some time.

> Currently i915 logs, can provide traces via tracepoints and trace printk,
> and GPU error capture state, which provides often sufficient trail of
> evidence to debug issues.
> 

If we do this right, we should have the same thing with GuC submission.

> We need to make sure GuC is not a black box in this respect. By this I
> mean it does not hide a large portion of the execution flows from upstream
> observability.
> 
> This could mean a tool in IGT to access/capture GuC logs and update bug
> filing instructions.
> 

We have a few internal tools that decode the GuC logs. One of these tools will be
open sourced and on a public repo. We just need to decide which tool and make
sure that tool works across all the distros.

> Leading from here is probably the need for the GuC firmware team to cross
> the internal-upstream boundary and deal with such bug reports on upstream
> trackers. Upstream GuC is unlikely to work if we don't have such plan and
> commitment.
> 

I think we can land this code first as it is going to be disabled by default.
Certainly once we turn it on by default we need to have everything in place that
you mention in this email.

> Also leading from here is the need for GPU error capture to be on par from
> day one which is I believe still not there in the firmware.
>

We are missing a register dump from the GuC on reset. No other way to say this
than that this has been a huge miss by the i915 / GuC teams. This is something we
absolutely need and it hasn't gotten done. I'll push on this and hopefully we
can land this feature soon.

> Another, although unrelated, missing feature on my wish list is firmware
> support for wiring up accurate engine busyness stats to i915 PMU. I believe
> this is also being worked on but I don't know when is the expected delivery.
>

This is landing this week I believe. Next upstream post should include an
updated GuC firmware + code in the i915 that hooks into the PMU stats.

Matt

> If we are tracking a TODO list of items somewhere I think these ones should
> be definitely considered.
> 
> Regards,
> 
> Tvrtko
Matthew Brost May 14, 2021, 4:46 p.m. UTC | #18
On Fri, May 14, 2021 at 11:36:37AM -0500, Jason Ekstrand wrote:
> On Fri, May 14, 2021 at 6:12 AM Tvrtko Ursulin
> <tvrtko.ursulin@linux.intel.com> wrote:
> >
> > On 06/05/2021 20:13, Matthew Brost wrote:
> > > Basic GuC submission support. This is the first bullet point in the
> > > upstreaming plan covered in the following RFC [1].
> > >
> > > At a very high level the GuC is a piece of firmware which sits between
> > > the i915 and the GPU. It offloads some of the scheduling of contexts
> > > from the i915 and programs the GPU to submit contexts. The i915
> > > communicates with the GuC and the GuC communicates with the GPU.
> > >
> > > GuC submission will be disabled by default on all current upstream
> > > platforms behind a module parameter - enable_guc. A value of 3 will
> > > enable submission and HuC loading via the GuC. GuC submission should
> > > work on all gen11+ platforms assuming the GuC firmware is present.
> >
> > Some thoughts mostly relating to future platforms where GuC will be the
> > only option, and to some extent platforms where it will be possible to
> > turn it on for one reason or another.
> >
> > Debuggability - in the context of having an upstream way/tool for
> > capturing and viewing GuC logs usable for attaching to bug reports.
> >
> > Currently i915 logs, can provide traces via tracepoints and trace
> > printk, and GPU error capture state, which provides often sufficient
> > trail of evidence to debug issues.
> >
> > We need to make sure GuC is not a black box in this respect. By
> > this I mean it does not hide a large portion of the execution flows from
> > upstream observability.
> 
> I agree here.  If GuC suddenly makes submission issues massively
> harder to debug then that's a regression vs. execlists.  I don't know
> what the solution there is but I think the concern is valid.
> 

Replied to Tvrtko with detailed answers. The TL;DR is I agree with basically
everything he said, we have plans to address it all, and everything must be
addressed before the GuC can be turned on by default.

Matt

> > This could mean a tool in IGT to access/capture GuC logs and update bug
> > filing instructions.
> >
> > Leading from here is probably the need for the GuC firmware team to
> > cross the internal-upstream boundary and deal with such bug reports on
> > upstream trackers. Upstream GuC is unlikely to work if we don't have
> > such plan and commitment.
> 
> I mostly agree here as well.  I'm not sure it'll actually happen but
> I'd like anyone who writes code which impacts Linux to be active in
> upstream bug trackers.
> 
> > Also leading from here is the need for GPU error capture to be on par
> > from day one which is I believe still not there in the firmware.
> 
> This one has me genuinely concerned.  I've heard rumors that we don't
> have competent error captures with GuC yet.  From the Mesa PoV, this
> is a non-starter.  We can't be asked to develop graphics drivers with
> no error capture.
> 
> The good news is that, based on my understanding, it shouldn't be
> terrible to support.  We just need the GuC to grab all the registers
> for us and shove them in a buffer somewhere before it resets the GPU
> and all that data is lost.  I would hope the Windows people have
> already done that and we just need to hook it up.  If not, there may
> be some GuC engineering required here.
> 
> > Another, although unrelated, missing feature on my wish list is firmware
> > support for wiring up accurate engine busyness stats to i915 PMU. I
> > believe this is also being worked on but I don't know when is the
> > expected delivery.
> >
> > If we are tracking a TODO list of items somewhere I think these ones
> > should be definitely considered.
> 
> Yup, let's get it all in the ToDo and not flip GuC on by default in
> the wild until it's all checked off.
> 
> --Jason
Tvrtko Ursulin May 25, 2021, 10:32 a.m. UTC | #19
On 06/05/2021 20:13, Matthew Brost wrote:
> Basic GuC submission support. This is the first bullet point in the
> upstreaming plan covered in the following RFC [1].
> 
> At a very high level the GuC is a piece of firmware which sits between
> the i915 and the GPU. It offloads some of the scheduling of contexts
> from the i915 and programs the GPU to submit contexts. The i915
> communicates with the GuC and the GuC communicates with the GPU.
> 
> GuC submission will be disabled by default on all current upstream
> platforms behind a module parameter - enable_guc. A value of 3 will
> enable submission and HuC loading via the GuC. GuC submission should
> work on all gen11+ platforms assuming the GuC firmware is present.
> 
> This is a huge series and it is completely unrealistic to merge all of
> these patches at once. Fortunately I believe we can break down the
> series into different merges:
> 
> 1. Merge Chris Wilson's patches. These have already been reviewed
> upstream and I fully agree with these patches as a precursor to GuC
> submission.
> 
> 2. Update to GuC 60.1.2. These are largely Michal's patches.
> 
> 3. Turn on GuC/HuC auto mode by default.
> 
> 4. Additional patches needed to support GuC submission. This is any
> patch not covered by 1-3 in the first 34 patches. e.g. 'Engine relative
> MMIO'
> 
> 5. GuC submission support. Patches number 35+. These all don't have to
> merge at once though as we don't actually allow GuC submission until the
> last patch of this series.

For the GuC backend/submission part only - it seems to me none of my 
review comments I made in December 2019 have been implemented. At that 
point I stated, and this was all internally at the time mind you, that I 
do not think the series is ready and there were several high level 
issues that would need to be sorted out. I don't think I gave my ack or 
r-b back then and the promise was a few things would be worked on post 
(internal) merge. That was supposed to include upstream refactoring to 
enable GuC better slotting in as a backend. Fast forward a year and a 
half later and the only progress we had in this area has been deleted.

 From the top of my head, and having glanced the series as posted:

  * Self-churn factor in the series is too high.
  * Patch ordering issues.
  * GuC context state machine is way too dodgy to have any confidence it 
can be read and race conditions understood.
  * Context pinning code with its magical two adds, subtract and 
cmpxchg is dodgy as well.
  * Kludgy way of interfacing with the rest of the driver instead of 
refactoring to fit (idling, breadcrumbs, scheduler, tasklets, ...).

Now perhaps the latest plan is to ignore all these issues and still 
merge, then follow up with throwing it away, mostly or at least largely, 
in which case there isn't any point really to review the current state 
yet again. But it is sad that we got to this state. So just for the 
record - all this was reviewed in Nov/Dec 2019. By me among other folks 
and I at least deemed it not ready in this form.

Regards,

Tvrtko
Alex Deucher May 25, 2021, 3:37 p.m. UTC | #20
On Fri, May 14, 2021 at 12:31 PM Jason Ekstrand <jason@jlekstrand.net> wrote:
>
> Pulling a few threads together...
>
> On Mon, May 10, 2021 at 1:39 PM Francisco Jerez <currojerez@riseup.net> wrote:
> >
> > I agree with Martin on this.  Given that using GuC currently involves
> > making your open-source graphics stack rely on a closed-source
> > cryptographically-protected blob in order to submit commands to the GPU,
> > and given that it is /still/ possible to use the GPU without it, I'd
> > expect some strong material justification for making the switch (like,
> > it improves performance of test-case X and Y by Z%, or, we're truly
> > sorry but we cannot program your GPU anymore with a purely open-source
> > software stack).  Any argument based on the apparent direction of the
> > wind doesn't sound like a material engineering reason to me, and runs
> > the risk of being self-fulfilling if it leads us to do the worse thing
> > for our users just because we have the vague feeling that it is the
> > general trend, even though we may have had the means to obtain a better
> > compromise for them.
>
> I think it's important to distinguish between landing code to support
> GuC submission and requiring it in order to use the GPU.  We've got
> the execlist back-end and it's not going anywhere, at least not for
> older hardware, and it will likely keep working as long as execlists
> remain in the hardware.  What's being proposed here is a new back-end
> which, yes, depends on firmware and can be used for more features.
>
> I'm well aware of the slippery slope argument that's implicitly being
> used here even if no one is actually saying it:  If we land GuC
> support in i915 in any form then Intel HW engineers will say "See,
> Linux supports GuC now; we can rip out execlists" and we'll end up in
> the dystopia of closed-source firmware.  If upstream continues to push
> back on GuC in any form then they'll be forced to keep execlists.
> I'll freely admit that there is probably some truth to this.  However,
> I really doubt that it's going to work long-term.  If the HW
> architects are determined enough to rip it out, they will.

You want to stay on the same interfaces as Windows does, like it or
not.  The market is bigger and there is a lot more validation effort.
Even if support for the old way doesn't go away, it won't be as well
tested.  For AMD, we tried to stay on some of the older interfaces on
a number products in the past and ran into lots of subtle issues,
especially around power management related things like clock and power
gating.  There are just too many handshakes and stuff required to make
all of that work smoothly.  It can be especially challenging when the
issues show up well after launch and the firmware and hardware teams
have already moved on to the next projects and have to page the older
projects back into their minds.

Alex


>
> If GuC is really inevitable, then it's in our best interests to land
> at least beta support earlier.  There are a lot of questions that
> people have brought up around back-ports, dealing with stable kernels,
> stability concerns, etc.  The best way to sort those out is to land
> the code and start dealing with the issues.  We can't front-load
> solving every possible issue or the code will never land.  But maybe
> that's people's actual objective?
>
>
> On Wed, May 12, 2021 at 1:26 AM Martin Peres <martin.peres@free.fr> wrote:
> >
> > On 11/05/2021 19:39, Matthew Brost wrote:
> > > On Tue, May 11, 2021 at 08:26:59AM -0700, Bloomfield, Jon wrote:
> > >>> On 10/05/2021 19:33, Daniel Vetter wrote:
> > >>>> On Mon, May 10, 2021 at 3:55 PM Martin Peres <martin.peres@free.fr>
> > >>> wrote:
> > >>>
> > >>> However, if the GuC is actually helping i915, then why not open source
> > >>> it and drop all the issues related to its stability? Wouldn't it be the
> > >>> perfect solution, as it would allow dropping execlist support for newer
> > >>> HW, and it would eliminate the concerns about maintenance of stable
> > >>> releases of Linux?
>
> I would like to see that happen.  I know there was some chatter about
> it for a while and then the discussions got killed.  I'm not sure what
> happened, to be honest.  However, I don't think we can make any
> guarantees or assumptions there, I'm afraid. :-(
>
> > >> That the major version of the FW is high is not due to bugs - Bugs don't trigger major version bumps anyway.
> >
> > Of course, where did I say they would?
>
> I think the concern here is that old kernels will require old major
> GuC versions because interfaces won't be backwards-compatible and then
> those kernels won't get bug fixes.  That's a legitimate concern.
> Given the Linux usage model, I think it's fair to require either
> backwards-compatibility with GuC interfaces and validation of that
> backwards-compatibility or stable releases with bug fixes for a good
> long while.  I honestly can't say whether or not we've really scoped
> that.  Jon?
>
> > >> We have been using GuC as the sole mechanism for submission on Windows since Gen8, and it has proven very reliable. This is in large part because it is simple, and designed from day 1 as a cohesive solution alongside the hardware.
>
> There are going to be differences in the usage patterns that i915 and
> Windows will hit when it comes to the subtle details of how we bang on
> the GuC rings.  Those will likely lead to bugs on Linux that don't
> show up on Windows so "it works on Windows" doesn't mean we're headed
> for a bug-free future.  It means we have an existence proof that
> firmware-based submission can be very reliable.  However, I don't
> think anyone on this thread is really questioning that.
>
> > Exactly, the GuC was designed with Windows' GPU model... which is not
> > directly applicable to Linux. Also, Windows does not care as much about
> > submission latency, whereas most Linux users still depend on glamor for
> > 2D acceleration which is pretty much the biggest stress test for command
> > submission latency. Also, features not used by the Windows driver or
> > used in a different way are/will get broken (see the semaphore patch
> > that works around it).
>
> I'm not nearly as deep into benchmarking the delta as you are so I
> won't contradict anything said directly.  However, I think it's worth
> pointing out a few things:
>
> There isn't really a Windows GPU model.  There's a different
> submission model with Win10 vs. Win7 and Linux looks a lot more like
> Win7.  I really want Linux to start looking like Win10 at which point
> they'll be using roughly the same "GPU model".  There are other OS
> differences that matter here such as Windows' substantially higher
> interrupt handling latency which GuC theoretically works around.
> However, I don't think it's fair to say that the way Linux wants to
> program the GPU for command submission is substantially different from
> Windows due to userspace software differences.
>
> There are significant differences in terms of dma_fence handling and
> implicit synchronization.  However, as has already been mentioned,
> those will be handled by drm/scheduler with GuC as a back-end that
> manages load-balancing.  And, yes, there will be Linux-specific bugs
> (see above) but they're not because of a fundamentally different
> model.
>
> One other thing worth mentioning, which doesn't seem to fit anywhere:
> If we really care about keeping execlists working for the upcoming
> use-cases, it needs major work.  It's currently way too deeply tied
> with i915_sw_fence so it can't handle long-running compute batches
> without breaking dma-fence rules.  The way it handles bonded submit is
> a bolt-on that doesn't actually provide the guarantees that userspace
> needs.  It should also probably be re-architected to use drm/scheduler
> for dma_fence and look a lot more like GuC on the inside.
>
> The point of bringing this up is that I'm seeing a lot more execlist
> love than I think it deserves. :-)  It may be free software but that
> doesn't mean it's good software. :-P  To be clear, I don't mean to
> unduly insult Chris or any of the other people who have worked on it.
> It works and it's perfectly functional for supporting all the good ol'
> use-cases us desktop Linux people are used to.  But the ways in which
> it would have to change in order to handle the future are substantial.
>
> --Jason
Matthew Brost May 25, 2021, 4:45 p.m. UTC | #21
On Tue, May 25, 2021 at 11:32:26AM +0100, Tvrtko Ursulin wrote:
> 
> On 06/05/2021 20:13, Matthew Brost wrote:
> > Basic GuC submission support. This is the first bullet point in the
> > upstreaming plan covered in the following RFC [1].
> > 
> > At a very high level the GuC is a piece of firmware which sits between
> > the i915 and the GPU. It offloads some of the scheduling of contexts
> > from the i915 and programs the GPU to submit contexts. The i915
> > communicates with the GuC and the GuC communicates with the GPU.
> > 
> > GuC submission will be disabled by default on all current upstream
> > platforms behind a module parameter - enable_guc. A value of 3 will
> > enable submission and HuC loading via the GuC. GuC submission should
> > work on all gen11+ platforms assuming the GuC firmware is present.
> > 
> > This is a huge series and it is completely unrealistic to merge all of
> > these patches at once. Fortunately I believe we can break down the
> > series into different merges:
> > 
> > 1. Merge Chris Wilson's patches. These have already been reviewed
> > upstream and I fully agree with these patches as a precursor to GuC
> > submission.
> > 
> > 2. Update to GuC 60.1.2. These are largely Michal's patches.
> > 
> > 3. Turn on GuC/HuC auto mode by default.
> > 
> > 4. Additional patches needed to support GuC submission. This is any
> > patch not covered by 1-3 in the first 34 patches. e.g. 'Engine relative
> > MMIO'
> > 
> > 5. GuC submission support. Patches number 35+. These all don't have to
> > merge at once though as we don't actually allow GuC submission until the
> > last patch of this series.
> 
> For the GuC backend/submission part only - it seems to me none of my review
> comments I made in December 2019 have been implemented. At that point I

I wouldn't say none of the fixes have been done; lots have, just not
everything you wanted.

> stated, and this was all internally at the time mind you, that I do not
> think the series is ready and there were several high level issues that
> would need to be sorted out. I don't think I gave my ack or r-b back then
> and the promise was a few things would be worked on post (internal) merge.
> That was supposed to include upstream refactoring to enable GuC better
> slotting in as a backend. Fast forward a year and a half later and the only
> progress we had in this area has been deleted.
> 
> From the top of my head, and having glanced the series as posted:
> 
>  * Self-churn factor in the series is too high.

Not sure what you mean by this? The patches have been reworked
internally too much?

>  * Patch ordering issues.

We are going to clean up some of the ordering as these 97 patches are
posted in smaller mergeable series but at the end of the day this is a
bit of a bikeshed. GuC submission can't be turned on until patch 97, so IMO
the order in which the patches before that land really isn't that big of a
deal, as we are not breaking anything.

>  * GuC context state machine is way too dodgy to have any confidence it can
> be read and race conditions understood.

I know you don't really like the state machine but there is no other real way
to avoid a DoS on resources and no real way to fairly distribute guc_ids
without it. I know you have had other suggestions here but your suggestions
either will not work or are no less complicated in the end.

For what it is worth, the state machine will get simplified when we hook
into the DRM scheduler, as we won't have to deal with submitting from IRQ
contexts in the backend or having more than 1 request in the backend at
a time.

>  * Context pinning code with it's magical two adds, subtract and cmpxchg is
> dodgy as well.

Daniele tried to remove this and it proved quite difficult + created
even more races in the backend code. This was prior to the pre-pin and
post-unpin code which makes this even more difficult to fix as I believe
these functions would need to be removed first. Not saying we can't
revisit this someday but I personally really like it - it is a clever
way to avoid reentering the pin / unpin code while asynchronous things
are happening rather than some complex locking scheme. Lastly, this code
has proved incredibly stable as I don't think we've had to fix a single
thing in this area since we've been using this code internally.

>  * Kludgy way of interfacing with rest of the driver instead of refactoring
> to fit (idling, breadcrumbs, scheduler, tasklets, ...).
>

Idling and breadcrumbs seem clean to me. Scheduler + tasklet are going
away once the DRM scheduler lands. No need to rework those as we are just
going to rework this again.
 
> Now perhaps the latest plan is to ignore all these issues and still merge,
> then follow up with throwing it away, mostly or at least largely, in which
> case there isn't any point really to review the current state yet again. But
> it is sad that we got to this state. So just for the record - all this was
> reviewed in Nov/Dec 2019. By me among other folks and I at least deemed it
> not ready in this form.
> 

I personally don't think it is really in that bad of shape. The fact
that I could put together a PoC more or less fully integrating this
backend into the DRM scheduler within a few days I think speaks to the
quality and flexibility of this backend compared to execlists.

Matt 

> Regards,
> 
> Tvrtko
Tvrtko Ursulin June 2, 2021, 3:27 p.m. UTC | #22
On 25/05/2021 17:45, Matthew Brost wrote:
> On Tue, May 25, 2021 at 11:32:26AM +0100, Tvrtko Ursulin wrote:
>>
>> On 06/05/2021 20:13, Matthew Brost wrote:
>>> Basic GuC submission support. This is the first bullet point in the
>>> upstreaming plan covered in the following RFC [1].
>>>
>>> At a very high level the GuC is a piece of firmware which sits between
>>> the i915 and the GPU. It offloads some of the scheduling of contexts
>>> from the i915 and programs the GPU to submit contexts. The i915
>>> communicates with the GuC and the GuC communicates with the GPU.
>>>
>>> GuC submission will be disabled by default on all current upstream
>>> platforms behind a module parameter - enable_guc. A value of 3 will
>>> enable submission and HuC loading via the GuC. GuC submission should
>>> work on all gen11+ platforms assuming the GuC firmware is present.
>>>
>>> This is a huge series and it is completely unrealistic to merge all of
>>> these patches at once. Fortunately I believe we can break down the
>>> series into different merges:
>>>
>>> 1. Merge Chris Wilson's patches. These have already been reviewed
>>> upstream and I fully agree with these patches as a precursor to GuC
>>> submission.
>>>
>>> 2. Update to GuC 60.1.2. These are largely Michal's patches.
>>>
>>> 3. Turn on GuC/HuC auto mode by default.
>>>
>>> 4. Additional patches needed to support GuC submission. This is any
>>> patch not covered by 1-3 in the first 34 patches. e.g. 'Engine relative
>>> MMIO'
>>>
>>> 5. GuC submission support. Patches number 35+. These all don't have to
>>> merge at once though as we don't actually allow GuC submission until the
>>> last patch of this series.
>>
>> For the GuC backend/submission part only - it seems to me none of my review
>> comments I made in December 2019 have been implemented. At that point I
> 
> I wouldn't say none of the fixes have been done; lots have, just not
> everything you wanted.
> 
>> stated, and this was all internally at the time mind you, that I do not
>> think the series is ready and there were several high level issues that
>> would need to be sorted out. I don't think I gave my ack or r-b back then
>> and the promise was a few things would be worked on post (internal) merge.
>> That was supposed to include upstream refactoring to enable GuC better
>> slotting in as a backend. Fast forward a year and a half later and the only
>> progress we had in this area has been deleted.
>>
>>  From the top of my head, and having glanced the series as posted:
>>
>>   * Self-churn factor in the series is too high.
> 
> Not sure what you mean by this? The patches have been reworked
> internally too much?

No, I meant the series adds, removes, and changes the same code a bit too much, 
which makes it harder to review. It is much easier when the flow is 
logical and typical, where it starts with refactoring, generalising, 
building infrastructure and then plugging bits in, than it is to review 
patches which add stuff which then get removed or changed significantly 
a few patches down the line.

>>   * Patch ordering issues.
> 
> We are going to clean up some of the ordering as these 97 patches are
> posted in smaller mergeable series but at the end of the day this is a
> bit of a bikeshed. GuC submission can't be turned on until patch 97, so IMO
> the order in which the patches before that land really isn't that big of a
> deal, as we are not breaking anything.

Yes some leeway for ordering is fine.

>>   * GuC context state machine is way too dodgy to have any confidence it can
>> be read and race conditions understood.
> 
> I know you don't really like the state machine but there is no other real way
> to avoid a DoS on resources and no real way to fairly distribute guc_ids
> without it. I know you have had other suggestions here but your suggestions
> either will not work or are no less complicated in the end.
> 
> For what it is worth, the state machine will get simplified when we hook
> into the DRM scheduler, as we won't have to deal with submitting from IRQ
> contexts in the backend or having more than 1 request in the backend at
> a time.

Dunno. A mix of self-churn, locks, inconsistent naming, verbosity and 
magic makes it super hard to review. With the state transitions in 
functions like guc_context_ban, guc_context_sched_disable, 
guc_context_block, ... I find it impossible to follow what's going on: 
some happen under the lock, some outside, with jumps, early returns and 
the magic add-two. Perhaps it is just me, so wait and see what other 
reviewers will think.

>>   * Context pinning code with it's magical two adds, subtract and cmpxchg is
>> dodgy as well.
> 
> Daniele tried to remove this and it proved quite difficult + created
> even more races in the backend code. This was prior to the pre-pin and
> post-unpin code which makes this even more difficult to fix as I believe
> these functions would need to be removed first. Not saying we can't
> revisit this someday but I personally really like it - it is a clever
> way to avoid reentering the pin / unpin code while asynchronous things
> are happening rather than some complex locking scheme. Lastly, this code
> has proved incredibly stable as I don't think we've had to fix a single
> thing in this area since we've been using this code internally.

Pretty much the same as above. Code like:

static inline void __intel_context_unpin(struct intel_context *ce)
{
	if (!ce->ops->sched_disable) {
		__intel_context_do_unpin(ce, 1);
	} else {
		while (!atomic_add_unless(&ce->pin_count, -1, 1)) {
			if (atomic_cmpxchg(&ce->pin_count, 1, 2) == 1) {
				ce->ops->sched_disable(ce);
				break;
			}
		}
	}
}

That's pretty much impenetrable for me and the only thing I can think of 
here is **ALARM** must be broken! See what others think..

>>   * Kludgy way of interfacing with rest of the driver instead of refactoring
>> to fit (idling, breadcrumbs, scheduler, tasklets, ...).
>>
> 
> Idling and breadcrumbs seem clean to me. Scheduler + tasklet are going
> away once the DRM scheduler lands. No need to rework those as we are just
> going to rework this again.

Well today I read the breadcrumbs patch and there is no way that's 
clean. It goes and creates one object per engine, then deletes them, 
replacing them with the GuC-specific one, all in the same engine setup. 
The same pattern of bolting the GuC on repeats too much for my taste.

>> Now perhaps the latest plan is to ignore all these issues and still merge,
>> then follow up with throwing it away, mostly or at least largely, in which
>> case there isn't any point really to review the current state yet again. But
>> it is sad that we got to this state. So just for the record - all this was
>> reviewed in Nov/Dec 2019. By me among other folks and I at least deemed it
>> not ready in this form.
>>
> 
> I personally don't think it is really in that bad of shape. The fact
> that I could put together a PoC more or less fully integrating this
> backend into the DRM scheduler within a few days I think speaks to the
> quality and flexibility of this backend compared to execlists.

Or that you are much more familiar with it. Anyway, it's not the line of 
argument I think we should continue.

Regards,

Tvrtko
Daniel Vetter June 2, 2021, 6:57 p.m. UTC | #23
On Wed, Jun 2, 2021 at 5:27 PM Tvrtko Ursulin
<tvrtko.ursulin@linux.intel.com> wrote:
> On 25/05/2021 17:45, Matthew Brost wrote:
> > On Tue, May 25, 2021 at 11:32:26AM +0100, Tvrtko Ursulin wrote:
> >>   * Context pinning code with it's magical two adds, subtract and cmpxchg is
> >> dodgy as well.
> >
> > Daniele tried to remove this and it proved quite difficult + created
> > even more races in the backend code. This was prior to the pre-pin and
> > post-unpin code which makes this even more difficult to fix as I believe
> > these functions would need to be removed first. Not saying we can't
> > revisit this someday but I personally really like it - it is a clever
> > way to avoid reentering the pin / unpin code while asynchronous things
> > are happening rather than some complex locking scheme. Lastly, this code
> > has proved incredibly stable as I don't think we've had to fix a single
> > thing in this area since we've been using this code internally.
>
> Pretty much same as above. The code like:
>
> static inline void __intel_context_unpin(struct intel_context *ce)
> {
>         if (!ce->ops->sched_disable) {
>                 __intel_context_do_unpin(ce, 1);
>         } else {
>                 while (!atomic_add_unless(&ce->pin_count, -1, 1)) {
>                         if (atomic_cmpxchg(&ce->pin_count, 1, 2) == 1) {
>                                 ce->ops->sched_disable(ce);
>                                 break;
>                         }
>                 }
>         }
> }
>
> That's pretty much impenetrable for me and the only thing I can think of
> here is **ALARM** must be broken! See what others think..

pin_count is a hand-rolled mutex, except not actually a real one, and
it's absolutely hilarious in its various incarnations (there's one
each on i915_vm, vma, obj and probably a few more).

Not the worst one I've seen by far in the code we've merged already.
Minimally this needs a comment here and in the struct next to
@pin_count to explain where all this is abused, which would already
make it better than most of the in-tree ones.

As part of the ttm conversion we have a plan to sunset the "pin_count
as a lock" stuff, depending how bad that goes we might need to split
up the task for each struct that has such a pin_count.

-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
Matthew Brost June 3, 2021, 3:41 a.m. UTC | #24
On Wed, Jun 02, 2021 at 08:57:02PM +0200, Daniel Vetter wrote:
> On Wed, Jun 2, 2021 at 5:27 PM Tvrtko Ursulin
> <tvrtko.ursulin@linux.intel.com> wrote:
> > On 25/05/2021 17:45, Matthew Brost wrote:
> > > On Tue, May 25, 2021 at 11:32:26AM +0100, Tvrtko Ursulin wrote:
> > >>   * Context pinning code with it's magical two adds, subtract and cmpxchg is
> > >> dodgy as well.
> > >
> > > Daniele tried to remove this and it proved quite difficult + created
> > > even more races in the backend code. This was prior to the pre-pin and
> > > post-unpin code which makes this even more difficult to fix as I believe
> > > these functions would need to be removed first. Not saying we can't
> > > revisit this someday but I personally really like it - it is a clever
> > > way to avoid reentering the pin / unpin code while asynchronous things
> > > are happening rather than some complex locking scheme. Lastly, this code
> > > has proved incredibly stable as I don't think we've had to fix a single
> > > thing in this area since we've been using this code internally.
> >
> > Pretty much same as above. The code like:
> >
> > static inline void __intel_context_unpin(struct intel_context *ce)
> > {
> >         if (!ce->ops->sched_disable) {
> >                 __intel_context_do_unpin(ce, 1);
> >         } else {
> >                 while (!atomic_add_unless(&ce->pin_count, -1, 1)) {
> >                         if (atomic_cmpxchg(&ce->pin_count, 1, 2) == 1) {
> >                                 ce->ops->sched_disable(ce);
> >                                 break;
> >                         }
> >                 }
> >         }
> > }
> >
> > That's pretty much impenetrable for me and the only thing I can think of
> > here is **ALARM** must be broken! See what others think..

Yea, probably should add a comment:

/*
 * If the context has the sched_disable function, it isn't safe to unpin
 * until this function completes. This function is allowed to complete
 * asynchronously too. To avoid this function being entered twice and to
 * move ownership of the unpin to this function's completion, adjust the
 * pin count to 2 before it is entered. When this function completes, the
 * context can call intel_context_sched_unpin, which decrements the pin
 * count by 2, potentially resulting in an unpin.
 *
 * A while loop is needed to ensure the atomicity of the pin count, e.g.
 * the below if / else statement has a race:
 *
 * if (atomic_cmpxchg(&ce->pin_count, 1, 2) == 1)
 * 	ce->ops->sched_disable(ce);
 * else
 * 	atomic_dec(&ce->pin_count);
 *
 * Two threads could simultaneously fail the if clause, resulting in the
 * pin_count going to 0 with scheduling enabled + the context pinned.
 */
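
For illustration, a minimal sketch of the completion side the comment
assumes - hypothetical, not the actual patch, and assuming the second
argument of __intel_context_do_unpin() is the amount to subtract from
the pin count:

static inline void intel_context_sched_unpin(struct intel_context *ce)
{
	/* Drop both the caller's pin and the extra "ownership" count;
	 * the real unpin work only happens once the count reaches zero
	 * inside __intel_context_do_unpin().
	 */
	__intel_context_do_unpin(ce, 2);
}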

> 
> pin_count is a hand-rolled mutex, except not actually a real one, and
> it's absolutely hiliarous in it's various incarnations (there's one
> each on i915_vm, vma, obj and probably a few more).
>
> Not the worst one I've seen by far in the code we've merged already.
> Minimally this needs a comment here and in the struct next to
> @pin_count to explain where all this is abused, which would already
> make it better than most of the in-tree ones.
> 
> As part of the ttm conversion we have a plan to sunset the "pin_count
> as a lock" stuff, depending how bad that goes we might need to split
> up the task for each struct that has such a pin_count.
>

Didn't know that with the TTM rework this value might go away. If that
is truly the direction I don't see the point in reworking this now. It
100% works and with a comment I think it can be understood what it is
doing.

Matt

> -Daniel
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> http://blog.ffwll.ch
Matthew Brost June 3, 2021, 4:10 a.m. UTC | #25
On Wed, Jun 02, 2021 at 04:27:18PM +0100, Tvrtko Ursulin wrote:
> 
> On 25/05/2021 17:45, Matthew Brost wrote:
> > On Tue, May 25, 2021 at 11:32:26AM +0100, Tvrtko Ursulin wrote:
> > > 
> > > On 06/05/2021 20:13, Matthew Brost wrote:
> > > > Basic GuC submission support. This is the first bullet point in the
> > > > upstreaming plan covered in the following RFC [1].
> > > > 
> > > > At a very high level the GuC is a piece of firmware which sits between
> > > > the i915 and the GPU. It offloads some of the scheduling of contexts
> > > > from the i915 and programs the GPU to submit contexts. The i915
> > > > communicates with the GuC and the GuC communicates with the GPU.
> > > > 
> > > > GuC submission will be disabled by default on all current upstream
> > > > platforms behind a module parameter - enable_guc. A value of 3 will
> > > > enable submission and HuC loading via the GuC. GuC submission should
> > > > work on all gen11+ platforms assuming the GuC firmware is present.
> > > > 
> > > > This is a huge series and it is completely unrealistic to merge all of
> > > > these patches at once. Fortunately I believe we can break down the
> > > > series into different merges:
> > > > 
> > > > 1. Merge Chris Wilson's patches. These have already been reviewed
> > > > upstream and I fully agree with these patches as a precursor to GuC
> > > > submission.
> > > > 
> > > > 2. Update to GuC 60.1.2. These are largely Michal's patches.
> > > > 
> > > > 3. Turn on GuC/HuC auto mode by default.
> > > > 
> > > > 4. Additional patches needed to support GuC submission. This is any
> > > > patch not covered by 1-3 in the first 34 patches. e.g. 'Engine relative
> > > > MMIO'
> > > > 
> > > > 5. GuC submission support. Patches number 35+. These all don't have to
> > > > merge at once though as we don't actually allow GuC submission until the
> > > > last patch of this series.
> > > 
> > > For the GuC backend/submission part only - it seems to me none of my review
> > > comments I made in December 2019 have been implemented. At that point I
> > 
> > I wouldn't say none of the fixes have done, lots have just not
> > everything you wanted.
> > 
> > > stated, and this was all internally at the time mind you, that I do not
> > > think the series is ready and there were several high level issues that
> > > would need to be sorted out. I don't think I gave my ack or r-b back then
> > > and the promise was a few things would be worked on post (internal) merge.
> > > That was supposed to include upstream refactoring to enable GuC better
> > > slotting in as a backed. Fast forward a year and a half later and the only
> > > progress we had in this area has been deleted.
> > > 
> > >  From the top of my head, and having glanced the series as posted:
> > > 
> > >   * Self-churn factor in the series is too high.
> > 
> > Not sure what you mean by this? The patches have been reworked
> > internally too much?
> 
> No, I meant series adds and removes, changes the same code a bit much which
> makes it harder to review. It is much easier when the flow is logical and
> typical, where it starts with refactoring, generalising, building
> infrastructure and then plugging bits in, than it is to review patches which
> add stuff which then get removed or changed significantly a few patches down
> the line.
>

This has been part of the internal churn but most of this should go
away as it gets posted / merged in smaller sets of patches.
 
> > >   * Patch ordering issues.
> > 
> > We are going to clean up some of the ordering as these 97 patches are
> > posted in smaller mergeable series but at the end of the day this is a
> > bit of a bikeshed. GuC submission can't be turned until patch 97 so IMO
> > it really isn't all that big of a deal the order of which patches before
> > that land as we are not breaking anything.
> 
> Yes some leeway for ordering is fine.
> 
> > >   * GuC context state machine is way too dodgy to have any confidence it can
> > > be read and race conditions understood.
> > 
> > I know you don't really like the state machine but no other real way to
> > not have DoS on resources and no real way to fairly distribute guc_ids
> > without it. I know you have had other suggestions here but none of your
> > suggestions either will work or they are no less complicated in the end.
> > 
> > For what it is worth, the state machine will get simplified when we hook
> > into the DRM scheduler as won't have to deal with submitting from IRQ
> > contexts in the backend or having more than 1 request in the backend at
> > a time.
> 
> Dunno. A mix of self-churn, locks, inconsistent naming, verbosity and magic
> makes it super hard to review. States in functions like guc_context_ban,
> guc_context_sched_disable, guc_context_block, .. I find it impossible to
> follow what's going on. Some under lock, some outside, jumps, returns, add
> magic two .. Perhaps it is just me so wait and see what other reviewers will
> think.
> 

No doubt it is a bit complex as all of the above functions can be
executing at the same time, so can a reset, so can a submission, and the
GuC is also responding to all of the above functions asynchronously.
When you have 6 things that can be operating on the same state, yes the
locking is going to be a bit confusing. I do write documentation in a
patch towards the end of this series explaining the locking rules + all
the races.

> > >   * Context pinning code with it's magical two adds, subtract and cmpxchg is
> > > dodgy as well.
> > 
> > Daniele tried to remove this and it proved quite difficult + created
> > even more races in the backend code. This was prior to the pre-pin and
> > post-unpin code which makes this even more difficult to fix as I believe
> > these functions would need to be removed first. Not saying we can't
> > revisit this someday but I personally really like it - it is a clever
> > way to avoid reentering the pin / unpin code while asynchronous things
> > are happening rather than some complex locking scheme. Lastly, this code
> > has proved incredibly stable as I don't think we've had to fix a single
> > thing in this area since we've been using this code internally.
> 
> Pretty much same as above. The code like:
> 
> static inline void __intel_context_unpin(struct intel_context *ce)
> {
> 	if (!ce->ops->sched_disable) {
> 		__intel_context_do_unpin(ce, 1);
> 	} else {
> 		while (!atomic_add_unless(&ce->pin_count, -1, 1)) {
> 			if (atomic_cmpxchg(&ce->pin_count, 1, 2) == 1) {
> 				ce->ops->sched_disable(ce);
> 				break;
> 			}
> 		}
> 	}
> }
> 
> That's pretty much impenetrable for me and the only thing I can think of
> here is **ALARM** must be broken! See what others think..
> 

Answered in a reply to Daniel's reply but I'll repeat. Should have a
comment here:

/*
 * If the context has the sched_disable function, it isn't safe to unpin
 * until this function completes. This function is allowed to complete
 * asynchronously too. To avoid this function being entered twice and to
 * move ownership of the unpin to this function's completion, adjust the
 * pin count to 2 before it is entered. When this function completes, the
 * context can call intel_context_sched_unpin, which decrements the pin
 * count by 2, potentially resulting in an unpin.
 *
 * A while loop is needed to ensure the atomicity of the pin count, e.g.
 * the below if / else statement has a race:
 *
 * if (atomic_cmpxchg(&ce->pin_count, 1, 2) == 1)
 * 	ce->ops->sched_disable(ce);
 * else
 * 	atomic_dec(&ce->pin_count);
 *
 * Two threads could simultaneously fail the if clause, resulting in the
 * pin_count going to 0 with scheduling enabled + the context pinned.
 */

> > >   * Kludgy way of interfacing with rest of the driver instead of refactoring
> > > to fit (idling, breadcrumbs, scheduler, tasklets, ...).
> > > 
> > 
> > Idling and breadcrumbs seem clean to me. Scheduler + tasklet are going
> > away once the DRM scheduler lands. No need rework those as we are just
> > going to rework this again.
> 
> Well today I read the breadcrumbs patch and there is no way that's clean. It
> goes and creates one object per engine, then deletes them, replacing with
> GuC special one. All in the same engine setup. The same pattern of bolting
> on the GuC repeats too much for my taste.
> 

I don't think creating a default object with a ref count, then
decrementing the ref count + replacing it with a new object, is that
hard to understand. IMO that is way better than how things worked
previously, where we just made implicit assumptions about the execlists
backend behavior all over the driver. If this had been done properly in
the current i915 code base this really wouldn't be an issue.

> > > Now perhaps the latest plan is to ignore all these issues and still merge,
> > > then follow up with throwing it away, mostly or at least largely, in which
> > > case there isn't any point really to review the current state yet again. But
> > > it is sad that we got to this state. So just for the record - all this was
> > > reviewed in Nov/Dec 2019. By me among other folks and I at least deemed it
> > > not ready in this form.
> > > 
> > 
> > I personally don't think it is really in that bad of shape. The fact
> > that I could put together a PoC more or less fully integrating this
> > backend into the DRM scheduler within a few days I think speaks to the
> > quality and flexablitiy of this backend compared to execlists.
> 
> Or that you are much more familiar with it. Anyway, it's not the line of
> argument I think we should continue.
>

Yes, obviously I'm more familiar with this code, but I think the argument
holds when it relates to the DRM scheduler. Please, someone who is familiar
with the execlists backend, try to integrate that with the DRM scheduler
- I guarantee it will be a nightmare / total hack job.

Matt

> Regards,
> 
> Tvrtko
Daniel Vetter June 3, 2021, 4:47 a.m. UTC | #26
On Thu, Jun 3, 2021 at 5:48 AM Matthew Brost <matthew.brost@intel.com> wrote:
>
> On Wed, Jun 02, 2021 at 08:57:02PM +0200, Daniel Vetter wrote:
> > On Wed, Jun 2, 2021 at 5:27 PM Tvrtko Ursulin
> > <tvrtko.ursulin@linux.intel.com> wrote:
> > > On 25/05/2021 17:45, Matthew Brost wrote:
> > > > On Tue, May 25, 2021 at 11:32:26AM +0100, Tvrtko Ursulin wrote:
> > > >>   * Context pinning code with it's magical two adds, subtract and cmpxchg is
> > > >> dodgy as well.
> > > >
> > > > Daniele tried to remove this and it proved quite difficult + created
> > > > even more races in the backend code. This was prior to the pre-pin and
> > > > post-unpin code which makes this even more difficult to fix as I believe
> > > > these functions would need to be removed first. Not saying we can't
> > > > revisit this someday but I personally really like it - it is a clever
> > > > way to avoid reentering the pin / unpin code while asynchronous things
> > > > are happening rather than some complex locking scheme. Lastly, this code
> > > > has proved incredibly stable as I don't think we've had to fix a single
> > > > thing in this area since we've been using this code internally.
> > >
> > > Pretty much same as above. The code like:
> > >
> > > static inline void __intel_context_unpin(struct intel_context *ce)
> > > {
> > >         if (!ce->ops->sched_disable) {
> > >                 __intel_context_do_unpin(ce, 1);
> > >         } else {
> > >                 while (!atomic_add_unless(&ce->pin_count, -1, 1)) {
> > >                         if (atomic_cmpxchg(&ce->pin_count, 1, 2) == 1) {
> > >                                 ce->ops->sched_disable(ce);
> > >                                 break;
> > >                         }
> > >                 }
> > >         }
> > > }
> > >
> > > That's pretty much impenetrable for me and the only thing I can think of
> > > here is **ALARM** must be broken! See what others think..
>
> Yea, probably should add a comment:
>
> /*
>  * If the context has the sched_disable function, it isn't safe to unpin
>  * until this function completes. This function is allowed to complete
>  * asynchronously too. To avoid this function from being entered twice
>  * and move ownership of the unpin to this function's completion, adjust
>  * the pin count to 2 before it is entered. When this function completes
>  * the context can call intel_context_sched_unpin which decrements the
>  * pin count by 2 potentially resulting in an unpin.
>  *
>  * A while loop is needed to ensure the atomicity of the pin count. e.g.
>  * The below if / else statement has a race:
>  *
>  * if (atomic_cmpxchg(&ce->pin_count, 1, 2) == 1)
>  *      ce->ops->sched_disable(ce);
>  * else
>  *      atomic_dec(ce, 1);
>  *
>  * Two threads could simultaneously fail the if clause resulting in the
>  * pin_count going to 0 with scheduling enabled + the context pinned.
>  */
>
> >
> > pin_count is a hand-rolled mutex, except not actually a real one, and
> > it's absolutely hiliarous in it's various incarnations (there's one
> > each on i915_vm, vma, obj and probably a few more).
> >
> > Not the worst one I've seen by far in the code we've merged already.
> > Minimally this needs a comment here and in the struct next to
> > @pin_count to explain where all this is abused, which would already
> > make it better than most of the in-tree ones.
> >
> > As part of the ttm conversion we have a plan to sunset the "pin_count
> > as a lock" stuff, depending how bad that goes we might need to split
> > up the task for each struct that has such a pin_count.
> >
>
> Didn't know that with the TTM rework this value might go away. If that
> is truely the direction I don't see the point in reworking this now. It
> 100% works and with a comment I think it can be understood what it is
> doing.

Well not go away, but things will change. Currently the various
->pin_count sprinkled all over the place have essentially two uses
- pinning stuff long term (scanout, ctxs, anything that stays pinned
after the ioctl is done essentially)
- short-term lock-like construct

There's going to be two changes:
- The short-term pins will be replaced by dma_resv_lock/unlock pairs
- the locking rules for long-term pins will change, because we'll
require that you must hold dma_resv_lock for unpinning. So no more
atomic_t, also no more races for final unpins vs cleanup work

Also, now that you've explained the why of this dance, especially the
async part: since the new unpin will hold dma_resv_lock, we can
create & attach a dma_fence for tracking async completion, which the
next operation can then wait on.

The awkward state we have right now is that there are a lot of places
where we require the unpin to be done locklessly with these atomic
tricks, so there's going to be quite some surgery involved all over
the code.
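
A purely hypothetical sketch of that direction (invented names, not the
actual TTM conversion plan): with unpin done under the object's dma_resv
lock, the count can become a plain integer instead of an atomic_t
doubling as a lock:

static void example_context_unpin(struct intel_context *ce)
{
	/* Assumed path to the backing object's reservation lock. */
	struct dma_resv *resv = ce->state->obj->base.resv;

	dma_resv_lock(resv, NULL);
	if (!--ce->pin_count_plain)	/* hypothetical non-atomic counter */
		do_final_unpin(ce);	/* hypothetical cleanup hook */
	dma_resv_unlock(resv);
}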
-Daniel
Tvrtko Ursulin June 3, 2021, 8:51 a.m. UTC | #27
On 03/06/2021 05:10, Matthew Brost wrote:
> On Wed, Jun 02, 2021 at 04:27:18PM +0100, Tvrtko Ursulin wrote:
>>
>> On 25/05/2021 17:45, Matthew Brost wrote:

[snip]

>>>>    * Kludgy way of interfacing with rest of the driver instead of refactoring
>>>> to fit (idling, breadcrumbs, scheduler, tasklets, ...).
>>>>
>>>
>>> Idling and breadcrumbs seem clean to me. Scheduler + tasklet are going
>>> away once the DRM scheduler lands. No need rework those as we are just
>>> going to rework this again.
>>
>> Well today I read the breadcrumbs patch and there is no way that's clean. It
>> goes and creates one object per engine, then deletes them, replacing with
>> GuC special one. All in the same engine setup. The same pattern of bolting
>> on the GuC repeats too much for my taste.
>>
> 
> I don't think creating a default object /w a ref count then decrementing
> the ref count + replacing it with a new object is that hard to
> understand. IMO that is way better than how things worked previously

It's not about it being hard to understand, although it certainly is far 
from the usual patterns, but about it being lazy design which in normal 
times would never be allowed. Reduced and flattened to highlight the 
principal complaint, it looks like this:

engine_setup_for_each_engine:
    engine->breadcrumbs = create_breadcrumbs();
    if (guc) {
       if (!first_class_engine) {
         kfree(engine->breadcrumbs);
         engine->breadcrumbs = first_class_engine->breadcrumbs;
       } else {
         first_class_engine->breadcrumbs->vfuncs = guc_vfuncs;
       }
    }

What I suggested is that the patch should not break and hack the object 
oriented design and instead could do something along the lines of:

gt_guc_setup:
    for_each_class:
       gt->uc.breadcrumbs[class] = create_guc_breadcrumbs();

engine_setup_for_each_engine:
    if (guc)
       engine->breadcrumbs = get(gt->uc.breadcrumbs[class]);
    else
       engine->breadcrumbs = create_breadcrumbs();
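
In driver terms that shape would be roughly as below (a sketch only:
intel_breadcrumbs_get() and the gt->uc.breadcrumbs[] array are assumed
helpers for illustration, not existing code):

static void example_setup_breadcrumbs(struct intel_gt *gt)
{
	struct intel_engine_cs *engine;
	enum intel_engine_id id;

	for_each_engine(engine, gt, id) {
		if (intel_uc_uses_guc_submission(&gt->uc))
			/* share one refcounted object per engine class */
			engine->breadcrumbs =
				intel_breadcrumbs_get(gt->uc.breadcrumbs[engine->class]);
		else
			engine->breadcrumbs = intel_breadcrumbs_create(engine);
	}
}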

> where we just made implicit assumptions all over the driver of the
> execlists backend behavior. If this was done properly in the current
> i915 code base this really wouldn't be an issue.

Don't really follow you here, but it probably goes back to how the 
upstream code was there, available to be refactored, all this time.

Regards,

Tvrtko
Tvrtko Ursulin June 3, 2021, 9:49 a.m. UTC | #28
On 03/06/2021 05:47, Daniel Vetter wrote:
> On Thu, Jun 3, 2021 at 5:48 AM Matthew Brost <matthew.brost@intel.com> wrote:
>>
>> On Wed, Jun 02, 2021 at 08:57:02PM +0200, Daniel Vetter wrote:
>>> On Wed, Jun 2, 2021 at 5:27 PM Tvrtko Ursulin
>>> <tvrtko.ursulin@linux.intel.com> wrote:
>>>> On 25/05/2021 17:45, Matthew Brost wrote:
>>>>> On Tue, May 25, 2021 at 11:32:26AM +0100, Tvrtko Ursulin wrote:
>>>>>>    * Context pinning code with it's magical two adds, subtract and cmpxchg is
>>>>>> dodgy as well.
>>>>>
>>>>> Daniele tried to remove this and it proved quite difficult + created
>>>>> even more races in the backend code. This was prior to the pre-pin and
>>>>> post-unpin code which makes this even more difficult to fix as I believe
>>>>> these functions would need to be removed first. Not saying we can't
>>>>> revisit this someday but I personally really like it - it is a clever
>>>>> way to avoid reentering the pin / unpin code while asynchronous things
>>>>> are happening rather than some complex locking scheme. Lastly, this code
>>>>> has proved incredibly stable as I don't think we've had to fix a single
>>>>> thing in this area since we've been using this code internally.
>>>>
>>>> Pretty much same as above. The code like:
>>>>
>>>> static inline void __intel_context_unpin(struct intel_context *ce)
>>>> {
>>>>          if (!ce->ops->sched_disable) {
>>>>                  __intel_context_do_unpin(ce, 1);
>>>>          } else {
>>>>                  while (!atomic_add_unless(&ce->pin_count, -1, 1)) {
>>>>                          if (atomic_cmpxchg(&ce->pin_count, 1, 2) == 1) {
>>>>                                  ce->ops->sched_disable(ce);
>>>>                                  break;
>>>>                          }
>>>>                  }
>>>>          }
>>>> }
>>>>
>>>> That's pretty much impenetrable for me and the only thing I can think of
>>>> here is **ALARM** must be broken! See what others think..
>>
>> Yea, probably should add a comment:
>>
>> /*
>>   * If the context has the sched_disable function, it isn't safe to unpin
>>   * until this function completes. This function is allowed to complete
>>   * asynchronously too. To avoid this function from being entered twice
>>   * and move ownership of the unpin to this function's completion, adjust
>>   * the pin count to 2 before it is entered. When this function completes
>>   * the context can call intel_context_sched_unpin which decrements the
>>   * pin count by 2 potentially resulting in an unpin.
>>   *
>>   * A while loop is needed to ensure the atomicity of the pin count. e.g.
>>   * The below if / else statement has a race:
>>   *
>>   * if (atomic_cmpxchg(&ce->pin_count, 1, 2) == 1)
>>   *      ce->ops->sched_disable(ce);
>>   * else
>>   *      atomic_dec(ce, 1);
>>   *
>>   * Two threads could simultaneously fail the if clause resulting in the
>>   * pin_count going to 0 with scheduling enabled + the context pinned.
>>   */
>>
>>>
>>> pin_count is a hand-rolled mutex, except not actually a real one, and
>>> it's absolutely hiliarous in it's various incarnations (there's one
>>> each on i915_vm, vma, obj and probably a few more).
>>>
>>> Not the worst one I've seen by far in the code we've merged already.
>>> Minimally this needs a comment here and in the struct next to
>>> @pin_count to explain where all this is abused, which would already
>>> make it better than most of the in-tree ones.
>>>
>>> As part of the ttm conversion we have a plan to sunset the "pin_count
>>> as a lock" stuff, depending how bad that goes we might need to split
>>> up the task for each struct that has such a pin_count.
>>>
>>
>> Didn't know that with the TTM rework this value might go away. If that
>> is truely the direction I don't see the point in reworking this now. It
>> 100% works and with a comment I think it can be understood what it is
>> doing.
> 
> Well not go away, but things will change. Currently the various
> ->pin_count sprinkled all over the place have essentially two uses
> - pinning stuff long term (scanout, ctxs, anything that stays pinned
> after the ioctl is done essentially)
> - short-term lock-like construct
> 
> There's going to be two changes:
> - The short-term pins will be replaced by dma_resv_lock/unlock pairs
> - the locking rules for long-term pins will change, because we'll
> require that you must hold dma_resv_lock for unpinning. So no more
> atomic_t, also no more races for final unpins vs cleanup work
> 
> Also now that you've explained the why for this dance, especially the
> async part: Since the new unpin will hold dma_resv_lock, we can
> create&attach dma_fence for tracking async completion which then the
> next operation can wait on.

Yes, async would be an improvement in principle, because...

> The awkward state we have right now is that there's a lot of places
> where we require the unpin to be done locklessly with these atomic
> tricks, so there's going to be quite some surgery involved all over
> the code.

... I think the main problem with how impenetrable both this and the 
guc context state machine in general are stems from the fact that the 
design is not right.

For instance we have intel_context which is one thing to i915, but with 
the GuC adaptations the guc state machine handling has been shoved 
inside it. That makes it complex and destroys the separation of duties.

Instead intel_context should remain the common layer and handling of GuC 
firmware needs should go in a layer under it. Whether it would subclass, 
or use a different pattern I can't tell right now. But if it was clearly 
separated then the state machine handling would have its place away 
from the common code.

In 2019 I did push to at least prefix the guc specific fields with guc, 
as a minimum, but I don't think they, and the accompanying code, really 
should be present in the backend-agnostic intel_context.
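
A sketch of one possible shape, with invented names purely for
illustration: the GuC backend could subclass the common intel_context
and keep its state machine private to the backend:

struct intel_guc_context {
	struct intel_context base;	/* common, backend-agnostic part */

	/* GuC-only state, invisible to the rest of the driver */
	spinlock_t state_lock;
	u32 sched_state;
	u16 guc_id;
};

static inline struct intel_guc_context *to_guc_context(struct intel_context *ce)
{
	return container_of(ce, struct intel_guc_context, base);
}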

Regards,

Tvrtko
Tvrtko Ursulin June 3, 2021, 10:52 a.m. UTC | #29
On 03/06/2021 04:41, Matthew Brost wrote:
> On Wed, Jun 02, 2021 at 08:57:02PM +0200, Daniel Vetter wrote:
>> On Wed, Jun 2, 2021 at 5:27 PM Tvrtko Ursulin
>> <tvrtko.ursulin@linux.intel.com> wrote:
>>> On 25/05/2021 17:45, Matthew Brost wrote:
>>>> On Tue, May 25, 2021 at 11:32:26AM +0100, Tvrtko Ursulin wrote:
>>>>>    * Context pinning code with it's magical two adds, subtract and cmpxchg is
>>>>> dodgy as well.
>>>>
>>>> Daniele tried to remove this and it proved quite difficult + created
>>>> even more races in the backend code. This was prior to the pre-pin and
>>>> post-unpin code which makes this even more difficult to fix as I believe
>>>> these functions would need to be removed first. Not saying we can't
>>>> revisit this someday but I personally really like it - it is a clever
>>>> way to avoid reentering the pin / unpin code while asynchronous things
>>>> are happening rather than some complex locking scheme. Lastly, this code
>>>> has proved incredibly stable as I don't think we've had to fix a single
>>>> thing in this area since we've been using this code internally.
>>>
>>> Pretty much same as above. The code like:
>>>
>>> static inline void __intel_context_unpin(struct intel_context *ce)
>>> {
>>>          if (!ce->ops->sched_disable) {
>>>                  __intel_context_do_unpin(ce, 1);
>>>          } else {
>>>                  while (!atomic_add_unless(&ce->pin_count, -1, 1)) {
>>>                          if (atomic_cmpxchg(&ce->pin_count, 1, 2) == 1) {
>>>                                  ce->ops->sched_disable(ce);
>>>                                  break;
>>>                          }
>>>                  }
>>>          }
>>> }
>>>
>>> That's pretty much impenetrable for me and the only thing I can think of
>>> here is **ALARM** must be broken! See what others think..
> 
> Yea, probably should add a comment:
> 
> /*
>   * If the context has the sched_disable function, it isn't safe to unpin
>   * until this function completes. This function is allowed to complete
>   * asynchronously too. To avoid this function from being entered twice
>   * and move ownership of the unpin to this function's completion, adjust
>   * the pin count to 2 before it is entered. When this function completes
>   * the context can call intel_context_sched_unpin which decrements the
>   * pin count by 2 potentially resulting in an unpin.
>   *
>   * A while loop is needed to ensure the atomicity of the pin count. e.g.
>   * The below if / else statement has a race:
>   *
>   * if (atomic_cmpxchg(&ce->pin_count, 1, 2) == 1)
>   * 	ce->ops->sched_disable(ce);
>   * else
>   * 	atomic_dec(ce, 1);
>   *
>   * Two threads could simultaneously fail the if clause resulting in the
>   * pin_count going to 0 with scheduling enabled + the context pinned.
>   */

I have many questions here..

How time bound is the busy loop?

In guc_context_sched_disable the case where someone pins after the magic 
2 has been set is handled.

But what if pin_count got to 2 legitimately, via an unpin and pin 
between the atomic_cmpxchg in __intel_context_unpin and the relevant 
lines in guc_context_sched_disable getting to execute?

Why is the pin_count dec in guc_context_sched_disable under the 
ce->guc_state.lock?

What is the point of:

	enabled = context_enabled(ce);
	if (unlikely(!enabled || submission_disabled(guc))) {
		if (!enabled)
			clr_context_enabled(ce);

Reads like clearing the enabled bit if it is not set?!

Why is:

static inline void clr_context_enabled(struct intel_context *ce)
{
	atomic_and((u32)~SCHED_STATE_NO_LOCK_ENABLED,
		   &ce->guc_sched_state_no_lock);
}

Operating on a field called "guc_sched_state_no_lock" (no lock!) while 
the caller is holding guc_state.lock and manipulating that field?

Regards,

Tvrtko
Matthew Brost June 3, 2021, 4:34 p.m. UTC | #30
On Thu, Jun 03, 2021 at 09:51:19AM +0100, Tvrtko Ursulin wrote:
> 
> On 03/06/2021 05:10, Matthew Brost wrote:
> > On Wed, Jun 02, 2021 at 04:27:18PM +0100, Tvrtko Ursulin wrote:
> > > 
> > > On 25/05/2021 17:45, Matthew Brost wrote:
> 
> [snip]
> 
> > > > >    * Kludgy way of interfacing with rest of the driver instead of refactoring
> > > > > to fit (idling, breadcrumbs, scheduler, tasklets, ...).
> > > > > 
> > > > 
> > > > Idling and breadcrumbs seem clean to me. Scheduler + tasklet are going
> > > > away once the DRM scheduler lands. No need rework those as we are just
> > > > going to rework this again.
> > > 
> > > Well today I read the breadcrumbs patch and there is no way that's clean. It
> > > goes and creates one object per engine, then deletes them, replacing with
> > > GuC special one. All in the same engine setup. The same pattern of bolting
> > > on the GuC repeats too much for my taste.
> > > 
> > 
> > I don't think creating a default object /w a ref count then decrementing
> > the ref count + replacing it with a new object is that hard to
> > understand. IMO that is way better than how things worked previously
> 
> It's not about it being hard to understand, although it certainly is far
> from the usual patterns, but about it being lazy design which in normal
> times would never be allowed. Because reduced and flattened to highlight the
> principal complaint it looks like this:
> 
> engine_setup_for_each_engine:
>    engine->breadcrumbs = create_breadcrumbs();
>    if (guc) {
>       if (!first_class_engine) {
>         kfree(engine->breadcrumbs);
>         engine->breadcrumbs = first_class_engine->breadcrumbs;
>       } else {
>         first_class_engine->breadcrumbs->vfuncs = guc_vfuncs;
>       }
>    }
> 

I think we are diving way too deep into individual patches in the cover
letter.

Agree this could be refactored a bit more. Let me try a rework of this
patch in particular before it gets posted again.

Matt 

> What I suggested is that patch should not break and hack the object oriented
> design and instead could do along the lines:
> 
> gt_guc_setup:
>    for_each_class:
>       gt->uc.breadcrumbs[class] = create_guc_breadcrumbs();
> 
> engine_setup_for_each_engine:
>    if (guc)
>       engine->breadcrumbs = get(gt->uc.breadcrumbs[class]);
>    else
>       engine->breadcrumbs = create_breadcrumbs();
> 
> > where we just made implicit assumptions all over the driver of the
> > execlists backend behavior. If this was done properly in the current
> > i915 code base this really wouldn't be an issue.
> 
> Don't really follow you here but it probably goes back to how upstream code
> was there available to be refactored all this time.
> 
> Regards,
> 
> Tvrtko