mbox series

[0/7] Commit early to GuC

Message ID 20200115013143.34961-1-daniele.ceraolospurio@intel.com (mailing list archive)
Headers show
Series Commit early to GuC | expand

Message

Daniele Ceraolo Spurio Jan. 15, 2020, 1:31 a.m. UTC
We currently wait until we attempt to load the GuC to confirm if we're
in GuC mode or not, at which point a lot of the engine setup has already
happened and needs to be updated for GuC submission. To allow us to get
the setup done directly into GuC mode, we need to commit to using GuC
as soon as possible. Currently, if GuC is enabled via modparam on a
platform that supports it, the main issue that can cause us to fall-back
to non-GuC mode is the lack of blobs on the system. It is not safe to
fall back to non-GuC after attempting to load the blobs (see
__uc_check_hw) and all the functions in the GuC paths between the fetch
and the load can only fail if something is fundamentally wrong with the
system (e.g. allocation failure). Therefore, committing to using the GuC
after the fetch is successful seems like a reasonable compromise between
early setup and fall-back options.
To better track this, this series splits the uC init status in 3 steps,
with the last one meaning we're locked in and can't fall back anymore:

- supported: HW supports the microcontroller
- wanted: supported and selected in modparam
- used: wanted and blob found on the system

(Suggestions for better naming are welcome)

The last patch in the series starts using the early commitment to setup
the GuC submission back-end instead of the execlists one, instead of
always setting up the latter and then taking over. This is just an
higher level change for now as the GuC code calls the execlists one
internally, but I plan to follow with more changes while we switch to
the new interface.

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: John Harrison <John.C.Harrison@Intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>

Anusha Srivatsa (1):
  HAX: force enable_guc=2

Daniele Ceraolo Spurio (6):
  drm/i915/guc: Kill USES_GUC macro
  drm/i915/guc: Kill USES_GUC_SUBMISSION macro
  drm/i915/uc: Improve tracking of uC init status
  drm/i915/uc: Abort early on uc_init failure
  drm/i915/guc: Apply new uC status tracking to GuC submission as well
  drm/i915/guc: Start considering GuC submission a proper back-end

 drivers/gpu/drm/i915/gem/i915_gem_context.c   |  2 +-
 drivers/gpu/drm/i915/gt/intel_engine_cs.c     | 10 ++-
 drivers/gpu/drm/i915/gt/intel_ggtt.c          |  2 +-
 drivers/gpu/drm/i915/gt/intel_gt.c            |  4 +-
 drivers/gpu/drm/i915/gt/intel_lrc.c           | 42 +++++++------
 drivers/gpu/drm/i915/gt/intel_lrc.h           |  2 +
 drivers/gpu/drm/i915/gt/selftest_hangcheck.c  |  2 +-
 drivers/gpu/drm/i915/gt/selftest_lrc.c        | 10 +--
 drivers/gpu/drm/i915/gt/selftest_reset.c      |  2 +-
 drivers/gpu/drm/i915/gt/uc/intel_guc.c        | 14 ++---
 drivers/gpu/drm/i915/gt/uc/intel_guc.h        | 25 ++++++--
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 56 +++++++++++------
 .../gpu/drm/i915/gt/uc/intel_guc_submission.h |  4 +-
 drivers/gpu/drm/i915/gt/uc/intel_huc.c        |  2 +-
 drivers/gpu/drm/i915/gt/uc/intel_huc.h        |  8 ++-
 drivers/gpu/drm/i915/gt/uc/intel_huc_fw.c     |  2 +-
 drivers/gpu/drm/i915/gt/uc/intel_uc.c         | 59 +++++++++++-------
 drivers/gpu/drm/i915/gt/uc/intel_uc.h         | 61 +++++++++++--------
 drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c      |  2 +-
 drivers/gpu/drm/i915/gvt/scheduler.c          |  3 +-
 drivers/gpu/drm/i915/i915_debugfs.c           | 25 ++++----
 drivers/gpu/drm/i915/i915_drv.h               | 10 ---
 drivers/gpu/drm/i915/i915_params.h            |  2 +-
 drivers/gpu/drm/i915/intel_gvt.c              |  2 +-
 24 files changed, 212 insertions(+), 139 deletions(-)

Comments

Chris Wilson Jan. 15, 2020, 8:40 a.m. UTC | #1
Quoting Daniele Ceraolo Spurio (2020-01-15 01:31:36)
> We currently wait until we attempt to load the GuC to confirm if we're
> in GuC mode or not, at which point a lot of the engine setup has already
> happened and needs to be updated for GuC submission. To allow us to get
> the setup done directly into GuC mode, we need to commit to using GuC
> as soon as possible.

I think this is the wrong direction; as I thought the goal was to allow
delayed loading of firmware, even going as far as allowing the system to
run a browser for the user to get the firmware first. I may be
completely wrong about that, but imho I never want to have to build
firmware images into the kernel.

The transition from execlists to guc could be just set-wedged; delete
old engines, build guc engines. [This should also work for guc -> guc.]
Throwing away context state is ugly, but simple enough.
-Chris
Daniele Ceraolo Spurio Jan. 15, 2020, 3:57 p.m. UTC | #2
On 1/15/2020 12:40 AM, Chris Wilson wrote:
> Quoting Daniele Ceraolo Spurio (2020-01-15 01:31:36)
>> We currently wait until we attempt to load the GuC to confirm if we're
>> in GuC mode or not, at which point a lot of the engine setup has already
>> happened and needs to be updated for GuC submission. To allow us to get
>> the setup done directly into GuC mode, we need to commit to using GuC
>> as soon as possible.
> I think this is the wrong direction; as I thought the goal was to allow
> delayed loading of firmware, even going as far as allowing the system to
> run a browser for the user to get the firmware first. I may be

We do indeed want to keep supporting execlists mode even as some HW 
features move to the GuC to allow the user to get to the binaries, but 
we don't want to switch between the 2 modes without a reboot. Switching 
between the 2 modes is not a HW capability that we're committed to; the 
guc->elsp transition is already not possible, while the elsp->guc one 
still seems to work, but who knows for how long it will?

This series is also not really changing the commitment at the 
implementation level, just making it "official" and acting based on 
that. Even without these patches, if the blobs are on the system we will 
attempt to get into GuC mode unless we get an allocation failure or 
something similar, in which case it is extremely likely that the 
fall-back to non-guc will not work either.

> completely wrong about that, but imho I never want to have to build
> firmware images into the kernel.

I do 100% agree with this statement, although I'm not sure how this 
relates to the series. Are you planning to pull some of the engine setup 
to an even earlier point?

>
> The transition from execlists to guc could be just set-wedged; delete
> old engines, build guc engines. [This should also work for guc -> guc.]
> Throwing away context state is ugly, but simple enough.

As mentioned above, we can't switch between elsp and GuC modes so this 
transition would have to be done before the first submission to HW. Why 
not go directly in GuC mode then?

Daniele

> -Chris
Chris Wilson Jan. 15, 2020, 4:18 p.m. UTC | #3
Quoting Daniele Ceraolo Spurio (2020-01-15 15:57:27)
> 
> 
> On 1/15/2020 12:40 AM, Chris Wilson wrote:
> > Quoting Daniele Ceraolo Spurio (2020-01-15 01:31:36)
> >> We currently wait until we attempt to load the GuC to confirm if we're
> >> in GuC mode or not, at which point a lot of the engine setup has already
> >> happened and needs to be updated for GuC submission. To allow us to get
> >> the setup done directly into GuC mode, we need to commit to using GuC
> >> as soon as possible.
> > I think this is the wrong direction; as I thought the goal was to allow
> > delayed loading of firmware, even going as far as allowing the system to
> > run a browser for the user to get the firmware first. I may be
> 
> We do indeed want to keep supporting execlists mode even as some HW 
> features move to the GuC to allow the user to get to the binaries, but 
> we don't want to switch between the 2 modes without a reboot. Switching 
> between the 2 modes is not a HW capability that we're committed to; the 
> guc->elsp transition is already not possible, while the elsp->guc one 
> still seems to work, but who knows for how long it will?
> 
> This series is also not really changing the commitment at the 
> implementation level, just making it "official" and acting based on 
> that. Even without these patches, if the blobs are on the system we will 
> attempt to get into GuC mode unless we get an allocation failure or 
> something similar, in which case it is extremely likely that the 
> fall-back to non-guc will not work either.
> 
> > completely wrong about that, but imho I never want to have to build
> > firmware images into the kernel.
> 
> I do 100% agree with this statement, although I'm not sure how this 
> relates to the series. Are you planning to pull some of the engine setup 
> to an even earlier point?
> 
> >
> > The transition from execlists to guc could be just set-wedged; delete
> > old engines, build guc engines. [This should also work for guc -> guc.]
> > Throwing away context state is ugly, but simple enough.
> 
> As mentioned above, we can't switch between elsp and GuC modes so this 
> transition would have to be done before the first submission to HW. Why 
> not go directly in GuC mode then?

So the problem is if we can't freely switch (we can never power down the
guc, that seems unlikely?) then we can't make a decision on which mode
to run (and which engines to initialise) until userspace is active and
has committed to supplying or not supplying a fw image. Which puts us in
a catch-22 of wanting to register the driver with userspace before we
have finalized initialisation.

If the transition is impossible, it seems like you have no choice but to
require the fw image at initialisation. I do not understand why it has
to be that way, seems such a hindrance.
-Chris
Daniele Ceraolo Spurio Jan. 15, 2020, 8:46 p.m. UTC | #4
On 1/15/20 8:18 AM, Chris Wilson wrote:
> Quoting Daniele Ceraolo Spurio (2020-01-15 15:57:27)
>>
>>
>> On 1/15/2020 12:40 AM, Chris Wilson wrote:
>>> Quoting Daniele Ceraolo Spurio (2020-01-15 01:31:36)
>>>> We currently wait until we attempt to load the GuC to confirm if we're
>>>> in GuC mode or not, at which point a lot of the engine setup has already
>>>> happened and needs to be updated for GuC submission. To allow us to get
>>>> the setup done directly into GuC mode, we need to commit to using GuC
>>>> as soon as possible.
>>> I think this is the wrong direction; as I thought the goal was to allow
>>> delayed loading of firmware, even going as far as allowing the system to
>>> run a browser for the user to get the firmware first. I may be
>>
>> We do indeed want to keep supporting execlists mode even as some HW
>> features move to the GuC to allow the user to get to the binaries, but
>> we don't want to switch between the 2 modes without a reboot. Switching
>> between the 2 modes is not a HW capability that we're committed to; the
>> guc->elsp transition is already not possible, while the elsp->guc one
>> still seems to work, but who knows for how long it will?
>>
>> This series is also not really changing the commitment at the
>> implementation level, just making it "official" and acting based on
>> that. Even without these patches, if the blobs are on the system we will
>> attempt to get into GuC mode unless we get an allocation failure or
>> something similar, in which case it is extremely likely that the
>> fall-back to non-guc will not work either.
>>
>>> completely wrong about that, but imho I never want to have to build
>>> firmware images into the kernel.
>>
>> I do 100% agree with this statement, although I'm not sure how this
>> relates to the series. Are you planning to pull some of the engine setup
>> to an even earlier point?
>>
>>>
>>> The transition from execlists to guc could be just set-wedged; delete
>>> old engines, build guc engines. [This should also work for guc -> guc.]
>>> Throwing away context state is ugly, but simple enough.
>>
>> As mentioned above, we can't switch between elsp and GuC modes so this
>> transition would have to be done before the first submission to HW. Why
>> not go directly in GuC mode then?
> 
> So the problem is if we can't freely switch (we can never power down the
> guc, that seems unlikely?) then we can't make a decision on which mode

AFAIU it's not the GuC itself that's the issue, but some of the other 
units, some of which outside the GT, that can get locked in one mode 
without being resettable (like what happens for the WOPCM regs for example).

> to run (and which engines to initialise) until userspace is active and
> has committed to supplying or not supplying a fw image. Which puts us in
> a catch-22 of wanting to register the driver with userspace before we
> have finalized initialisation.

I think this is the bit I'm missing. Why do you want userspace to 
directly provide the firmware instead of fetching it from /lib/firmware 
like we do now? The distros should pack the correct firmware in their 
linux-firmware packages and it seems reasonable to me to expect the 
system to be rebooted after fetching a binary by hand. We can move the 
fetch to an earlier point in time if we need the info earlier, since it 
does not require the HW to be ready.

> 
> If the transition is impossible, it seems like you have no choice but to
> require the fw image at initialisation. I do not understand why it has
> to be that way, seems such a hindrance.

My understanding from the talk I had with the HW team when we realized 
that the guc->elsp transition was broken on gen11 is that the HW expects 
SW to pick a mode and stick to that. The elsp->guc transition seem to 
still work, but there is no guarantee it will keep doing so in the 
future and therefore it doesn't seem like a good idea to build on that.

Daniele

> -Chris
>