[v3,05/16] drm/i915: Disable the "binder"

Message ID 20240116075636.6121-6-ville.syrjala@linux.intel.com (mailing list archive)
State New, archived
Series drm/i915: (stolen) memory region related fixes

Commit Message

Ville Syrjälä Jan. 16, 2024, 7:56 a.m. UTC
From: Ville Syrjälä <ville.syrjala@linux.intel.com>

Now that the GGTT PTE updates go straight to GSMBASE (bypassing
GTTMMADR) there should be no more risk of system hangs? So the
"binder" (i.e. updating the PTEs via MI_UPDATE_GTT) is no longer
necessary, disable it.

My main worries with MI_UPDATE_GTT are:
- only used on this one platform so very limited testing coverage
- async so more opportunities to screw things up
- what happens if the engine hangs while we're waiting for MI_UPDATE_GTT
  to finish?
- requires working command submission, so even getting a working
  display now depends on a lot more components working correctly

TODO: MI_UPDATE_GTT might be interesting as an optimization
though, so perhaps someone should look into always using it
(assuming the GPU is alive and well)?

v2: Keep using MI_UPDATE_GTT on VM guests

Cc: Paz Zcharya <pazz@chromium.org>
Cc: Nirmoy Das <nirmoy.das@intel.com>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
---
 drivers/gpu/drm/i915/gt/intel_gtt.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

Comments

Nirmoy Das Jan. 16, 2024, 10:32 a.m. UTC | #1
On 1/16/2024 8:56 AM, Ville Syrjala wrote:
> From: Ville Syrjälä <ville.syrjala@linux.intel.com>
>
> Now that the GGTT PTE updates go straight to GSMBASE (bypassing
> GTTMMADR) there should be no more risk of system hangs? So the
> "binder" (ie. update the PTEs via MI_UPDATE_GTT) is no longer
> necessary, disable it.
>
> My main worry with the MI_UPDATE_GTT are:
> - only used on this one platform so very limited testing coverage
> - async so more opportunities to screw things up
> - what happens if the engine hangs while we're waiting for MI_UPDATE_GTT
>    to finish?
> - requires working command submission, so even getting a working
>    display now depends on a lot more extra components working correctly
>
> TODO: MI_UPDATE_GTT might be interesting as an optimization
> though, so perhaps someone should look into always using it
> (assuming the GPU is alive and well)?
>
> v2: Keep using MI_UPDATE_GTT on VM guests
>
> Cc: Paz Zcharya <pazz@chromium.org>
> Cc: Nirmoy Das <nirmoy.das@intel.com>
> Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>

Reviewed-by: Nirmoy Das <nirmoy.das@intel.com>

> ---
>   drivers/gpu/drm/i915/gt/intel_gtt.c | 3 ++-
>   1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.c b/drivers/gpu/drm/i915/gt/intel_gtt.c
> index 86f73fe558ca..e83dabc56a14 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gtt.c
> +++ b/drivers/gpu/drm/i915/gt/intel_gtt.c
> @@ -24,7 +24,8 @@
>   bool i915_ggtt_require_binder(struct drm_i915_private *i915)
>   {
>   	/* Wa_13010847436 & Wa_14019519902 */
> -	return MEDIA_VER_FULL(i915) == IP_VER(13, 0);
> +	return i915_run_as_guest() &&
> +		MEDIA_VER_FULL(i915) == IP_VER(13, 0);
>   }
>   
>   static bool intel_ggtt_update_needs_vtd_wa(struct drm_i915_private *i915)
Michał Winiarski Jan. 17, 2024, 2:13 p.m. UTC | #2
On Tue, Jan 16, 2024 at 09:56:25AM +0200, Ville Syrjala wrote:
> From: Ville Syrjälä <ville.syrjala@linux.intel.com>
> 
> Now that the GGTT PTE updates go straight to GSMBASE (bypassing
> GTTMMADR) there should be no more risk of system hangs? So the
> "binder" (ie. update the PTEs via MI_UPDATE_GTT) is no longer
> necessary, disable it.
> 
> My main worry with the MI_UPDATE_GTT are:
> - only used on this one platform so very limited testing coverage
> - async so more opportunities to screw things up
> - what happens if the engine hangs while we're waiting for MI_UPDATE_GTT
>   to finish?
> - requires working command submission, so even getting a working
>   display now depends on a lot more extra components working correctly
> 
> TODO: MI_UPDATE_GTT might be interesting as an optimization
> though, so perhaps someone should look into always using it
> (assuming the GPU is alive and well)?
> 
> v2: Keep using MI_UPDATE_GTT on VM guests
> 
> Cc: Paz Zcharya <pazz@chromium.org>
> Cc: Nirmoy Das <nirmoy.das@intel.com>
> Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
> ---
>  drivers/gpu/drm/i915/gt/intel_gtt.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.c b/drivers/gpu/drm/i915/gt/intel_gtt.c
> index 86f73fe558ca..e83dabc56a14 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gtt.c
> +++ b/drivers/gpu/drm/i915/gt/intel_gtt.c
> @@ -24,7 +24,8 @@
>  bool i915_ggtt_require_binder(struct drm_i915_private *i915)
>  {
>  	/* Wa_13010847436 & Wa_14019519902 */
> -	return MEDIA_VER_FULL(i915) == IP_VER(13, 0);
> +	return i915_run_as_guest() &&
> +		MEDIA_VER_FULL(i915) == IP_VER(13, 0);

Note that i915_run_as_guest() is not the most reliable way to decide
whether to use MI_UPDATE_GTT or go straight to GSMBASE, as it requires
the hypervisor to "opt in" and set X86_FEATURE_HYPERVISOR.
If it's not set, the driver will go straight to GSMBASE, which is not
mapped inside the guest.
Does the system firmware advertise whether GSMBASE is "open" or "closed"
to CPU access in any way?

-Michał

>  }
>  
>  static bool intel_ggtt_update_needs_vtd_wa(struct drm_i915_private *i915)
> -- 
> 2.41.0
>
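For reference, X86_FEATURE_HYPERVISOR corresponds to CPUID leaf 1, ECX bit 31, which a hypervisor may or may not expose to its guest. The following is only a minimal userspace sketch of reading that bit, not the driver's code: the helper name cpu_reports_hypervisor() is made up for illustration, and a GCC or clang toolchain is assumed for <cpuid.h>.

```c
#include <assert.h>
#include <stdbool.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#endif

/* CPUID.1:ECX bit 31 is the "hypervisor present" bit (X86_FEATURE_HYPERVISOR). */
#define CPUID_1_ECX_HYPERVISOR (1u << 31)

static_assert(sizeof(unsigned int) == 4, "CPUID registers are 32 bits wide");

/* Hypothetical helper: true only if the (virtual) CPU advertises a hypervisor. */
static bool cpu_reports_hypervisor(void)
{
#if defined(__x86_64__) || defined(__i386__)
	unsigned int eax, ebx, ecx, edx;

	/* __get_cpuid() returns 0 if the requested leaf is unsupported. */
	if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
		return false;

	return (ecx & CPUID_1_ECX_HYPERVISOR) != 0;
#else
	/* Not an x86 build: there is no CPUID to query. */
	return false;
#endif
}
```

A guest whose hypervisor masks this bit would get false here, which is the case the note above is concerned about: a check built only on this bit cannot tell such a guest from bare metal.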
Das, Nirmoy Jan. 17, 2024, 5:46 p.m. UTC | #3
On 1/17/2024 3:13 PM, Michał Winiarski wrote:
> On Tue, Jan 16, 2024 at 09:56:25AM +0200, Ville Syrjala wrote:
>> From: Ville Syrjälä <ville.syrjala@linux.intel.com>
>>
>> Now that the GGTT PTE updates go straight to GSMBASE (bypassing
>> GTTMMADR) there should be no more risk of system hangs? So the
>> "binder" (ie. update the PTEs via MI_UPDATE_GTT) is no longer
>> necessary, disable it.
>>
>> My main worry with the MI_UPDATE_GTT are:
>> - only used on this one platform so very limited testing coverage
>> - async so more opportunities to screw things up
>> - what happens if the engine hangs while we're waiting for MI_UPDATE_GTT
>>    to finish?
>> - requires working command submission, so even getting a working
>>    display now depends on a lot more extra components working correctly
>>
>> TODO: MI_UPDATE_GTT might be interesting as an optimization
>> though, so perhaps someone should look into always using it
>> (assuming the GPU is alive and well)?
>>
>> v2: Keep using MI_UPDATE_GTT on VM guests
>>
>> Cc: Paz Zcharya <pazz@chromium.org>
>> Cc: Nirmoy Das <nirmoy.das@intel.com>
>> Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
>> ---
>>   drivers/gpu/drm/i915/gt/intel_gtt.c | 3 ++-
>>   1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.c b/drivers/gpu/drm/i915/gt/intel_gtt.c
>> index 86f73fe558ca..e83dabc56a14 100644
>> --- a/drivers/gpu/drm/i915/gt/intel_gtt.c
>> +++ b/drivers/gpu/drm/i915/gt/intel_gtt.c
>> @@ -24,7 +24,8 @@
>>   bool i915_ggtt_require_binder(struct drm_i915_private *i915)
>>   {
>>   	/* Wa_13010847436 & Wa_14019519902 */
>> -	return MEDIA_VER_FULL(i915) == IP_VER(13, 0);
>> +	return i915_run_as_guest() &&
>> +		MEDIA_VER_FULL(i915) == IP_VER(13, 0);
> Note that i915_run_as_guest() is not the most reliable way to decide
> whether to use MI_UPDATE_GTT or straight to GSMBASE, as it requires the
> hypervisor to "opt-in" and set the X86_FEATURE_HYPERVISOR.
> If it's not set - the driver will go into GSMBASE, which is not mapped
> inside the guest.
> Does the system firmware advertise whether GSMBASE is "open" or "closed"
> to CPU access in any way?

I had a chat with David from the IVE team; he suggested reading 0x138914
to determine that: "GOP needs to qualify the WA by reading GFX MMIO offset
0x138914 and verify the value there is 0x1." (as per HSD-22018444074)



Regards,

Nirmoy

>
> -Michał
>
>>   }
>>   
>>   static bool intel_ggtt_update_needs_vtd_wa(struct drm_i915_private *i915)
>> -- 
>> 2.41.0
>>
Ville Syrjälä Jan. 18, 2024, 11:12 p.m. UTC | #4
On Wed, Jan 17, 2024 at 06:46:24PM +0100, Nirmoy Das wrote:
> 
> On 1/17/2024 3:13 PM, Michał Winiarski wrote:
> > On Tue, Jan 16, 2024 at 09:56:25AM +0200, Ville Syrjala wrote:
> >> From: Ville Syrjälä <ville.syrjala@linux.intel.com>
> >>
> >> Now that the GGTT PTE updates go straight to GSMBASE (bypassing
> >> GTTMMADR) there should be no more risk of system hangs? So the
> >> "binder" (ie. update the PTEs via MI_UPDATE_GTT) is no longer
> >> necessary, disable it.
> >>
> >> My main worry with the MI_UPDATE_GTT are:
> >> - only used on this one platform so very limited testing coverage
> >> - async so more opportunities to screw things up
> >> - what happens if the engine hangs while we're waiting for MI_UPDATE_GTT
> >>    to finish?
> >> - requires working command submission, so even getting a working
> >>    display now depends on a lot more extra components working correctly
> >>
> >> TODO: MI_UPDATE_GTT might be interesting as an optimization
> >> though, so perhaps someone should look into always using it
> >> (assuming the GPU is alive and well)?
> >>
> >> v2: Keep using MI_UPDATE_GTT on VM guests
> >>
> >> Cc: Paz Zcharya <pazz@chromium.org>
> >> Cc: Nirmoy Das <nirmoy.das@intel.com>
> >> Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
> >> ---
> >>   drivers/gpu/drm/i915/gt/intel_gtt.c | 3 ++-
> >>   1 file changed, 2 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.c b/drivers/gpu/drm/i915/gt/intel_gtt.c
> >> index 86f73fe558ca..e83dabc56a14 100644
> >> --- a/drivers/gpu/drm/i915/gt/intel_gtt.c
> >> +++ b/drivers/gpu/drm/i915/gt/intel_gtt.c
> >> @@ -24,7 +24,8 @@
> >>   bool i915_ggtt_require_binder(struct drm_i915_private *i915)
> >>   {
> >>   	/* Wa_13010847436 & Wa_14019519902 */
> >> -	return MEDIA_VER_FULL(i915) == IP_VER(13, 0);
> >> +	return i915_run_as_guest() &&
> >> +		MEDIA_VER_FULL(i915) == IP_VER(13, 0);
> > Note that i915_run_as_guest() is not the most reliable way to decide
> > whether to use MI_UPDATE_GTT or straight to GSMBASE, as it requires the
> > hypervisor to "opt-in" and set the X86_FEATURE_HYPERVISOR.
> > If it's not set - the driver will go into GSMBASE, which is not mapped
> > inside the guest.
> > Does the system firmware advertise whether GSMBASE is "open" or "closed"
> > to CPU access in any way?
> 
> Had a chat with David from IVE team, David suggested to read 0x138914 to 
> determine that.  "GOP needs to qualify the WA by reading GFX MMIO offset 
> 0x138914 and verify the value there is 0x1." -> as per the HSD-22018444074

OK, so we can confirm the firmware is on board. I suppose there's no real
harm in doing so, even though it would clearly be rather weird if someone
shipped some ancient firmware that doesn't handle this.

But that still won't help with the guest side handling because that
register will read the same in the guest.
Nirmoy Das Jan. 19, 2024, 10:47 a.m. UTC | #5
On 1/19/2024 12:12 AM, Ville Syrjälä wrote:
> On Wed, Jan 17, 2024 at 06:46:24PM +0100, Nirmoy Das wrote:
>> On 1/17/2024 3:13 PM, Michał Winiarski wrote:
>>> On Tue, Jan 16, 2024 at 09:56:25AM +0200, Ville Syrjala wrote:
>>>> From: Ville Syrjälä<ville.syrjala@linux.intel.com>
>>>>
>>>> Now that the GGTT PTE updates go straight to GSMBASE (bypassing
>>>> GTTMMADR) there should be no more risk of system hangs? So the
>>>> "binder" (ie. update the PTEs via MI_UPDATE_GTT) is no longer
>>>> necessary, disable it.
>>>>
>>>> My main worry with the MI_UPDATE_GTT are:
>>>> - only used on this one platform so very limited testing coverage
>>>> - async so more opportunities to screw things up
>>>> - what happens if the engine hangs while we're waiting for MI_UPDATE_GTT
>>>>     to finish?
>>>> - requires working command submission, so even getting a working
>>>>     display now depends on a lot more extra components working correctly
>>>>
>>>> TODO: MI_UPDATE_GTT might be interesting as an optimization
>>>> though, so perhaps someone should look into always using it
>>>> (assuming the GPU is alive and well)?
>>>>
>>>> v2: Keep using MI_UPDATE_GTT on VM guests
>>>>
>>>> Cc: Paz Zcharya<pazz@chromium.org>
>>>> Cc: Nirmoy Das<nirmoy.das@intel.com>
>>>> Signed-off-by: Ville Syrjälä<ville.syrjala@linux.intel.com>
>>>> ---
>>>>    drivers/gpu/drm/i915/gt/intel_gtt.c | 3 ++-
>>>>    1 file changed, 2 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.c b/drivers/gpu/drm/i915/gt/intel_gtt.c
>>>> index 86f73fe558ca..e83dabc56a14 100644
>>>> --- a/drivers/gpu/drm/i915/gt/intel_gtt.c
>>>> +++ b/drivers/gpu/drm/i915/gt/intel_gtt.c
>>>> @@ -24,7 +24,8 @@
>>>>    bool i915_ggtt_require_binder(struct drm_i915_private *i915)
>>>>    {
>>>>    	/* Wa_13010847436 & Wa_14019519902 */
>>>> -	return MEDIA_VER_FULL(i915) == IP_VER(13, 0);
>>>> +	return i915_run_as_guest() &&
>>>> +		MEDIA_VER_FULL(i915) == IP_VER(13, 0);
>>> Note that i915_run_as_guest() is not the most reliable way to decide
>>> whether to use MI_UPDATE_GTT or straight to GSMBASE, as it requires the
>>> hypervisor to "opt-in" and set the X86_FEATURE_HYPERVISOR.
>>> If it's not set - the driver will go into GSMBASE, which is not mapped
>>> inside the guest.
>>> Does the system firmware advertise whether GSMBASE is "open" or "closed"
>>> to CPU access in any way?
>> Had a chat with David from IVE team, David suggested to read 0x138914 to
>> determine that.  "GOP needs to qualify the WA by reading GFX MMIO offset
>> 0x138914 and verify the value there is 0x1." -> as per the HSD-22018444074
> OK, so we can confirm the firmware is on board. I suppose no real harm
> in doing so even though it would clearly be a rather weird if someone
> would ship some ancient firmware that doesn't handle this.
>
> But that still won't help with the guest side handling because that
> register will read the same in the guest.


We are back to the same question :/ How about
if (boot_cpu_has(X86_FEATURE_HYPERVISOR) && !i915_run_as_guest())

disable binder

Regards,

Nirmoy

>
Das, Nirmoy Jan. 19, 2024, 10:49 a.m. UTC | #6
On 1/19/2024 11:47 AM, Nirmoy Das wrote:
>
>
> On 1/19/2024 12:12 AM, Ville Syrjälä wrote:
>> On Wed, Jan 17, 2024 at 06:46:24PM +0100, Nirmoy Das wrote:
>>> On 1/17/2024 3:13 PM, Michał Winiarski wrote:
>>>> On Tue, Jan 16, 2024 at 09:56:25AM +0200, Ville Syrjala wrote:
>>>>> From: Ville Syrjälä<ville.syrjala@linux.intel.com>
>>>>>
>>>>> Now that the GGTT PTE updates go straight to GSMBASE (bypassing
>>>>> GTTMMADR) there should be no more risk of system hangs? So the
>>>>> "binder" (ie. update the PTEs via MI_UPDATE_GTT) is no longer
>>>>> necessary, disable it.
>>>>>
>>>>> My main worry with the MI_UPDATE_GTT are:
>>>>> - only used on this one platform so very limited testing coverage
>>>>> - async so more opportunities to screw things up
>>>>> - what happens if the engine hangs while we're waiting for MI_UPDATE_GTT
>>>>>     to finish?
>>>>> - requires working command submission, so even getting a working
>>>>>     display now depends on a lot more extra components working correctly
>>>>>
>>>>> TODO: MI_UPDATE_GTT might be interesting as an optimization
>>>>> though, so perhaps someone should look into always using it
>>>>> (assuming the GPU is alive and well)?
>>>>>
>>>>> v2: Keep using MI_UPDATE_GTT on VM guests
>>>>>
>>>>> Cc: Paz Zcharya<pazz@chromium.org>
>>>>> Cc: Nirmoy Das<nirmoy.das@intel.com>
>>>>> Signed-off-by: Ville Syrjälä<ville.syrjala@linux.intel.com>
>>>>> ---
>>>>>    drivers/gpu/drm/i915/gt/intel_gtt.c | 3 ++-
>>>>>    1 file changed, 2 insertions(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.c b/drivers/gpu/drm/i915/gt/intel_gtt.c
>>>>> index 86f73fe558ca..e83dabc56a14 100644
>>>>> --- a/drivers/gpu/drm/i915/gt/intel_gtt.c
>>>>> +++ b/drivers/gpu/drm/i915/gt/intel_gtt.c
>>>>> @@ -24,7 +24,8 @@
>>>>>    bool i915_ggtt_require_binder(struct drm_i915_private *i915)
>>>>>    {
>>>>>    	/* Wa_13010847436 & Wa_14019519902 */
>>>>> -	return MEDIA_VER_FULL(i915) == IP_VER(13, 0);
>>>>> +	return i915_run_as_guest() &&
>>>>> +		MEDIA_VER_FULL(i915) == IP_VER(13, 0);
>>>> Note that i915_run_as_guest() is not the most reliable way to decide
>>>> whether to use MI_UPDATE_GTT or straight to GSMBASE, as it requires the
>>>> hypervisor to "opt-in" and set the X86_FEATURE_HYPERVISOR.
>>>> If it's not set - the driver will go into GSMBASE, which is not mapped
>>>> inside the guest.
>>>> Does the system firmware advertise whether GSMBASE is "open" or "closed"
>>>> to CPU access in any way?
>>> Had a chat with David from IVE team, David suggested to read 0x138914 to
>>> determine that.  "GOP needs to qualify the WA by reading GFX MMIO offset
>>> 0x138914 and verify the value there is 0x1." -> as per the HSD-22018444074
>> OK, so we can confirm the firmware is on board. I suppose no real harm
>> in doing so even though it would clearly be a rather weird if someone
>> would ship some ancient firmware that doesn't handle this.
>>
>> But that still won't help with the guest side handling because that
>> register will read the same in the guest.
>
>
> We are back to the same question :/ How about
> if (boot_cpu_has(X86_FEATURE_HYPERVISOR) && !i915_run_as_guest()
>
Hmm, never mind, that was stupid.


> disable binder
>
> Regards,
>
> Nirmoy
>
Ville Syrjälä Jan. 25, 2024, 9:08 a.m. UTC | #7
On Fri, Jan 19, 2024 at 01:12:11AM +0200, Ville Syrjälä wrote:
> On Wed, Jan 17, 2024 at 06:46:24PM +0100, Nirmoy Das wrote:
> > 
> > On 1/17/2024 3:13 PM, Michał Winiarski wrote:
> > > On Tue, Jan 16, 2024 at 09:56:25AM +0200, Ville Syrjala wrote:
> > >> From: Ville Syrjälä <ville.syrjala@linux.intel.com>
> > >>
> > >> Now that the GGTT PTE updates go straight to GSMBASE (bypassing
> > >> GTTMMADR) there should be no more risk of system hangs? So the
> > >> "binder" (ie. update the PTEs via MI_UPDATE_GTT) is no longer
> > >> necessary, disable it.
> > >>
> > >> My main worry with the MI_UPDATE_GTT are:
> > >> - only used on this one platform so very limited testing coverage
> > >> - async so more opportunities to screw things up
> > >> - what happens if the engine hangs while we're waiting for MI_UPDATE_GTT
> > >>    to finish?
> > >> - requires working command submission, so even getting a working
> > >>    display now depends on a lot more extra components working correctly
> > >>
> > >> TODO: MI_UPDATE_GTT might be interesting as an optimization
> > >> though, so perhaps someone should look into always using it
> > >> (assuming the GPU is alive and well)?
> > >>
> > >> v2: Keep using MI_UPDATE_GTT on VM guests
> > >>
> > >> Cc: Paz Zcharya <pazz@chromium.org>
> > >> Cc: Nirmoy Das <nirmoy.das@intel.com>
> > >> Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
> > >> ---
> > >>   drivers/gpu/drm/i915/gt/intel_gtt.c | 3 ++-
> > >>   1 file changed, 2 insertions(+), 1 deletion(-)
> > >>
> > >> diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.c b/drivers/gpu/drm/i915/gt/intel_gtt.c
> > >> index 86f73fe558ca..e83dabc56a14 100644
> > >> --- a/drivers/gpu/drm/i915/gt/intel_gtt.c
> > >> +++ b/drivers/gpu/drm/i915/gt/intel_gtt.c
> > >> @@ -24,7 +24,8 @@
> > >>   bool i915_ggtt_require_binder(struct drm_i915_private *i915)
> > >>   {
> > >>   	/* Wa_13010847436 & Wa_14019519902 */
> > >> -	return MEDIA_VER_FULL(i915) == IP_VER(13, 0);
> > >> +	return i915_run_as_guest() &&
> > >> +		MEDIA_VER_FULL(i915) == IP_VER(13, 0);
> > > Note that i915_run_as_guest() is not the most reliable way to decide
> > > whether to use MI_UPDATE_GTT or straight to GSMBASE, as it requires the
> > > hypervisor to "opt-in" and set the X86_FEATURE_HYPERVISOR.
> > > If it's not set - the driver will go into GSMBASE, which is not mapped
> > > inside the guest.
> > > Does the system firmware advertise whether GSMBASE is "open" or "closed"
> > > to CPU access in any way?
> > 
> > Had a chat with David from IVE team, David suggested to read 0x138914 to 
> > determine that.  "GOP needs to qualify the WA by reading GFX MMIO offset 
> > 0x138914 and verify the value there is 0x1." -> as per the HSD-22018444074
> 
> OK, so we can confirm the firmware is on board. I suppose no real harm
> in doing so even though it would clearly be a rather weird if someone
> would ship some ancient firmware that doesn't handle this.
> 
> But that still won't help with the guest side handling because that
> register will read the same in the guest.

I guess we have two options here:
1) ignore non-standard VMs that don't advertise themselves
2) try some other heuristics to detect them (e.g. host/ISA bridge PCI
   IDs/DMI/etc.)

My preference is to just go with option 1, and if someone comes across
a real-world use case where the VM is hiding then we can think of some
way to handle it. Trying to come up with heuristics for that without
anything to test against would be 100% guesswork anyway.
Michał Winiarski Jan. 25, 2024, 2:59 p.m. UTC | #8
On Thu, Jan 25, 2024 at 11:08:04AM +0200, Ville Syrjälä wrote:
> On Fri, Jan 19, 2024 at 01:12:11AM +0200, Ville Syrjälä wrote:
> > On Wed, Jan 17, 2024 at 06:46:24PM +0100, Nirmoy Das wrote:
> > > 
> > > On 1/17/2024 3:13 PM, Michał Winiarski wrote:
> > > > On Tue, Jan 16, 2024 at 09:56:25AM +0200, Ville Syrjala wrote:
> > > >> From: Ville Syrjälä <ville.syrjala@linux.intel.com>
> > > >>
> > > >> Now that the GGTT PTE updates go straight to GSMBASE (bypassing
> > > >> GTTMMADR) there should be no more risk of system hangs? So the
> > > >> "binder" (ie. update the PTEs via MI_UPDATE_GTT) is no longer
> > > >> necessary, disable it.
> > > >>
> > > >> My main worry with the MI_UPDATE_GTT are:
> > > >> - only used on this one platform so very limited testing coverage
> > > >> - async so more opportunities to screw things up
> > > >> - what happens if the engine hangs while we're waiting for MI_UPDATE_GTT
> > > >>    to finish?
> > > >> - requires working command submission, so even getting a working
> > > >>    display now depends on a lot more extra components working correctly
> > > >>
> > > >> TODO: MI_UPDATE_GTT might be interesting as an optimization
> > > >> though, so perhaps someone should look into always using it
> > > >> (assuming the GPU is alive and well)?
> > > >>
> > > >> v2: Keep using MI_UPDATE_GTT on VM guests
> > > >>
> > > >> Cc: Paz Zcharya <pazz@chromium.org>
> > > >> Cc: Nirmoy Das <nirmoy.das@intel.com>
> > > >> Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
> > > >> ---
> > > >>   drivers/gpu/drm/i915/gt/intel_gtt.c | 3 ++-
> > > >>   1 file changed, 2 insertions(+), 1 deletion(-)
> > > >>
> > > >> diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.c b/drivers/gpu/drm/i915/gt/intel_gtt.c
> > > >> index 86f73fe558ca..e83dabc56a14 100644
> > > >> --- a/drivers/gpu/drm/i915/gt/intel_gtt.c
> > > >> +++ b/drivers/gpu/drm/i915/gt/intel_gtt.c
> > > >> @@ -24,7 +24,8 @@
> > > >>   bool i915_ggtt_require_binder(struct drm_i915_private *i915)
> > > >>   {
> > > >>   	/* Wa_13010847436 & Wa_14019519902 */
> > > >> -	return MEDIA_VER_FULL(i915) == IP_VER(13, 0);
> > > >> +	return i915_run_as_guest() &&
> > > >> +		MEDIA_VER_FULL(i915) == IP_VER(13, 0);
> > > > Note that i915_run_as_guest() is not the most reliable way to decide
> > > > whether to use MI_UPDATE_GTT or straight to GSMBASE, as it requires the
> > > > hypervisor to "opt-in" and set the X86_FEATURE_HYPERVISOR.
> > > > If it's not set - the driver will go into GSMBASE, which is not mapped
> > > > inside the guest.
> > > > Does the system firmware advertise whether GSMBASE is "open" or "closed"
> > > > to CPU access in any way?
> > > 
> > > Had a chat with David from IVE team, David suggested to read 0x138914 to 
> > > determine that.  "GOP needs to qualify the WA by reading GFX MMIO offset 
> > > 0x138914 and verify the value there is 0x1." -> as per the HSD-22018444074
> > 
> > OK, so we can confirm the firmware is on board. I suppose no real harm
> > in doing so even though it would clearly be a rather weird if someone
> > would ship some ancient firmware that doesn't handle this.
> > 
> > But that still won't help with the guest side handling because that
> > register will read the same in the guest.
> 
> I guess we have two options here:
> 1) ignore non-standard vms that don't advertise themselves
> 2) try some other heuristics to detect them (eg. host/isa bridge PCI
>    IDs/DMI/etc.)
> 
> My preference is to just go with option 1, and if someone comes across
> a real world use case when the vm is hiding then we can think of some
> way to handle it. Trying to come up with heuristics for that without
> anything to test against would be 100% guesswork anyway.
> 
> -- 
> Ville Syrjälä
> Intel

Option 1 can work, but there is a heuristic that should work for most
cases.
If we can assume that on bare metal the e820 memory map excludes the
stolen region (it's marked as reserved), we should be able to do something
that looks roughly like this (warning: not tested, just pseudo-code):

/* Called for each matching resource; returning non-zero stops the walk. */
static int is_reserved(struct resource *res, void *arg)
{
	return 1;
}

static bool _stolen_is_reserved(u64 gsm, u64 gsm_size)
{
	int ret;

	/* The end address is taken to be inclusive here, hence the -1. */
	ret = walk_iomem_res_desc(IORES_DESC_RESERVED, IORESOURCE_MEM,
				  gsm, gsm + gsm_size - 1, NULL, is_reserved);
	if (ret != 1)
		return false;

	return true;
}

if (i915_run_as_guest() || !_stolen_is_reserved(gsm, gsm_size))
	fallback_to_mi_ggtt();

A similar sanity check for stolen being reserved should probably also be
done in the regular stolen init path - currently we're creating a
resource named "Graphics Stolen Memory" somewhere in the middle of
System RAM when i915 runs inside a VM with native device passthrough.

-Michał
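One detail the pseudo-code leaves open is how much of the range has to be reserved. Assuming the resource walker stops at the first non-zero callback return, a callback that always returns 1 only shows that some reserved resource intersects the range. The userspace sketch below illustrates the stricter "fully covered" variant on a made-up memory map. struct mem_region and range_fully_reserved() are hypothetical names used only for illustration, and the map is assumed to be sorted and non-overlapping.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one firmware memory-map entry; end is exclusive. */
struct mem_region {
	uint64_t start;
	uint64_t end;
	bool reserved;
};

/*
 * Return true only if every byte of [start, start + size) lies inside
 * entries marked reserved. Assumes the map is sorted by start address
 * and that entries do not overlap.
 */
static bool range_fully_reserved(const struct mem_region *map, size_t count,
				 uint64_t start, uint64_t size)
{
	uint64_t cursor = start;
	uint64_t end = start + size;
	size_t i;

	assert(size > 0 && end > start);	/* reject empty or wrapping ranges */

	for (i = 0; i < count; i++) {
		if (map[i].end <= cursor)
			continue;		/* entry lies entirely below the cursor */
		if (map[i].start > cursor)
			return false;		/* hole in the map before this entry */
		if (!map[i].reserved)
			return false;		/* e.g. ordinary System RAM */

		cursor = map[i].end;
		if (cursor >= end)
			return true;		/* whole range accounted for */
	}

	return false;				/* ran off the end of the map */
}
```

Whether partial overlap should count as "reserved" is a policy choice; the stricter form shown here would also reject a range that straddles the boundary between RAM and a reserved region.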
Ville Syrjälä Jan. 31, 2024, 11:33 a.m. UTC | #9
On Thu, Jan 25, 2024 at 03:59:36PM +0100, Michał Winiarski wrote:
> On Thu, Jan 25, 2024 at 11:08:04AM +0200, Ville Syrjälä wrote:
> > On Fri, Jan 19, 2024 at 01:12:11AM +0200, Ville Syrjälä wrote:
> > > On Wed, Jan 17, 2024 at 06:46:24PM +0100, Nirmoy Das wrote:
> > > > 
> > > > On 1/17/2024 3:13 PM, Michał Winiarski wrote:
> > > > > On Tue, Jan 16, 2024 at 09:56:25AM +0200, Ville Syrjala wrote:
> > > > >> From: Ville Syrjälä <ville.syrjala@linux.intel.com>
> > > > >>
> > > > >> Now that the GGTT PTE updates go straight to GSMBASE (bypassing
> > > > >> GTTMMADR) there should be no more risk of system hangs? So the
> > > > >> "binder" (ie. update the PTEs via MI_UPDATE_GTT) is no longer
> > > > >> necessary, disable it.
> > > > >>
> > > > >> My main worry with the MI_UPDATE_GTT are:
> > > > >> - only used on this one platform so very limited testing coverage
> > > > >> - async so more opportunities to screw things up
> > > > >> - what happens if the engine hangs while we're waiting for MI_UPDATE_GTT
> > > > >>    to finish?
> > > > >> - requires working command submission, so even getting a working
> > > > >>    display now depends on a lot more extra components working correctly
> > > > >>
> > > > >> TODO: MI_UPDATE_GTT might be interesting as an optimization
> > > > >> though, so perhaps someone should look into always using it
> > > > >> (assuming the GPU is alive and well)?
> > > > >>
> > > > >> v2: Keep using MI_UPDATE_GTT on VM guests
> > > > >>
> > > > >> Cc: Paz Zcharya <pazz@chromium.org>
> > > > >> Cc: Nirmoy Das <nirmoy.das@intel.com>
> > > > >> Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
> > > > >> ---
> > > > >>   drivers/gpu/drm/i915/gt/intel_gtt.c | 3 ++-
> > > > >>   1 file changed, 2 insertions(+), 1 deletion(-)
> > > > >>
> > > > >> diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.c b/drivers/gpu/drm/i915/gt/intel_gtt.c
> > > > >> index 86f73fe558ca..e83dabc56a14 100644
> > > > >> --- a/drivers/gpu/drm/i915/gt/intel_gtt.c
> > > > >> +++ b/drivers/gpu/drm/i915/gt/intel_gtt.c
> > > > >> @@ -24,7 +24,8 @@
> > > > >>   bool i915_ggtt_require_binder(struct drm_i915_private *i915)
> > > > >>   {
> > > > >>   	/* Wa_13010847436 & Wa_14019519902 */
> > > > >> -	return MEDIA_VER_FULL(i915) == IP_VER(13, 0);
> > > > >> +	return i915_run_as_guest() &&
> > > > >> +		MEDIA_VER_FULL(i915) == IP_VER(13, 0);
> > > > > Note that i915_run_as_guest() is not the most reliable way to decide
> > > > > whether to use MI_UPDATE_GTT or straight to GSMBASE, as it requires the
> > > > > hypervisor to "opt-in" and set the X86_FEATURE_HYPERVISOR.
> > > > > If it's not set - the driver will go into GSMBASE, which is not mapped
> > > > > inside the guest.
> > > > > Does the system firmware advertise whether GSMBASE is "open" or "closed"
> > > > > to CPU access in any way?
> > > > 
> > > > Had a chat with David from IVE team, David suggested to read 0x138914 to 
> > > > determine that.  "GOP needs to qualify the WA by reading GFX MMIO offset 
> > > > 0x138914 and verify the value there is 0x1." -> as per the HSD-22018444074
> > > 
> > > OK, so we can confirm the firmware is on board. I suppose no real harm
> > > in doing so even though it would clearly be a rather weird if someone
> > > would ship some ancient firmware that doesn't handle this.
> > > 
> > > But that still won't help with the guest side handling because that
> > > register will read the same in the guest.
> > 
> > I guess we have two options here:
> > 1) ignore non-standard vms that don't advertise themselves
> > 2) try some other heuristics to detect them (eg. host/isa bridge PCI
> >    IDs/DMI/etc.)
> > 
> > My preference is to just go with option 1, and if someone comes across
> > a real world use case when the vm is hiding then we can think of some
> > way to handle it. Trying to come up with heuristics for that without
> > anything to test against would be 100% guesswork anyway.
> > 
> > -- 
> > Ville Syrjälä
> > Intel
> 
> Option 1 can work, but there is a heuristic that should work for most
> cases.
> If we can assume that on bare-metal, e820 memory map excludes the stolen
> region (it's marked as reserved), we should be able to do something that
> looks roughly like this (warning - not tested, just a pseudo-code):
> 
> static int is_reserved(struct resource *res, void *arg)
> {
> 	return 1;
> }
> 
> static bool _stolen_is_reserved(u64 addr)
> {
> 	int ret;
> 
> 	ret = walk_iomem_res_desc(IORES_DESC_RESERVED, IORESOURCE_MEM,
> 				  gsm, gsm + gsm_size, NULL, is_reserved)
> 	if (ret != 1)
> 		return false;
> 
> 	return true;
> }
> 
> if (i915_run_as_guest() || !_stolen_is_reserved(gsm, gsm_size))
> 	fallback_to_mi_ggtt()
> 
> Similar sanity check for stolen being reserved should probably also be
> done in the regular stolen init path - currently we're creating a
> resource named "Graphics Stolen Memory" somewhere in the middle of
> System RAM when i915 runs inside VM with native device passthrough.

You mean request_smem_stolen()? We skip that on LMEMBAR platforms.
And we now rely on the early quirk to figure out the DSM base/size.
People didn't want to keep doing that so now I suppose we're just
relying on the BIOS to do its job right.

So if we wanted to use that we'd need to redesign it to also
work for the LMEMBAR platforms without the early quirk, and
it might also make sense to do something similar for GSM for
extra belts and suspenders.

Anyways, I guess that's a bit beside the point, and just checking
to make sure both DSM and GSM are marked as reserved could be used
to detect that we need to take the normal path instead of the
direct stolen access path. That's assuming VMs don't mark that
range as reserved, which I guess they don't based on what you're
saying?

Patch

diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.c b/drivers/gpu/drm/i915/gt/intel_gtt.c
index 86f73fe558ca..e83dabc56a14 100644
--- a/drivers/gpu/drm/i915/gt/intel_gtt.c
+++ b/drivers/gpu/drm/i915/gt/intel_gtt.c
@@ -24,7 +24,8 @@ 
 bool i915_ggtt_require_binder(struct drm_i915_private *i915)
 {
 	/* Wa_13010847436 & Wa_14019519902 */
-	return MEDIA_VER_FULL(i915) == IP_VER(13, 0);
+	return i915_run_as_guest() &&
+		MEDIA_VER_FULL(i915) == IP_VER(13, 0);
 }
 
 static bool intel_ggtt_update_needs_vtd_wa(struct drm_i915_private *i915)
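The behavioural change in this hunk can be summarised with a small userspace model. This is a sketch under stated assumptions rather than driver code: struct platform and the two require_binder_*() helpers are hypothetical, and IP_VER() is assumed to mirror the i915 macro of the same name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed to mirror the i915 encoding: major version in the high byte. */
#define IP_VER(ver, rel) ((ver) << 8 | (rel))

/* Hypothetical stand-in for the two facts the predicate looks at. */
struct platform {
	unsigned int media_ver_full;	/* e.g. IP_VER(13, 0) */
	bool run_as_guest;		/* what i915_run_as_guest() would report */
};

/* Before the patch: the binder is required on media version 13.0. */
static bool require_binder_old(const struct platform *p)
{
	assert(p != NULL);
	return p->media_ver_full == IP_VER(13, 0);
}

/* After the patch: required only when additionally running as a guest. */
static bool require_binder_new(const struct platform *p)
{
	assert(p != NULL);
	return p->run_as_guest && p->media_ver_full == IP_VER(13, 0);
}
```

In this model only the bare-metal media 13.0 case changes: it no longer requires the binder, while a guest on the same hardware still does.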