Message ID | 20180822142927.19565-1-tvrtko.ursulin@linux.intel.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | drm/i915: Fix subslice configuration on Gen9LP | expand |
On 22/08/18 15:29, Tvrtko Ursulin wrote: > From: Tvrtko Ursulin <tvrtko.ursulin@intel.com> > > According to the documentation, when programming the subslice count power- > gating configuration register, the value to be written into it on Gen9LP > should actually in the format of: > > 1 slice = 0x001 > 2 slices = 0x010 > 3 slices = 0x100 > > And not the popcount of the enabled subslice mask as on other platforms. > > So on Gen9LP platforms we have been programming 0x11 into those bits, but > the documentation does not explain what would that achieve. Could it be > that we enable only two subslice on three sub-slice parts? Or hardware > simply ignores it and sticks with the maximum configuration? > > Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> > Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com> > Bspec: 12247 If it does turn out to be broken and this the correct fix then: Fixes: 0cea6502bf9c ("drm/i915: Request full SSEU enablement on Gen9") ? At that time code looked like: + if (INTEL_INFO(dev)->has_subslice_pg) { + rpcs |= GEN8_RPCS_SS_CNT_ENABLE; + rpcs |= INTEL_INFO(dev)->subslice_per_slice << + GEN8_RPCS_SS_CNT_SHIFT; + rpcs |= GEN8_RPCS_ENABLE; + } So would had already most likely been broken. Tvrtko > --- > Could this actually be true or I am severely misreading the docs? It does > not sound plausible to me this would have been missed all this time.. > > How to test in what configuration do these parts run before and after this > patch? > --- > drivers/gpu/drm/i915/intel_lrc.c | 10 ++++++++-- > 1 file changed, 8 insertions(+), 2 deletions(-) > > diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c > index 36050f085071..cdfa962a1975 100644 > --- a/drivers/gpu/drm/i915/intel_lrc.c > +++ b/drivers/gpu/drm/i915/intel_lrc.c > @@ -2508,9 +2508,15 @@ make_rpcs(struct drm_i915_private *dev_priv) > } > > if (INTEL_INFO(dev_priv)->sseu.has_subslice_pg) { > + u8 val; > + > + val = hweight8(INTEL_INFO(dev_priv)->sseu.subslice_mask[0]); > + > + if (IS_GEN9_LP(dev_priv)) > + val = BIT(val - 1); > + > rpcs |= GEN8_RPCS_SS_CNT_ENABLE; > - rpcs |= hweight8(INTEL_INFO(dev_priv)->sseu.subslice_mask[0]) << > - GEN8_RPCS_SS_CNT_SHIFT; > + rpcs |= val << GEN8_RPCS_SS_CNT_SHIFT; > rpcs |= GEN8_RPCS_ENABLE; > } > >
On 22/08/2018 15:29, Tvrtko Ursulin wrote: > From: Tvrtko Ursulin <tvrtko.ursulin@intel.com> > > According to the documentation, when programming the subslice count power- > gating configuration register, the value to be written into it on Gen9LP > should actually in the format of: > > 1 slice = 0x001 > 2 slices = 0x010 > 3 slices = 0x100 s/slice/subslice/ Also 0b001 etc... Not hexadecimal. > > And not the popcount of the enabled subslice mask as on other platforms. > > So on Gen9LP platforms we have been programming 0x11 into those bits, but > the documentation does not explain what would that achieve. Could it be > that we enable only two subslice on three sub-slice parts? Or hardware > simply ignores it and sticks with the maximum configuration? > > Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> > Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com> > Bspec: 12247 > --- > Could this actually be true or I am severely misreading the docs? It does > not sound plausible to me this would have been missed all this time.. > > How to test in what configuration do these parts run before and after this > patch? > --- > drivers/gpu/drm/i915/intel_lrc.c | 10 ++++++++-- > 1 file changed, 8 insertions(+), 2 deletions(-) > > diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c > index 36050f085071..cdfa962a1975 100644 > --- a/drivers/gpu/drm/i915/intel_lrc.c > +++ b/drivers/gpu/drm/i915/intel_lrc.c > @@ -2508,9 +2508,15 @@ make_rpcs(struct drm_i915_private *dev_priv) > } > > if (INTEL_INFO(dev_priv)->sseu.has_subslice_pg) { > + u8 val; > + > + val = hweight8(INTEL_INFO(dev_priv)->sseu.subslice_mask[0]); > + > + if (IS_GEN9_LP(dev_priv)) > + val = BIT(val - 1); Hmm... Are you breaking the 2 subslices setting here then? (2 subslices = 0b10 which should be equal to hweight8(subslice_mask) if I'm thinking right) > + > rpcs |= GEN8_RPCS_SS_CNT_ENABLE; > - rpcs |= hweight8(INTEL_INFO(dev_priv)->sseu.subslice_mask[0]) << > - GEN8_RPCS_SS_CNT_SHIFT; > + rpcs |= val << GEN8_RPCS_SS_CNT_SHIFT; > rpcs |= GEN8_RPCS_ENABLE; > } >
On 22/08/2018 16:08, Lionel Landwerlin wrote: > On 22/08/2018 15:29, Tvrtko Ursulin wrote: >> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com> >> >> According to the documentation, when programming the subslice count >> power- >> gating configuration register, the value to be written into it on Gen9LP >> should actually in the format of: >> >> 1 slice = 0x001 >> 2 slices = 0x010 >> 3 slices = 0x100 > > > s/slice/subslice/ > > > Also 0b001 etc... Not hexadecimal. Oops, you're right. >> >> And not the popcount of the enabled subslice mask as on other platforms. >> >> So on Gen9LP platforms we have been programming 0x11 into those bits, but >> the documentation does not explain what would that achieve. Could it be >> that we enable only two subslice on three sub-slice parts? Or hardware >> simply ignores it and sticks with the maximum configuration? >> >> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> >> Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com> >> Bspec: 12247 >> --- >> Could this actually be true or I am severely misreading the docs? It does >> not sound plausible to me this would have been missed all this time.. >> >> How to test in what configuration do these parts run before and after >> this >> patch? >> --- >> drivers/gpu/drm/i915/intel_lrc.c | 10 ++++++++-- >> 1 file changed, 8 insertions(+), 2 deletions(-) >> >> diff --git a/drivers/gpu/drm/i915/intel_lrc.c >> b/drivers/gpu/drm/i915/intel_lrc.c >> index 36050f085071..cdfa962a1975 100644 >> --- a/drivers/gpu/drm/i915/intel_lrc.c >> +++ b/drivers/gpu/drm/i915/intel_lrc.c >> @@ -2508,9 +2508,15 @@ make_rpcs(struct drm_i915_private *dev_priv) >> } >> if (INTEL_INFO(dev_priv)->sseu.has_subslice_pg) { >> + u8 val; >> + >> + val = hweight8(INTEL_INFO(dev_priv)->sseu.subslice_mask[0]); >> + >> + if (IS_GEN9_LP(dev_priv)) >> + val = BIT(val - 1); > > > Hmm... Are you breaking the 2 subslices setting here then? > > (2 subslices = 0b10 which should be equal to hweight8(subslice_mask) if > I'm thinking right) No and yes, I think. subslice_mask = 0b011 => hweight = 2 => BIT(2 - 1) = BIT(1) = 0b010 into the register In the same way, all together: subslice_mask = 0b001 => hweight = 1 => BIT(0) = 0b001 subslice_mask = 0b011 => hweight = 2 => BIT(1) = 0b010 subslice_mask = 0b111 => hweight = 3 => BIT(2) = 0b100 Have I made a mistake somewhere? Regards, Tvrtko > >> + >> rpcs |= GEN8_RPCS_SS_CNT_ENABLE; >> - rpcs |= hweight8(INTEL_INFO(dev_priv)->sseu.subslice_mask[0]) << >> - GEN8_RPCS_SS_CNT_SHIFT; >> + rpcs |= val << GEN8_RPCS_SS_CNT_SHIFT; >> rpcs |= GEN8_RPCS_ENABLE; >> } > > > _______________________________________________ > Intel-gfx mailing list > Intel-gfx@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/intel-gfx
On 22/08/2018 16:17, Tvrtko Ursulin wrote: > > On 22/08/2018 16:08, Lionel Landwerlin wrote: >> On 22/08/2018 15:29, Tvrtko Ursulin wrote: >>> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com> >>> >>> According to the documentation, when programming the subslice count >>> power- >>> gating configuration register, the value to be written into it on >>> Gen9LP >>> should actually in the format of: >>> >>> 1 slice = 0x001 >>> 2 slices = 0x010 >>> 3 slices = 0x100 >> >> >> s/slice/subslice/ >> >> >> Also 0b001 etc... Not hexadecimal. > > Oops, you're right. > >>> >>> And not the popcount of the enabled subslice mask as on other >>> platforms. >>> >>> So on Gen9LP platforms we have been programming 0x11 into those >>> bits, but >>> the documentation does not explain what would that achieve. Could it be >>> that we enable only two subslice on three sub-slice parts? Or hardware >>> simply ignores it and sticks with the maximum configuration? >>> >>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> >>> Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com> >>> Bspec: 12247 >>> --- >>> Could this actually be true or I am severely misreading the docs? It >>> does >>> not sound plausible to me this would have been missed all this time.. >>> >>> How to test in what configuration do these parts run before and >>> after this >>> patch? >>> --- >>> drivers/gpu/drm/i915/intel_lrc.c | 10 ++++++++-- >>> 1 file changed, 8 insertions(+), 2 deletions(-) >>> >>> diff --git a/drivers/gpu/drm/i915/intel_lrc.c >>> b/drivers/gpu/drm/i915/intel_lrc.c >>> index 36050f085071..cdfa962a1975 100644 >>> --- a/drivers/gpu/drm/i915/intel_lrc.c >>> +++ b/drivers/gpu/drm/i915/intel_lrc.c >>> @@ -2508,9 +2508,15 @@ make_rpcs(struct drm_i915_private *dev_priv) >>> } >>> if (INTEL_INFO(dev_priv)->sseu.has_subslice_pg) { >>> + u8 val; >>> + >>> + val = hweight8(INTEL_INFO(dev_priv)->sseu.subslice_mask[0]); >>> + >>> + if (IS_GEN9_LP(dev_priv)) >>> + val = BIT(val - 1); >> >> >> Hmm... Are you breaking the 2 subslices setting here then? >> >> (2 subslices = 0b10 which should be equal to hweight8(subslice_mask) >> if I'm thinking right) > > No and yes, I think. > > subslice_mask = 0b011 => hweight = 2 => BIT(2 - 1) = BIT(1) = 0b010 > into the register > > In the same way, all together: > > subslice_mask = 0b001 => hweight = 1 => BIT(0) = 0b001 > subslice_mask = 0b011 => hweight = 2 => BIT(1) = 0b010 > subslice_mask = 0b111 => hweight = 3 => BIT(2) = 0b100 > > Have I made a mistake somewhere? Ah, yes! You're right :) My eyes got tricked, thanks for finding this out. With the comment fixed : Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> > > Regards, > > Tvrtko > >> >>> + >>> rpcs |= GEN8_RPCS_SS_CNT_ENABLE; >>> - rpcs |= >>> hweight8(INTEL_INFO(dev_priv)->sseu.subslice_mask[0]) << >>> - GEN8_RPCS_SS_CNT_SHIFT; >>> + rpcs |= val << GEN8_RPCS_SS_CNT_SHIFT; >>> rpcs |= GEN8_RPCS_ENABLE; >>> } >> >> >> _______________________________________________ >> Intel-gfx mailing list >> Intel-gfx@lists.freedesktop.org >> https://lists.freedesktop.org/mailman/listinfo/intel-gfx >
On 22/08/2018 16:22, Lionel Landwerlin wrote: > On 22/08/2018 16:17, Tvrtko Ursulin wrote: >> >> On 22/08/2018 16:08, Lionel Landwerlin wrote: >>> On 22/08/2018 15:29, Tvrtko Ursulin wrote: >>>> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com> >>>> >>>> According to the documentation, when programming the subslice count >>>> power- >>>> gating configuration register, the value to be written into it on >>>> Gen9LP >>>> should actually in the format of: >>>> >>>> 1 slice = 0x001 >>>> 2 slices = 0x010 >>>> 3 slices = 0x100 >>> >>> >>> s/slice/subslice/ >>> >>> >>> Also 0b001 etc... Not hexadecimal. >> >> Oops, you're right. >> >>>> >>>> And not the popcount of the enabled subslice mask as on other >>>> platforms. >>>> >>>> So on Gen9LP platforms we have been programming 0x11 into those >>>> bits, but >>>> the documentation does not explain what would that achieve. Could it be >>>> that we enable only two subslice on three sub-slice parts? Or hardware >>>> simply ignores it and sticks with the maximum configuration? >>>> >>>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> >>>> Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com> >>>> Bspec: 12247 >>>> --- >>>> Could this actually be true or I am severely misreading the docs? It >>>> does >>>> not sound plausible to me this would have been missed all this time.. >>>> >>>> How to test in what configuration do these parts run before and >>>> after this >>>> patch? >>>> --- >>>> drivers/gpu/drm/i915/intel_lrc.c | 10 ++++++++-- >>>> 1 file changed, 8 insertions(+), 2 deletions(-) >>>> >>>> diff --git a/drivers/gpu/drm/i915/intel_lrc.c >>>> b/drivers/gpu/drm/i915/intel_lrc.c >>>> index 36050f085071..cdfa962a1975 100644 >>>> --- a/drivers/gpu/drm/i915/intel_lrc.c >>>> +++ b/drivers/gpu/drm/i915/intel_lrc.c >>>> @@ -2508,9 +2508,15 @@ make_rpcs(struct drm_i915_private *dev_priv) >>>> } >>>> if (INTEL_INFO(dev_priv)->sseu.has_subslice_pg) { >>>> + u8 val; >>>> + >>>> + val = hweight8(INTEL_INFO(dev_priv)->sseu.subslice_mask[0]); >>>> + >>>> + if (IS_GEN9_LP(dev_priv)) >>>> + val = BIT(val - 1); >>> >>> >>> Hmm... Are you breaking the 2 subslices setting here then? >>> >>> (2 subslices = 0b10 which should be equal to hweight8(subslice_mask) >>> if I'm thinking right) >> >> No and yes, I think. >> >> subslice_mask = 0b011 => hweight = 2 => BIT(2 - 1) = BIT(1) = 0b010 >> into the register >> >> In the same way, all together: >> >> subslice_mask = 0b001 => hweight = 1 => BIT(0) = 0b001 >> subslice_mask = 0b011 => hweight = 2 => BIT(1) = 0b010 >> subslice_mask = 0b111 => hweight = 3 => BIT(2) = 0b100 >> >> Have I made a mistake somewhere? > > > Ah, yes! You're right :) > > My eyes got tricked, thanks for finding this out. At least half of the credit goes to you for linking to 12247 in scope of one different thread! > > With the comment fixed : > > > Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Thanks! Any ideas on how to test this? I'd like to commit message to be more precise - have we been running with one slice too few? Or hardware ignores the undocumented bit combination? Or even, is the documentation perhaps incorrect?! Regards, Tvrtko
On 22/08/2018 16:27, Tvrtko Ursulin wrote: > > On 22/08/2018 16:22, Lionel Landwerlin wrote: >> On 22/08/2018 16:17, Tvrtko Ursulin wrote: >>> >>> On 22/08/2018 16:08, Lionel Landwerlin wrote: >>>> On 22/08/2018 15:29, Tvrtko Ursulin wrote: >>>>> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com> >>>>> >>>>> According to the documentation, when programming the subslice >>>>> count power- >>>>> gating configuration register, the value to be written into it on >>>>> Gen9LP >>>>> should actually in the format of: >>>>> >>>>> 1 slice = 0x001 >>>>> 2 slices = 0x010 >>>>> 3 slices = 0x100 >>>> >>>> >>>> s/slice/subslice/ >>>> >>>> >>>> Also 0b001 etc... Not hexadecimal. >>> >>> Oops, you're right. >>> >>>>> >>>>> And not the popcount of the enabled subslice mask as on other >>>>> platforms. >>>>> >>>>> So on Gen9LP platforms we have been programming 0x11 into those >>>>> bits, but >>>>> the documentation does not explain what would that achieve. Could >>>>> it be >>>>> that we enable only two subslice on three sub-slice parts? Or >>>>> hardware >>>>> simply ignores it and sticks with the maximum configuration? >>>>> >>>>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> >>>>> Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com> >>>>> Bspec: 12247 >>>>> --- >>>>> Could this actually be true or I am severely misreading the docs? >>>>> It does >>>>> not sound plausible to me this would have been missed all this time.. >>>>> >>>>> How to test in what configuration do these parts run before and >>>>> after this >>>>> patch? >>>>> --- >>>>> drivers/gpu/drm/i915/intel_lrc.c | 10 ++++++++-- >>>>> 1 file changed, 8 insertions(+), 2 deletions(-) >>>>> >>>>> diff --git a/drivers/gpu/drm/i915/intel_lrc.c >>>>> b/drivers/gpu/drm/i915/intel_lrc.c >>>>> index 36050f085071..cdfa962a1975 100644 >>>>> --- a/drivers/gpu/drm/i915/intel_lrc.c >>>>> +++ b/drivers/gpu/drm/i915/intel_lrc.c >>>>> @@ -2508,9 +2508,15 @@ make_rpcs(struct drm_i915_private *dev_priv) >>>>> } >>>>> if (INTEL_INFO(dev_priv)->sseu.has_subslice_pg) { >>>>> + u8 val; >>>>> + >>>>> + val = hweight8(INTEL_INFO(dev_priv)->sseu.subslice_mask[0]); >>>>> + >>>>> + if (IS_GEN9_LP(dev_priv)) >>>>> + val = BIT(val - 1); >>>> >>>> >>>> Hmm... Are you breaking the 2 subslices setting here then? >>>> >>>> (2 subslices = 0b10 which should be equal to >>>> hweight8(subslice_mask) if I'm thinking right) >>> >>> No and yes, I think. >>> >>> subslice_mask = 0b011 => hweight = 2 => BIT(2 - 1) = BIT(1) = 0b010 >>> into the register >>> >>> In the same way, all together: >>> >>> subslice_mask = 0b001 => hweight = 1 => BIT(0) = 0b001 >>> subslice_mask = 0b011 => hweight = 2 => BIT(1) = 0b010 >>> subslice_mask = 0b111 => hweight = 3 => BIT(2) = 0b100 >>> >>> Have I made a mistake somewhere? >> >> >> Ah, yes! You're right :) >> >> My eyes got tricked, thanks for finding this out. > > At least half of the credit goes to you for linking to 12247 in scope > of one different thread! > >> >> With the comment fixed : >> >> >> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> > > Thanks! > > Any ideas on how to test this? I'd like to commit message to be more > precise - have we been running with one slice too few? Or hardware > ignores the undocumented bit combination? Or even, is the > documentation perhaps incorrect?! Yeah the tests that I wrote to test the API you're picking up use a MI_PREDICATE where you can make a command (let's say MI_STORE_REGISTER_MEM) conditional on the number of slice. You can use that in a user batch an verify that memory has or hasn't been written to. That's only before Gen10 though. I'm looking at whether something that is passed onto the EUs could map back to the physical location. Not sure if there is something... - Lionel > > Regards, > > Tvrtko >
On 22/08/2018 16:47, Lionel Landwerlin wrote: > On 22/08/2018 16:27, Tvrtko Ursulin wrote: >> >> On 22/08/2018 16:22, Lionel Landwerlin wrote: >>> On 22/08/2018 16:17, Tvrtko Ursulin wrote: >>>> >>>> On 22/08/2018 16:08, Lionel Landwerlin wrote: >>>>> On 22/08/2018 15:29, Tvrtko Ursulin wrote: >>>>>> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com> >>>>>> >>>>>> According to the documentation, when programming the subslice >>>>>> count power- >>>>>> gating configuration register, the value to be written into it on >>>>>> Gen9LP >>>>>> should actually in the format of: >>>>>> >>>>>> 1 slice = 0x001 >>>>>> 2 slices = 0x010 >>>>>> 3 slices = 0x100 >>>>> >>>>> >>>>> s/slice/subslice/ >>>>> >>>>> >>>>> Also 0b001 etc... Not hexadecimal. >>>> >>>> Oops, you're right. >>>> >>>>>> >>>>>> And not the popcount of the enabled subslice mask as on other >>>>>> platforms. >>>>>> >>>>>> So on Gen9LP platforms we have been programming 0x11 into those >>>>>> bits, but >>>>>> the documentation does not explain what would that achieve. Could >>>>>> it be >>>>>> that we enable only two subslice on three sub-slice parts? Or >>>>>> hardware >>>>>> simply ignores it and sticks with the maximum configuration? >>>>>> >>>>>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> >>>>>> Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com> >>>>>> Bspec: 12247 >>>>>> --- >>>>>> Could this actually be true or I am severely misreading the docs? >>>>>> It does >>>>>> not sound plausible to me this would have been missed all this time.. >>>>>> >>>>>> How to test in what configuration do these parts run before and >>>>>> after this >>>>>> patch? >>>>>> --- >>>>>> drivers/gpu/drm/i915/intel_lrc.c | 10 ++++++++-- >>>>>> 1 file changed, 8 insertions(+), 2 deletions(-) >>>>>> >>>>>> diff --git a/drivers/gpu/drm/i915/intel_lrc.c >>>>>> b/drivers/gpu/drm/i915/intel_lrc.c >>>>>> index 36050f085071..cdfa962a1975 100644 >>>>>> --- a/drivers/gpu/drm/i915/intel_lrc.c >>>>>> +++ b/drivers/gpu/drm/i915/intel_lrc.c >>>>>> @@ -2508,9 +2508,15 @@ make_rpcs(struct drm_i915_private *dev_priv) >>>>>> } >>>>>> if (INTEL_INFO(dev_priv)->sseu.has_subslice_pg) { >>>>>> + u8 val; >>>>>> + >>>>>> + val = hweight8(INTEL_INFO(dev_priv)->sseu.subslice_mask[0]); >>>>>> + >>>>>> + if (IS_GEN9_LP(dev_priv)) >>>>>> + val = BIT(val - 1); >>>>> >>>>> >>>>> Hmm... Are you breaking the 2 subslices setting here then? >>>>> >>>>> (2 subslices = 0b10 which should be equal to >>>>> hweight8(subslice_mask) if I'm thinking right) >>>> >>>> No and yes, I think. >>>> >>>> subslice_mask = 0b011 => hweight = 2 => BIT(2 - 1) = BIT(1) = 0b010 >>>> into the register >>>> >>>> In the same way, all together: >>>> >>>> subslice_mask = 0b001 => hweight = 1 => BIT(0) = 0b001 >>>> subslice_mask = 0b011 => hweight = 2 => BIT(1) = 0b010 >>>> subslice_mask = 0b111 => hweight = 3 => BIT(2) = 0b100 >>>> >>>> Have I made a mistake somewhere? >>> >>> >>> Ah, yes! You're right :) >>> >>> My eyes got tricked, thanks for finding this out. >> >> At least half of the credit goes to you for linking to 12247 in scope >> of one different thread! >> >>> >>> With the comment fixed : >>> >>> >>> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> >> >> Thanks! >> >> Any ideas on how to test this? I'd like to commit message to be more >> precise - have we been running with one slice too few? Or hardware >> ignores the undocumented bit combination? Or even, is the >> documentation perhaps incorrect?! > > > Yeah the tests that I wrote to test the API you're picking up use a > MI_PREDICATE where you can make a command (let's say > MI_STORE_REGISTER_MEM) conditional on the number of slice. > > You can use that in a user batch an verify that memory has or hasn't > been written to. > > That's only before Gen10 though. > > > I'm looking at whether something that is passed onto the EUs could map > back to the physical location. > > Not sure if there is something... I've sent an exploratory IGT based on your MI_SET_PREDICATE code to CI. Although that only works on the basis of slices, and here doubt is about subslices. But at least it should log subslice configuration so we'll see if the incorrect programming survived in the register, or on read-back we get a different value. Otherwise I was thinking if in some Mesa test farm you have an appropriate BXT/GLK and could notice a performance difference before and after this patch? But again, sounds so unbelievable we would be running with one disabled slice all this time.. Regards, Tvrtko
On 23/08/2018 09:59, Tvrtko Ursulin wrote: > On 22/08/2018 16:47, Lionel Landwerlin wrote: >> On 22/08/2018 16:27, Tvrtko Ursulin wrote: >>> >>> On 22/08/2018 16:22, Lionel Landwerlin wrote: >>>> On 22/08/2018 16:17, Tvrtko Ursulin wrote: >>>>> >>>>> On 22/08/2018 16:08, Lionel Landwerlin wrote: >>>>>> On 22/08/2018 15:29, Tvrtko Ursulin wrote: >>>>>>> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com> >>>>>>> >>>>>>> According to the documentation, when programming the subslice >>>>>>> count power- >>>>>>> gating configuration register, the value to be written into it on >>>>>>> Gen9LP >>>>>>> should actually in the format of: >>>>>>> >>>>>>> 1 slice = 0x001 >>>>>>> 2 slices = 0x010 >>>>>>> 3 slices = 0x100 >>>>>> >>>>>> >>>>>> s/slice/subslice/ >>>>>> >>>>>> >>>>>> Also 0b001 etc... Not hexadecimal. >>>>> >>>>> Oops, you're right. >>>>> >>>>>>> >>>>>>> And not the popcount of the enabled subslice mask as on other >>>>>>> platforms. >>>>>>> >>>>>>> So on Gen9LP platforms we have been programming 0x11 into those >>>>>>> bits, but >>>>>>> the documentation does not explain what would that achieve. Could >>>>>>> it be >>>>>>> that we enable only two subslice on three sub-slice parts? Or >>>>>>> hardware >>>>>>> simply ignores it and sticks with the maximum configuration? >>>>>>> >>>>>>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> >>>>>>> Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com> >>>>>>> Bspec: 12247 >>>>>>> --- >>>>>>> Could this actually be true or I am severely misreading the docs? >>>>>>> It does >>>>>>> not sound plausible to me this would have been missed all this >>>>>>> time.. >>>>>>> >>>>>>> How to test in what configuration do these parts run before and >>>>>>> after this >>>>>>> patch? >>>>>>> --- >>>>>>> drivers/gpu/drm/i915/intel_lrc.c | 10 ++++++++-- >>>>>>> 1 file changed, 8 insertions(+), 2 deletions(-) >>>>>>> >>>>>>> diff --git a/drivers/gpu/drm/i915/intel_lrc.c >>>>>>> b/drivers/gpu/drm/i915/intel_lrc.c >>>>>>> index 36050f085071..cdfa962a1975 100644 >>>>>>> --- a/drivers/gpu/drm/i915/intel_lrc.c >>>>>>> +++ b/drivers/gpu/drm/i915/intel_lrc.c >>>>>>> @@ -2508,9 +2508,15 @@ make_rpcs(struct drm_i915_private *dev_priv) >>>>>>> } >>>>>>> if (INTEL_INFO(dev_priv)->sseu.has_subslice_pg) { >>>>>>> + u8 val; >>>>>>> + >>>>>>> + val = >>>>>>> hweight8(INTEL_INFO(dev_priv)->sseu.subslice_mask[0]); >>>>>>> + >>>>>>> + if (IS_GEN9_LP(dev_priv)) >>>>>>> + val = BIT(val - 1); >>>>>> >>>>>> >>>>>> Hmm... Are you breaking the 2 subslices setting here then? >>>>>> >>>>>> (2 subslices = 0b10 which should be equal to >>>>>> hweight8(subslice_mask) if I'm thinking right) >>>>> >>>>> No and yes, I think. >>>>> >>>>> subslice_mask = 0b011 => hweight = 2 => BIT(2 - 1) = BIT(1) = 0b010 >>>>> into the register >>>>> >>>>> In the same way, all together: >>>>> >>>>> subslice_mask = 0b001 => hweight = 1 => BIT(0) = 0b001 >>>>> subslice_mask = 0b011 => hweight = 2 => BIT(1) = 0b010 >>>>> subslice_mask = 0b111 => hweight = 3 => BIT(2) = 0b100 >>>>> >>>>> Have I made a mistake somewhere? >>>> >>>> >>>> Ah, yes! You're right :) >>>> >>>> My eyes got tricked, thanks for finding this out. >>> >>> At least half of the credit goes to you for linking to 12247 in scope >>> of one different thread! >>> >>>> >>>> With the comment fixed : >>>> >>>> >>>> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> >>> >>> Thanks! >>> >>> Any ideas on how to test this? I'd like to commit message to be more >>> precise - have we been running with one slice too few? Or hardware >>> ignores the undocumented bit combination? Or even, is the >>> documentation perhaps incorrect?! >> >> >> Yeah the tests that I wrote to test the API you're picking up use a >> MI_PREDICATE where you can make a command (let's say >> MI_STORE_REGISTER_MEM) conditional on the number of slice. >> >> You can use that in a user batch an verify that memory has or hasn't >> been written to. >> >> That's only before Gen10 though. >> >> >> I'm looking at whether something that is passed onto the EUs could map >> back to the physical location. >> >> Not sure if there is something... > > I've sent an exploratory IGT based on your MI_SET_PREDICATE code to CI. > Although that only works on the basis of slices, and here doubt is about > subslices. But at least it should log subslice configuration so we'll > see if the incorrect programming survived in the register, or on > read-back we get a different value. > > Otherwise I was thinking if in some Mesa test farm you have an > appropriate BXT/GLK and could notice a performance difference before and > after this patch? > > But again, sounds so unbelievable we would be running with one disabled > slice all this time.. Results are finally back (https://intel-gfx-ci.01.org/tree/drm-tip/Trybot_2831/fi-glk-dsi/igt@drv_selftest@live_contexts.html - starts at timestamp 457.627031 in dmesg) and empirically it seems that BXT/GLK handles the SScount identically to big core SKUs. In other words there is no switching penalty between SScount 0b11 and 0b100, whilst there is between different configs, implying that undocumented 0b11 correctly selected three subslices, equally as the documented 0b100. Small concern is that switching performance between 0b100<->0b10 and 0b11<->0b10 seems potentially greater than the margin of error which I can't explain other than maybe my test being imperfect, or noise is greater than I think. I will raise a note against bspec to see if we can clarify that via that route. Regards, Tvrtko
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c index 36050f085071..cdfa962a1975 100644 --- a/drivers/gpu/drm/i915/intel_lrc.c +++ b/drivers/gpu/drm/i915/intel_lrc.c @@ -2508,9 +2508,15 @@ make_rpcs(struct drm_i915_private *dev_priv) } if (INTEL_INFO(dev_priv)->sseu.has_subslice_pg) { + u8 val; + + val = hweight8(INTEL_INFO(dev_priv)->sseu.subslice_mask[0]); + + if (IS_GEN9_LP(dev_priv)) + val = BIT(val - 1); + rpcs |= GEN8_RPCS_SS_CNT_ENABLE; - rpcs |= hweight8(INTEL_INFO(dev_priv)->sseu.subslice_mask[0]) << - GEN8_RPCS_SS_CNT_SHIFT; + rpcs |= val << GEN8_RPCS_SS_CNT_SHIFT; rpcs |= GEN8_RPCS_ENABLE; }