Message ID | 20180822161805.7105-1-tvrtko.ursulin@linux.intel.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | drm/i915/icl: Fix context slice count configuration | expand |
On 22/08/2018 17:18, Tvrtko Ursulin wrote: > From: Tvrtko Ursulin <tvrtko.ursulin@intel.com> > > Bitfield width for configuring the active slice count has grown in Gen11 > so we need to program the GEN8_R_PWR_CLK_STATE accordingly. > > Current code was always requesting eight times the number of slices (due > writting to a bitfield starting three bits higher than it should). These > requests were luckily a) capped by the hardware to the available number of > slices, and b) we haven't yet exported the code to ask for reduced slice > configurations. > > Due both of the above there was no impact from this incorrect programming > but we should still fix it. > > Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> > Bspec: 12247 > Reported-by: tony.ye@intel.com > Suggested-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> > Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com> > Cc: tony.ye@intel.com > --- > drivers/gpu/drm/i915/i915_reg.h | 2 ++ > drivers/gpu/drm/i915/intel_lrc.c | 12 ++++++++---- > 2 files changed, 10 insertions(+), 4 deletions(-) > > diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h > index 59d06d0055bb..640f7b774a26 100644 > --- a/drivers/gpu/drm/i915/i915_reg.h > +++ b/drivers/gpu/drm/i915/i915_reg.h > @@ -344,6 +344,8 @@ static inline bool i915_mmio_reg_valid(i915_reg_t reg) > #define GEN8_RPCS_S_CNT_ENABLE (1 << 18) > #define GEN8_RPCS_S_CNT_SHIFT 15 > #define GEN8_RPCS_S_CNT_MASK (0x7 << GEN8_RPCS_S_CNT_SHIFT) > +#define GEN11_RPCS_S_CNT_SHIFT 12 > +#define GEN11_RPCS_S_CNT_MASK (0x3f << GEN11_RPCS_S_CNT_SHIFT) > #define GEN8_RPCS_SS_CNT_ENABLE (1 << 11) > #define GEN8_RPCS_SS_CNT_SHIFT 8 > #define GEN8_RPCS_SS_CNT_MASK (0x7 << GEN8_RPCS_SS_CNT_SHIFT) > diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c > index 36050f085071..43b8b0675ba0 100644 > --- a/drivers/gpu/drm/i915/intel_lrc.c > +++ b/drivers/gpu/drm/i915/intel_lrc.c > @@ -2501,10 +2501,14 @@ make_rpcs(struct drm_i915_private *dev_priv) > * enablement. > */ > if (INTEL_INFO(dev_priv)->sseu.has_slice_pg) { > - rpcs |= GEN8_RPCS_S_CNT_ENABLE; > - rpcs |= hweight8(INTEL_INFO(dev_priv)->sseu.slice_mask) << > - GEN8_RPCS_S_CNT_SHIFT; > - rpcs |= GEN8_RPCS_ENABLE; > + rpcs = hweight8(INTEL_INFO(dev_priv)->sseu.slice_mask); > + > + if (INTEL_GEN(dev_priv) >= 11) > + rpcs <<= GEN11_RPCS_S_CNT_SHIFT; > + else > + rpcs <<= GEN8_RPCS_S_CNT_SHIFT; > + > + rpcs |= GEN8_RPCS_ENABLE | GEN8_RPCS_S_CNT_ENABLE; I don't know if you saw that wording in the documentation : " Note: In ICL, software programs this register as if GT consists of 2 slices with 4 subslices in each slice. Hardware maps this to the 1 slice/8-subslice physical layout. " My understanding is that it would make this function a bit more complicated ;) It also implies that on ICL you cannot select 3 subslices, which is unfortunately what Tony was trying to do. Maybe some opens need to be raised as to what's possible on ICL. > } > > if (INTEL_INFO(dev_priv)->sseu.has_subslice_pg) {
On 22/08/2018 17:33, Lionel Landwerlin wrote: > On 22/08/2018 17:18, Tvrtko Ursulin wrote: >> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com> >> >> Bitfield width for configuring the active slice count has grown in Gen11 >> so we need to program the GEN8_R_PWR_CLK_STATE accordingly. >> >> Current code was always requesting eight times the number of slices (due >> writting to a bitfield starting three bits higher than it should). These >> requests were luckily a) capped by the hardware to the available >> number of >> slices, and b) we haven't yet exported the code to ask for reduced slice >> configurations. >> >> Due both of the above there was no impact from this incorrect programming >> but we should still fix it. >> >> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> >> Bspec: 12247 >> Reported-by: tony.ye@intel.com >> Suggested-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> >> Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com> >> Cc: tony.ye@intel.com >> --- >> drivers/gpu/drm/i915/i915_reg.h | 2 ++ >> drivers/gpu/drm/i915/intel_lrc.c | 12 ++++++++---- >> 2 files changed, 10 insertions(+), 4 deletions(-) >> >> diff --git a/drivers/gpu/drm/i915/i915_reg.h >> b/drivers/gpu/drm/i915/i915_reg.h >> index 59d06d0055bb..640f7b774a26 100644 >> --- a/drivers/gpu/drm/i915/i915_reg.h >> +++ b/drivers/gpu/drm/i915/i915_reg.h >> @@ -344,6 +344,8 @@ static inline bool i915_mmio_reg_valid(i915_reg_t >> reg) >> #define GEN8_RPCS_S_CNT_ENABLE (1 << 18) >> #define GEN8_RPCS_S_CNT_SHIFT 15 >> #define GEN8_RPCS_S_CNT_MASK (0x7 << GEN8_RPCS_S_CNT_SHIFT) >> +#define GEN11_RPCS_S_CNT_SHIFT 12 >> +#define GEN11_RPCS_S_CNT_MASK (0x3f << GEN11_RPCS_S_CNT_SHIFT) >> #define GEN8_RPCS_SS_CNT_ENABLE (1 << 11) >> #define GEN8_RPCS_SS_CNT_SHIFT 8 >> #define GEN8_RPCS_SS_CNT_MASK (0x7 << GEN8_RPCS_SS_CNT_SHIFT) >> diff --git a/drivers/gpu/drm/i915/intel_lrc.c >> b/drivers/gpu/drm/i915/intel_lrc.c >> index 36050f085071..43b8b0675ba0 100644 >> --- a/drivers/gpu/drm/i915/intel_lrc.c >> +++ b/drivers/gpu/drm/i915/intel_lrc.c >> @@ -2501,10 +2501,14 @@ make_rpcs(struct drm_i915_private *dev_priv) >> * enablement. >> */ >> if (INTEL_INFO(dev_priv)->sseu.has_slice_pg) { >> - rpcs |= GEN8_RPCS_S_CNT_ENABLE; >> - rpcs |= hweight8(INTEL_INFO(dev_priv)->sseu.slice_mask) << >> - GEN8_RPCS_S_CNT_SHIFT; >> - rpcs |= GEN8_RPCS_ENABLE; >> + rpcs = hweight8(INTEL_INFO(dev_priv)->sseu.slice_mask); >> + >> + if (INTEL_GEN(dev_priv) >= 11) >> + rpcs <<= GEN11_RPCS_S_CNT_SHIFT; >> + else >> + rpcs <<= GEN8_RPCS_S_CNT_SHIFT; >> + >> + rpcs |= GEN8_RPCS_ENABLE | GEN8_RPCS_S_CNT_ENABLE; > > > I don't know if you saw that wording in the documentation : > > " > > Note: In ICL, software programs this register as if GT consists of 2 > slices with 4 subslices in each slice. Hardware maps this to the 1 > slice/8-subslice physical layout. > > " > > > My understanding is that it would make this function a bit more > complicated ;) > > It also implies that on ICL you cannot select 3 subslices, which is > unfortunately what Tony was trying to do. > > Maybe some opens need to be raised as to what's possible on ICL. I interpreted the note in my head as "In ICL, _if_ _the_ software programs.." so did not see a problem. Thought that would be just some hidden/under the covers remapping hw would do. But I see now that was wrong, and you are most likely right. I'll try to do some digging to understand this better. But for the second part of it, I don't see why 1x3 configuration would be illegal. If software must assume hw is 2x4, even if in reality it is 1x8, then 1x3 would still be legal, no? I thought the cause of the hang on ICL was that when Tony was asking for 1x3, the code was actually programming a request for 8x3 - which is illegal (as in slice count must be 1 to enable subslice pg) and so would fail to actually turn of the unwanted subslices. But then I also though on ICL we deal with masks and not counts when programming the hardware. Since apparently it is counts both for slices and subslices, I am mightily confused as to how media-driver would even theoretically be able to turn off a _specific_ (sub)slice?! Regards, Tvrtko > >> } >> if (INTEL_INFO(dev_priv)->sseu.has_subslice_pg) { > > > _______________________________________________ > Intel-gfx mailing list > Intel-gfx@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/intel-gfx
On 22/08/2018 18:07, Tvrtko Ursulin wrote: > > On 22/08/2018 17:33, Lionel Landwerlin wrote: >> On 22/08/2018 17:18, Tvrtko Ursulin wrote: >>> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com> >>> >>> Bitfield width for configuring the active slice count has grown in >>> Gen11 >>> so we need to program the GEN8_R_PWR_CLK_STATE accordingly. >>> >>> Current code was always requesting eight times the number of slices >>> (due >>> writting to a bitfield starting three bits higher than it should). >>> These >>> requests were luckily a) capped by the hardware to the available >>> number of >>> slices, and b) we haven't yet exported the code to ask for reduced >>> slice >>> configurations. >>> >>> Due both of the above there was no impact from this incorrect >>> programming >>> but we should still fix it. >>> >>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> >>> Bspec: 12247 >>> Reported-by: tony.ye@intel.com >>> Suggested-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> >>> Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com> >>> Cc: tony.ye@intel.com >>> --- >>> drivers/gpu/drm/i915/i915_reg.h | 2 ++ >>> drivers/gpu/drm/i915/intel_lrc.c | 12 ++++++++---- >>> 2 files changed, 10 insertions(+), 4 deletions(-) >>> >>> diff --git a/drivers/gpu/drm/i915/i915_reg.h >>> b/drivers/gpu/drm/i915/i915_reg.h >>> index 59d06d0055bb..640f7b774a26 100644 >>> --- a/drivers/gpu/drm/i915/i915_reg.h >>> +++ b/drivers/gpu/drm/i915/i915_reg.h >>> @@ -344,6 +344,8 @@ static inline bool >>> i915_mmio_reg_valid(i915_reg_t reg) >>> #define GEN8_RPCS_S_CNT_ENABLE (1 << 18) >>> #define GEN8_RPCS_S_CNT_SHIFT 15 >>> #define GEN8_RPCS_S_CNT_MASK (0x7 << GEN8_RPCS_S_CNT_SHIFT) >>> +#define GEN11_RPCS_S_CNT_SHIFT 12 >>> +#define GEN11_RPCS_S_CNT_MASK (0x3f << >>> GEN11_RPCS_S_CNT_SHIFT) >>> #define GEN8_RPCS_SS_CNT_ENABLE (1 << 11) >>> #define GEN8_RPCS_SS_CNT_SHIFT 8 >>> #define GEN8_RPCS_SS_CNT_MASK (0x7 << >>> GEN8_RPCS_SS_CNT_SHIFT) >>> diff --git a/drivers/gpu/drm/i915/intel_lrc.c >>> b/drivers/gpu/drm/i915/intel_lrc.c >>> index 36050f085071..43b8b0675ba0 100644 >>> --- a/drivers/gpu/drm/i915/intel_lrc.c >>> +++ b/drivers/gpu/drm/i915/intel_lrc.c >>> @@ -2501,10 +2501,14 @@ make_rpcs(struct drm_i915_private *dev_priv) >>> * enablement. >>> */ >>> if (INTEL_INFO(dev_priv)->sseu.has_slice_pg) { >>> - rpcs |= GEN8_RPCS_S_CNT_ENABLE; >>> - rpcs |= hweight8(INTEL_INFO(dev_priv)->sseu.slice_mask) << >>> - GEN8_RPCS_S_CNT_SHIFT; >>> - rpcs |= GEN8_RPCS_ENABLE; >>> + rpcs = hweight8(INTEL_INFO(dev_priv)->sseu.slice_mask); >>> + >>> + if (INTEL_GEN(dev_priv) >= 11) >>> + rpcs <<= GEN11_RPCS_S_CNT_SHIFT; >>> + else >>> + rpcs <<= GEN8_RPCS_S_CNT_SHIFT; >>> + >>> + rpcs |= GEN8_RPCS_ENABLE | GEN8_RPCS_S_CNT_ENABLE; >> >> >> I don't know if you saw that wording in the documentation : >> >> " >> >> Note: In ICL, software programs this register as if GT consists of 2 >> slices with 4 subslices in each slice. Hardware maps this to the 1 >> slice/8-subslice physical layout. >> >> " >> >> >> My understanding is that it would make this function a bit more >> complicated ;) >> >> It also implies that on ICL you cannot select 3 subslices, which is >> unfortunately what Tony was trying to do. >> >> Maybe some opens need to be raised as to what's possible on ICL. > > I interpreted the note in my head as "In ICL, _if_ _the_ software > programs.." so did not see a problem. Thought that would be just some > hidden/under the covers remapping hw would do. But I see now that was > wrong, and you are most likely right. I'll try to do some digging to > understand this better. > > But for the second part of it, I don't see why 1x3 configuration would > be illegal. If software must assume hw is 2x4, even if in reality it > is 1x8, then 1x3 would still be legal, no? We still seem to put a subslice number in the register for ICL (values being 0b001, 0b010, 0b011 & 0b100). My understanding is that the hardware will just multiply that value by 2 to map to the 1x8 underlying topology. So if that's the case, you can't really do odd numbers... ¯\_(ツ)_/¯ > > I thought the cause of the hang on ICL was that when Tony was asking > for 1x3, the code was actually programming a request for 8x3 - which > is illegal (as in slice count must be 1 to enable subslice pg) and so > would fail to actually turn of the unwanted subslices. > > But then I also though on ICL we deal with masks and not counts when > programming the hardware. Since apparently it is counts both for > slices and subslices, I am mightily confused as to how media-driver > would even theoretically be able to turn off a _specific_ (sub)slice?! The fact that the feature needed isn't implemented at by the thread dispatcher is really strange to me too. It sounds like we're forced to use a bigger hammer than what we really need. As to how that maps to the right subslices is also unknown to me. The only explanation I have is that subslices with no VME samplers get turn off first in the list of subslices to turn off. Cheers, - Lionel > > Regards, > > Tvrtko > >> >>> } >>> if (INTEL_INFO(dev_priv)->sseu.has_subslice_pg) { >> >> >> _______________________________________________ >> Intel-gfx mailing list >> Intel-gfx@lists.freedesktop.org >> https://lists.freedesktop.org/mailman/listinfo/intel-gfx > <html> <head> <meta http-equiv="Content-Type" content="text/html; charset=UTF-8"> </head> <body text="#000000" bgcolor="#FFFFFF"> <div class="moz-cite-prefix">On 22/08/2018 18:07, Tvrtko Ursulin wrote:<br> </div> <blockquote type="cite" cite="mid:09d263ed-bc18-2925-f3a8-9875ed584895@linux.intel.com"> <br> On 22/08/2018 17:33, Lionel Landwerlin wrote: <br> <blockquote type="cite">On 22/08/2018 17:18, Tvrtko Ursulin wrote: <br> <blockquote type="cite">From: Tvrtko Ursulin <a class="moz-txt-link-rfc2396E" href="mailto:tvrtko.ursulin@intel.com"><tvrtko.ursulin@intel.com></a> <br> <br> Bitfield width for configuring the active slice count has grown in Gen11 <br> so we need to program the GEN8_R_PWR_CLK_STATE accordingly. <br> <br> Current code was always requesting eight times the number of slices (due <br> writting to a bitfield starting three bits higher than it should). These <br> requests were luckily a) capped by the hardware to the available number of <br> slices, and b) we haven't yet exported the code to ask for reduced slice <br> configurations. <br> <br> Due both of the above there was no impact from this incorrect programming <br> but we should still fix it. <br> <br> Signed-off-by: Tvrtko Ursulin <a class="moz-txt-link-rfc2396E" href="mailto:tvrtko.ursulin@intel.com"><tvrtko.ursulin@intel.com></a> <br> Bspec: 12247 <br> Reported-by: <a class="moz-txt-link-abbreviated" href="mailto:tony.ye@intel.com">tony.ye@intel.com</a> <br> Suggested-by: Lionel Landwerlin <a class="moz-txt-link-rfc2396E" href="mailto:lionel.g.landwerlin@intel.com"><lionel.g.landwerlin@intel.com></a> <br> Cc: Lionel Landwerlin <a class="moz-txt-link-rfc2396E" href="mailto:lionel.g.landwerlin@intel.com"><lionel.g.landwerlin@intel.com></a> <br> Cc: <a class="moz-txt-link-abbreviated" href="mailto:tony.ye@intel.com">tony.ye@intel.com</a> <br> --- <br> drivers/gpu/drm/i915/i915_reg.h | 2 ++ <br> drivers/gpu/drm/i915/intel_lrc.c | 12 ++++++++---- <br> 2 files changed, 10 insertions(+), 4 deletions(-) <br> <br> diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h <br> index 59d06d0055bb..640f7b774a26 100644 <br> --- a/drivers/gpu/drm/i915/i915_reg.h <br> +++ b/drivers/gpu/drm/i915/i915_reg.h <br> @@ -344,6 +344,8 @@ static inline bool i915_mmio_reg_valid(i915_reg_t reg) <br> #define GEN8_RPCS_S_CNT_ENABLE (1 << 18) <br> #define GEN8_RPCS_S_CNT_SHIFT 15 <br> #define GEN8_RPCS_S_CNT_MASK (0x7 << GEN8_RPCS_S_CNT_SHIFT) <br> +#define GEN11_RPCS_S_CNT_SHIFT 12 <br> +#define GEN11_RPCS_S_CNT_MASK (0x3f << GEN11_RPCS_S_CNT_SHIFT) <br> #define GEN8_RPCS_SS_CNT_ENABLE (1 << 11) <br> #define GEN8_RPCS_SS_CNT_SHIFT 8 <br> #define GEN8_RPCS_SS_CNT_MASK (0x7 << GEN8_RPCS_SS_CNT_SHIFT) <br> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c <br> index 36050f085071..43b8b0675ba0 100644 <br> --- a/drivers/gpu/drm/i915/intel_lrc.c <br> +++ b/drivers/gpu/drm/i915/intel_lrc.c <br> @@ -2501,10 +2501,14 @@ make_rpcs(struct drm_i915_private *dev_priv) <br> * enablement. <br> */ <br> if (INTEL_INFO(dev_priv)->sseu.has_slice_pg) { <br> - rpcs |= GEN8_RPCS_S_CNT_ENABLE; <br> - rpcs |= hweight8(INTEL_INFO(dev_priv)->sseu.slice_mask) << <br> - GEN8_RPCS_S_CNT_SHIFT; <br> - rpcs |= GEN8_RPCS_ENABLE; <br> + rpcs = hweight8(INTEL_INFO(dev_priv)->sseu.slice_mask); <br> + <br> + if (INTEL_GEN(dev_priv) >= 11) <br> + rpcs <<= GEN11_RPCS_S_CNT_SHIFT; <br> + else <br> + rpcs <<= GEN8_RPCS_S_CNT_SHIFT; <br> + <br> + rpcs |= GEN8_RPCS_ENABLE | GEN8_RPCS_S_CNT_ENABLE; <br> </blockquote> <br> <br> I don't know if you saw that wording in the documentation : <br> <br> " <br> <br> Note: In ICL, software programs this register as if GT consists of 2 slices with 4 subslices in each slice. Hardware maps this to the 1 slice/8-subslice physical layout. <br> <br> " <br> <br> <br> My understanding is that it would make this function a bit more complicated ;) <br> <br> It also implies that on ICL you cannot select 3 subslices, which is unfortunately what Tony was trying to do. <br> <br> Maybe some opens need to be raised as to what's possible on ICL. <br> </blockquote> <br> I interpreted the note in my head as "In ICL, _if_ _the_ software programs.." so did not see a problem. Thought that would be just some hidden/under the covers remapping hw would do. But I see now that was wrong, and you are most likely right. I'll try to do some digging to understand this better. <br> <br> But for the second part of it, I don't see why 1x3 configuration would be illegal. If software must assume hw is 2x4, even if in reality it is 1x8, then 1x3 would still be legal, no? <br> </blockquote> <p><br> </p> <p>We still seem to put a subslice number in the register for ICL (values being 0b001, 0b010, 0b011 & 0b100).</p> <p>My understanding is that the hardware will just multiply that value by 2 to map to the 1x8 underlying topology.</p> <p>So if that's the case, you can't really do odd numbers... <span style="color: rgb(0, 0, 0); font-family: "Helvetica Neue", Helvetica, Arial, sans-serif; font-size: 16px; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: normal; orphans: 2; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial; display: inline !important; float: none;">¯\_(ツ)_/¯</span></p> <p><br> </p> <blockquote type="cite" cite="mid:09d263ed-bc18-2925-f3a8-9875ed584895@linux.intel.com"> <br> I thought the cause of the hang on ICL was that when Tony was asking for 1x3, the code was actually programming a request for 8x3 - which is illegal (as in slice count must be 1 to enable subslice pg) and so would fail to actually turn of the unwanted subslices. <br> <br> But then I also though on ICL we deal with masks and not counts when programming the hardware. Since apparently it is counts both for slices and subslices, I am mightily confused as to how media-driver would even theoretically be able to turn off a _specific_ (sub)slice?! <br> </blockquote> <p><br> </p> <p>The fact that the feature needed isn't implemented at by the thread dispatcher is really strange to me too.</p> <p>It sounds like we're forced to use a bigger hammer than what we really need.</p> <p>As to how that maps to the right subslices is also unknown to me.</p> <p>The only explanation I have is that subslices with no VME samplers get turn off first in the list of subslices to turn off.</p> <p><br> </p> <p>Cheers,</p> <p><br> </p> <p>-</p> <p>Lionel<br> </p> <p><br> </p> <blockquote type="cite" cite="mid:09d263ed-bc18-2925-f3a8-9875ed584895@linux.intel.com"> <br> Regards, <br> <br> Tvrtko <br> <br> <blockquote type="cite"> <br> <blockquote type="cite"> } <br> if (INTEL_INFO(dev_priv)->sseu.has_subslice_pg) { <br> </blockquote> <br> <br> _______________________________________________ <br> Intel-gfx mailing list <br> <a class="moz-txt-link-abbreviated" href="mailto:Intel-gfx@lists.freedesktop.org">Intel-gfx@lists.freedesktop.org</a> <br> <a class="moz-txt-link-freetext" href="https://lists.freedesktop.org/mailman/listinfo/intel-gfx">https://lists.freedesktop.org/mailman/listinfo/intel-gfx</a> <br> </blockquote> <br> </blockquote> <p><br> </p> </body> </html>
On 22/08/2018 17:33, Lionel Landwerlin wrote: > On 22/08/2018 17:18, Tvrtko Ursulin wrote: >> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com> >> >> Bitfield width for configuring the active slice count has grown in Gen11 >> so we need to program the GEN8_R_PWR_CLK_STATE accordingly. >> >> Current code was always requesting eight times the number of slices (due >> writting to a bitfield starting three bits higher than it should). These >> requests were luckily a) capped by the hardware to the available >> number of >> slices, and b) we haven't yet exported the code to ask for reduced slice >> configurations. >> >> Due both of the above there was no impact from this incorrect programming >> but we should still fix it. >> >> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> >> Bspec: 12247 >> Reported-by: tony.ye@intel.com >> Suggested-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> >> Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com> >> Cc: tony.ye@intel.com >> --- >> drivers/gpu/drm/i915/i915_reg.h | 2 ++ >> drivers/gpu/drm/i915/intel_lrc.c | 12 ++++++++---- >> 2 files changed, 10 insertions(+), 4 deletions(-) >> >> diff --git a/drivers/gpu/drm/i915/i915_reg.h >> b/drivers/gpu/drm/i915/i915_reg.h >> index 59d06d0055bb..640f7b774a26 100644 >> --- a/drivers/gpu/drm/i915/i915_reg.h >> +++ b/drivers/gpu/drm/i915/i915_reg.h >> @@ -344,6 +344,8 @@ static inline bool i915_mmio_reg_valid(i915_reg_t >> reg) >> #define GEN8_RPCS_S_CNT_ENABLE (1 << 18) >> #define GEN8_RPCS_S_CNT_SHIFT 15 >> #define GEN8_RPCS_S_CNT_MASK (0x7 << GEN8_RPCS_S_CNT_SHIFT) >> +#define GEN11_RPCS_S_CNT_SHIFT 12 >> +#define GEN11_RPCS_S_CNT_MASK (0x3f << GEN11_RPCS_S_CNT_SHIFT) >> #define GEN8_RPCS_SS_CNT_ENABLE (1 << 11) >> #define GEN8_RPCS_SS_CNT_SHIFT 8 >> #define GEN8_RPCS_SS_CNT_MASK (0x7 << GEN8_RPCS_SS_CNT_SHIFT) >> diff --git a/drivers/gpu/drm/i915/intel_lrc.c >> b/drivers/gpu/drm/i915/intel_lrc.c >> index 36050f085071..43b8b0675ba0 100644 >> --- a/drivers/gpu/drm/i915/intel_lrc.c >> +++ b/drivers/gpu/drm/i915/intel_lrc.c >> @@ -2501,10 +2501,14 @@ make_rpcs(struct drm_i915_private *dev_priv) >> * enablement. >> */ >> if (INTEL_INFO(dev_priv)->sseu.has_slice_pg) { >> - rpcs |= GEN8_RPCS_S_CNT_ENABLE; >> - rpcs |= hweight8(INTEL_INFO(dev_priv)->sseu.slice_mask) << >> - GEN8_RPCS_S_CNT_SHIFT; >> - rpcs |= GEN8_RPCS_ENABLE; >> + rpcs = hweight8(INTEL_INFO(dev_priv)->sseu.slice_mask); >> + >> + if (INTEL_GEN(dev_priv) >= 11) >> + rpcs <<= GEN11_RPCS_S_CNT_SHIFT; >> + else >> + rpcs <<= GEN8_RPCS_S_CNT_SHIFT; >> + >> + rpcs |= GEN8_RPCS_ENABLE | GEN8_RPCS_S_CNT_ENABLE; > > > I don't know if you saw that wording in the documentation : > > " > > Note: In ICL, software programs this register as if GT consists of 2 > slices with 4 subslices in each slice. Hardware maps this to the 1 > slice/8-subslice physical layout. > > " > > > My understanding is that it would make this function a bit more > complicated ;) > > It also implies that on ICL you cannot select 3 subslices, which is > unfortunately what Tony was trying to do. > > Maybe some opens need to be raised as to what's possible on ICL. Happy to r-b this one since we clarified it is fine? Regards, Tvrtko
On 29/08/2018 11:54, Tvrtko Ursulin wrote: > > On 22/08/2018 17:33, Lionel Landwerlin wrote: >> On 22/08/2018 17:18, Tvrtko Ursulin wrote: >>> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com> >>> >>> Bitfield width for configuring the active slice count has grown in >>> Gen11 >>> so we need to program the GEN8_R_PWR_CLK_STATE accordingly. >>> >>> Current code was always requesting eight times the number of slices >>> (due >>> writting to a bitfield starting three bits higher than it should). >>> These >>> requests were luckily a) capped by the hardware to the available >>> number of >>> slices, and b) we haven't yet exported the code to ask for reduced >>> slice >>> configurations. >>> >>> Due both of the above there was no impact from this incorrect >>> programming >>> but we should still fix it. >>> >>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> >>> Bspec: 12247 >>> Reported-by: tony.ye@intel.com >>> Suggested-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> >>> Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com> >>> Cc: tony.ye@intel.com >>> --- >>> drivers/gpu/drm/i915/i915_reg.h | 2 ++ >>> drivers/gpu/drm/i915/intel_lrc.c | 12 ++++++++---- >>> 2 files changed, 10 insertions(+), 4 deletions(-) >>> >>> diff --git a/drivers/gpu/drm/i915/i915_reg.h >>> b/drivers/gpu/drm/i915/i915_reg.h >>> index 59d06d0055bb..640f7b774a26 100644 >>> --- a/drivers/gpu/drm/i915/i915_reg.h >>> +++ b/drivers/gpu/drm/i915/i915_reg.h >>> @@ -344,6 +344,8 @@ static inline bool >>> i915_mmio_reg_valid(i915_reg_t reg) >>> #define GEN8_RPCS_S_CNT_ENABLE (1 << 18) >>> #define GEN8_RPCS_S_CNT_SHIFT 15 >>> #define GEN8_RPCS_S_CNT_MASK (0x7 << GEN8_RPCS_S_CNT_SHIFT) >>> +#define GEN11_RPCS_S_CNT_SHIFT 12 >>> +#define GEN11_RPCS_S_CNT_MASK (0x3f << >>> GEN11_RPCS_S_CNT_SHIFT) >>> #define GEN8_RPCS_SS_CNT_ENABLE (1 << 11) >>> #define GEN8_RPCS_SS_CNT_SHIFT 8 >>> #define GEN8_RPCS_SS_CNT_MASK (0x7 << >>> GEN8_RPCS_SS_CNT_SHIFT) >>> diff --git a/drivers/gpu/drm/i915/intel_lrc.c >>> b/drivers/gpu/drm/i915/intel_lrc.c >>> index 36050f085071..43b8b0675ba0 100644 >>> --- a/drivers/gpu/drm/i915/intel_lrc.c >>> +++ b/drivers/gpu/drm/i915/intel_lrc.c >>> @@ -2501,10 +2501,14 @@ make_rpcs(struct drm_i915_private *dev_priv) >>> * enablement. >>> */ >>> if (INTEL_INFO(dev_priv)->sseu.has_slice_pg) { >>> - rpcs |= GEN8_RPCS_S_CNT_ENABLE; >>> - rpcs |= hweight8(INTEL_INFO(dev_priv)->sseu.slice_mask) << >>> - GEN8_RPCS_S_CNT_SHIFT; >>> - rpcs |= GEN8_RPCS_ENABLE; >>> + rpcs = hweight8(INTEL_INFO(dev_priv)->sseu.slice_mask); >>> + >>> + if (INTEL_GEN(dev_priv) >= 11) >>> + rpcs <<= GEN11_RPCS_S_CNT_SHIFT; >>> + else >>> + rpcs <<= GEN8_RPCS_S_CNT_SHIFT; >>> + >>> + rpcs |= GEN8_RPCS_ENABLE | GEN8_RPCS_S_CNT_ENABLE; >> >> >> I don't know if you saw that wording in the documentation : >> >> " >> >> Note: In ICL, software programs this register as if GT consists of 2 >> slices with 4 subslices in each slice. Hardware maps this to the 1 >> slice/8-subslice physical layout. >> >> " >> >> >> My understanding is that it would make this function a bit more >> complicated ;) >> >> It also implies that on ICL you cannot select 3 subslices, which is >> unfortunately what Tony was trying to do. >> >> Maybe some opens need to be raised as to what's possible on ICL. > > Happy to r-b this one since we clarified it is fine? > > Regards, > > Tvrtko > > I think there is still an issue here with regard to the subslice programming. You can only program values in [1, 4]. But because we expose up to 8 subslices, you need to take that mask and alter the slice mask based on its value. - Lionel
On 29/08/2018 12:07, Lionel Landwerlin wrote: > On 29/08/2018 11:54, Tvrtko Ursulin wrote: >> >> On 22/08/2018 17:33, Lionel Landwerlin wrote: >>> On 22/08/2018 17:18, Tvrtko Ursulin wrote: >>>> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com> >>>> >>>> Bitfield width for configuring the active slice count has grown in >>>> Gen11 >>>> so we need to program the GEN8_R_PWR_CLK_STATE accordingly. >>>> >>>> Current code was always requesting eight times the number of slices >>>> (due >>>> writting to a bitfield starting three bits higher than it should). >>>> These >>>> requests were luckily a) capped by the hardware to the available >>>> number of >>>> slices, and b) we haven't yet exported the code to ask for reduced >>>> slice >>>> configurations. >>>> >>>> Due both of the above there was no impact from this incorrect >>>> programming >>>> but we should still fix it. >>>> >>>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> >>>> Bspec: 12247 >>>> Reported-by: tony.ye@intel.com >>>> Suggested-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> >>>> Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com> >>>> Cc: tony.ye@intel.com >>>> --- >>>> drivers/gpu/drm/i915/i915_reg.h | 2 ++ >>>> drivers/gpu/drm/i915/intel_lrc.c | 12 ++++++++---- >>>> 2 files changed, 10 insertions(+), 4 deletions(-) >>>> >>>> diff --git a/drivers/gpu/drm/i915/i915_reg.h >>>> b/drivers/gpu/drm/i915/i915_reg.h >>>> index 59d06d0055bb..640f7b774a26 100644 >>>> --- a/drivers/gpu/drm/i915/i915_reg.h >>>> +++ b/drivers/gpu/drm/i915/i915_reg.h >>>> @@ -344,6 +344,8 @@ static inline bool >>>> i915_mmio_reg_valid(i915_reg_t reg) >>>> #define GEN8_RPCS_S_CNT_ENABLE (1 << 18) >>>> #define GEN8_RPCS_S_CNT_SHIFT 15 >>>> #define GEN8_RPCS_S_CNT_MASK (0x7 << GEN8_RPCS_S_CNT_SHIFT) >>>> +#define GEN11_RPCS_S_CNT_SHIFT 12 >>>> +#define GEN11_RPCS_S_CNT_MASK (0x3f << >>>> GEN11_RPCS_S_CNT_SHIFT) >>>> #define GEN8_RPCS_SS_CNT_ENABLE (1 << 11) >>>> #define GEN8_RPCS_SS_CNT_SHIFT 8 >>>> #define GEN8_RPCS_SS_CNT_MASK (0x7 << >>>> GEN8_RPCS_SS_CNT_SHIFT) >>>> diff --git a/drivers/gpu/drm/i915/intel_lrc.c >>>> b/drivers/gpu/drm/i915/intel_lrc.c >>>> index 36050f085071..43b8b0675ba0 100644 >>>> --- a/drivers/gpu/drm/i915/intel_lrc.c >>>> +++ b/drivers/gpu/drm/i915/intel_lrc.c >>>> @@ -2501,10 +2501,14 @@ make_rpcs(struct drm_i915_private *dev_priv) >>>> * enablement. >>>> */ >>>> if (INTEL_INFO(dev_priv)->sseu.has_slice_pg) { >>>> - rpcs |= GEN8_RPCS_S_CNT_ENABLE; >>>> - rpcs |= hweight8(INTEL_INFO(dev_priv)->sseu.slice_mask) << >>>> - GEN8_RPCS_S_CNT_SHIFT; >>>> - rpcs |= GEN8_RPCS_ENABLE; >>>> + rpcs = hweight8(INTEL_INFO(dev_priv)->sseu.slice_mask); >>>> + >>>> + if (INTEL_GEN(dev_priv) >= 11) >>>> + rpcs <<= GEN11_RPCS_S_CNT_SHIFT; >>>> + else >>>> + rpcs <<= GEN8_RPCS_S_CNT_SHIFT; >>>> + >>>> + rpcs |= GEN8_RPCS_ENABLE | GEN8_RPCS_S_CNT_ENABLE; >>> >>> >>> I don't know if you saw that wording in the documentation : >>> >>> " >>> >>> Note: In ICL, software programs this register as if GT consists of 2 >>> slices with 4 subslices in each slice. Hardware maps this to the 1 >>> slice/8-subslice physical layout. >>> >>> " >>> >>> >>> My understanding is that it would make this function a bit more >>> complicated ;) >>> >>> It also implies that on ICL you cannot select 3 subslices, which is >>> unfortunately what Tony was trying to do. >>> >>> Maybe some opens need to be raised as to what's possible on ICL. >> >> Happy to r-b this one since we clarified it is fine? >> >> Regards, >> >> Tvrtko >> >> > I think there is still an issue here with regard to the subslice > programming. > You can only program values in [1, 4]. > But because we expose up to 8 subslices, you need to take that mask and > alter the slice mask based on its value. Hmm true, thanks. I think we need to somehow express this restriction on the uAPI level rather than silently mask it. Shall we reject attempts to configure subslice mask if hweight(mask) > max_subslices / 2 for ICL-LP? Regards, Tvrtko
On 29/08/2018 13:02, Tvrtko Ursulin wrote: > > On 29/08/2018 12:07, Lionel Landwerlin wrote: >> On 29/08/2018 11:54, Tvrtko Ursulin wrote: >>> >>> On 22/08/2018 17:33, Lionel Landwerlin wrote: >>>> On 22/08/2018 17:18, Tvrtko Ursulin wrote: >>>>> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com> >>>>> >>>>> Bitfield width for configuring the active slice count has grown in >>>>> Gen11 >>>>> so we need to program the GEN8_R_PWR_CLK_STATE accordingly. >>>>> >>>>> Current code was always requesting eight times the number of slices >>>>> (due >>>>> writting to a bitfield starting three bits higher than it should). >>>>> These >>>>> requests were luckily a) capped by the hardware to the available >>>>> number of >>>>> slices, and b) we haven't yet exported the code to ask for reduced >>>>> slice >>>>> configurations. >>>>> >>>>> Due both of the above there was no impact from this incorrect >>>>> programming >>>>> but we should still fix it. >>>>> >>>>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> >>>>> Bspec: 12247 >>>>> Reported-by: tony.ye@intel.com >>>>> Suggested-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> >>>>> Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com> >>>>> Cc: tony.ye@intel.com >>>>> --- >>>>> drivers/gpu/drm/i915/i915_reg.h | 2 ++ >>>>> drivers/gpu/drm/i915/intel_lrc.c | 12 ++++++++---- >>>>> 2 files changed, 10 insertions(+), 4 deletions(-) >>>>> >>>>> diff --git a/drivers/gpu/drm/i915/i915_reg.h >>>>> b/drivers/gpu/drm/i915/i915_reg.h >>>>> index 59d06d0055bb..640f7b774a26 100644 >>>>> --- a/drivers/gpu/drm/i915/i915_reg.h >>>>> +++ b/drivers/gpu/drm/i915/i915_reg.h >>>>> @@ -344,6 +344,8 @@ static inline bool >>>>> i915_mmio_reg_valid(i915_reg_t reg) >>>>> #define GEN8_RPCS_S_CNT_ENABLE (1 << 18) >>>>> #define GEN8_RPCS_S_CNT_SHIFT 15 >>>>> #define GEN8_RPCS_S_CNT_MASK (0x7 << GEN8_RPCS_S_CNT_SHIFT) >>>>> +#define GEN11_RPCS_S_CNT_SHIFT 12 >>>>> +#define GEN11_RPCS_S_CNT_MASK (0x3f << >>>>> GEN11_RPCS_S_CNT_SHIFT) >>>>> #define GEN8_RPCS_SS_CNT_ENABLE (1 << 11) >>>>> #define GEN8_RPCS_SS_CNT_SHIFT 8 >>>>> #define GEN8_RPCS_SS_CNT_MASK (0x7 << >>>>> GEN8_RPCS_SS_CNT_SHIFT) >>>>> diff --git a/drivers/gpu/drm/i915/intel_lrc.c >>>>> b/drivers/gpu/drm/i915/intel_lrc.c >>>>> index 36050f085071..43b8b0675ba0 100644 >>>>> --- a/drivers/gpu/drm/i915/intel_lrc.c >>>>> +++ b/drivers/gpu/drm/i915/intel_lrc.c >>>>> @@ -2501,10 +2501,14 @@ make_rpcs(struct drm_i915_private *dev_priv) >>>>> * enablement. >>>>> */ >>>>> if (INTEL_INFO(dev_priv)->sseu.has_slice_pg) { >>>>> - rpcs |= GEN8_RPCS_S_CNT_ENABLE; >>>>> - rpcs |= hweight8(INTEL_INFO(dev_priv)->sseu.slice_mask) << >>>>> - GEN8_RPCS_S_CNT_SHIFT; >>>>> - rpcs |= GEN8_RPCS_ENABLE; >>>>> + rpcs = hweight8(INTEL_INFO(dev_priv)->sseu.slice_mask); >>>>> + >>>>> + if (INTEL_GEN(dev_priv) >= 11) >>>>> + rpcs <<= GEN11_RPCS_S_CNT_SHIFT; >>>>> + else >>>>> + rpcs <<= GEN8_RPCS_S_CNT_SHIFT; >>>>> + >>>>> + rpcs |= GEN8_RPCS_ENABLE | GEN8_RPCS_S_CNT_ENABLE; >>>> >>>> >>>> I don't know if you saw that wording in the documentation : >>>> >>>> " >>>> >>>> Note: In ICL, software programs this register as if GT consists of 2 >>>> slices with 4 subslices in each slice. Hardware maps this to the 1 >>>> slice/8-subslice physical layout. >>>> >>>> " >>>> >>>> >>>> My understanding is that it would make this function a bit more >>>> complicated ;) >>>> >>>> It also implies that on ICL you cannot select 3 subslices, which is >>>> unfortunately what Tony was trying to do. >>>> >>>> Maybe some opens need to be raised as to what's possible on ICL. >>> >>> Happy to r-b this one since we clarified it is fine? >>> >>> Regards, >>> >>> Tvrtko >>> >>> >> I think there is still an issue here with regard to the subslice >> programming. >> You can only program values in [1, 4]. >> But because we expose up to 8 subslices, you need to take that mask >> and alter the slice mask based on its value. > > Hmm true, thanks. I think we need to somehow express this restriction on > the uAPI level rather than silently mask it. Shall we reject attempts to > configure subslice mask if hweight(mask) > max_subslices / 2 for ICL-LP? No wait, I was in the dynamic sseu world, but this also applies to the current code base.. When we program SScount in drm-tip with the hweight of 8 we overflow the bitfield already and splat over SScountEn. Luckily with no ill effect since we set it anyway. So the fix needs to be in make_rpcs today.. if (GEN11_LP) { if (hweight(slice_mask) == 1) subslice_count /= 2; else subslice_en = 0; } I think.. Regards, Tvrtko
On 29/08/2018 13:09, Tvrtko Ursulin wrote: > > On 29/08/2018 13:02, Tvrtko Ursulin wrote: >> >> On 29/08/2018 12:07, Lionel Landwerlin wrote: >>> On 29/08/2018 11:54, Tvrtko Ursulin wrote: >>>> >>>> On 22/08/2018 17:33, Lionel Landwerlin wrote: >>>>> On 22/08/2018 17:18, Tvrtko Ursulin wrote: >>>>>> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com> >>>>>> >>>>>> Bitfield width for configuring the active slice count has grown in >>>>>> Gen11 >>>>>> so we need to program the GEN8_R_PWR_CLK_STATE accordingly. >>>>>> >>>>>> Current code was always requesting eight times the number of >>>>>> slices (due >>>>>> writting to a bitfield starting three bits higher than it should). >>>>>> These >>>>>> requests were luckily a) capped by the hardware to the available >>>>>> number of >>>>>> slices, and b) we haven't yet exported the code to ask for reduced >>>>>> slice >>>>>> configurations. >>>>>> >>>>>> Due both of the above there was no impact from this incorrect >>>>>> programming >>>>>> but we should still fix it. >>>>>> >>>>>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> >>>>>> Bspec: 12247 >>>>>> Reported-by: tony.ye@intel.com >>>>>> Suggested-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> >>>>>> Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com> >>>>>> Cc: tony.ye@intel.com >>>>>> --- >>>>>> drivers/gpu/drm/i915/i915_reg.h | 2 ++ >>>>>> drivers/gpu/drm/i915/intel_lrc.c | 12 ++++++++---- >>>>>> 2 files changed, 10 insertions(+), 4 deletions(-) >>>>>> >>>>>> diff --git a/drivers/gpu/drm/i915/i915_reg.h >>>>>> b/drivers/gpu/drm/i915/i915_reg.h >>>>>> index 59d06d0055bb..640f7b774a26 100644 >>>>>> --- a/drivers/gpu/drm/i915/i915_reg.h >>>>>> +++ b/drivers/gpu/drm/i915/i915_reg.h >>>>>> @@ -344,6 +344,8 @@ static inline bool >>>>>> i915_mmio_reg_valid(i915_reg_t reg) >>>>>> #define GEN8_RPCS_S_CNT_ENABLE (1 << 18) >>>>>> #define GEN8_RPCS_S_CNT_SHIFT 15 >>>>>> #define GEN8_RPCS_S_CNT_MASK (0x7 << >>>>>> GEN8_RPCS_S_CNT_SHIFT) >>>>>> +#define GEN11_RPCS_S_CNT_SHIFT 12 >>>>>> +#define GEN11_RPCS_S_CNT_MASK (0x3f << >>>>>> GEN11_RPCS_S_CNT_SHIFT) >>>>>> #define GEN8_RPCS_SS_CNT_ENABLE (1 << 11) >>>>>> #define GEN8_RPCS_SS_CNT_SHIFT 8 >>>>>> #define GEN8_RPCS_SS_CNT_MASK (0x7 << >>>>>> GEN8_RPCS_SS_CNT_SHIFT) >>>>>> diff --git a/drivers/gpu/drm/i915/intel_lrc.c >>>>>> b/drivers/gpu/drm/i915/intel_lrc.c >>>>>> index 36050f085071..43b8b0675ba0 100644 >>>>>> --- a/drivers/gpu/drm/i915/intel_lrc.c >>>>>> +++ b/drivers/gpu/drm/i915/intel_lrc.c >>>>>> @@ -2501,10 +2501,14 @@ make_rpcs(struct drm_i915_private *dev_priv) >>>>>> * enablement. >>>>>> */ >>>>>> if (INTEL_INFO(dev_priv)->sseu.has_slice_pg) { >>>>>> - rpcs |= GEN8_RPCS_S_CNT_ENABLE; >>>>>> - rpcs |= hweight8(INTEL_INFO(dev_priv)->sseu.slice_mask) << >>>>>> - GEN8_RPCS_S_CNT_SHIFT; >>>>>> - rpcs |= GEN8_RPCS_ENABLE; >>>>>> + rpcs = hweight8(INTEL_INFO(dev_priv)->sseu.slice_mask); >>>>>> + >>>>>> + if (INTEL_GEN(dev_priv) >= 11) >>>>>> + rpcs <<= GEN11_RPCS_S_CNT_SHIFT; >>>>>> + else >>>>>> + rpcs <<= GEN8_RPCS_S_CNT_SHIFT; >>>>>> + >>>>>> + rpcs |= GEN8_RPCS_ENABLE | GEN8_RPCS_S_CNT_ENABLE; >>>>> >>>>> >>>>> I don't know if you saw that wording in the documentation : >>>>> >>>>> " >>>>> >>>>> Note: In ICL, software programs this register as if GT consists of >>>>> 2 slices with 4 subslices in each slice. Hardware maps this to the >>>>> 1 slice/8-subslice physical layout. >>>>> >>>>> " >>>>> >>>>> >>>>> My understanding is that it would make this function a bit more >>>>> complicated ;) >>>>> >>>>> It also implies that on ICL you cannot select 3 subslices, which is >>>>> unfortunately what Tony was trying to do. >>>>> >>>>> Maybe some opens need to be raised as to what's possible on ICL. >>>> >>>> Happy to r-b this one since we clarified it is fine? >>>> >>>> Regards, >>>> >>>> Tvrtko >>>> >>>> >>> I think there is still an issue here with regard to the subslice >>> programming. >>> You can only program values in [1, 4]. >>> But because we expose up to 8 subslices, you need to take that mask >>> and alter the slice mask based on its value. >> >> Hmm true, thanks. I think we need to somehow express this restriction >> on the uAPI level rather than silently mask it. Shall we reject >> attempts to configure subslice mask if hweight(mask) > max_subslices / >> 2 for ICL-LP? > > No wait, I was in the dynamic sseu world, but this also applies to the > current code base.. > > When we program SScount in drm-tip with the hweight of 8 we overflow the > bitfield already and splat over SScountEn. Luckily with no ill effect > since we set it anyway. > > So the fix needs to be in make_rpcs today.. > > if (GEN11_LP) { > if (hweight(slice_mask) == 1) && hweight(subslice_mask) < max_subslices So to leave the default full config alone. > subslice_count /= 2; > else > subslice_en = 0; > } > > I think.. > > Regards, > > Tvrtko > _______________________________________________ > Intel-gfx mailing list > Intel-gfx@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/intel-gfx
On 29/08/2018 13:11, Tvrtko Ursulin wrote: > > > On 29/08/2018 13:09, Tvrtko Ursulin wrote: >> >> On 29/08/2018 13:02, Tvrtko Ursulin wrote: >>> >>> On 29/08/2018 12:07, Lionel Landwerlin wrote: >>>> On 29/08/2018 11:54, Tvrtko Ursulin wrote: >>>>> >>>>> On 22/08/2018 17:33, Lionel Landwerlin wrote: >>>>>> On 22/08/2018 17:18, Tvrtko Ursulin wrote: >>>>>>> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com> >>>>>>> >>>>>>> Bitfield width for configuring the active slice count has grown >>>>>>> in Gen11 >>>>>>> so we need to program the GEN8_R_PWR_CLK_STATE accordingly. >>>>>>> >>>>>>> Current code was always requesting eight times the number of >>>>>>> slices (due >>>>>>> writting to a bitfield starting three bits higher than it >>>>>>> should). These >>>>>>> requests were luckily a) capped by the hardware to the available >>>>>>> number of >>>>>>> slices, and b) we haven't yet exported the code to ask for >>>>>>> reduced slice >>>>>>> configurations. >>>>>>> >>>>>>> Due both of the above there was no impact from this incorrect >>>>>>> programming >>>>>>> but we should still fix it. >>>>>>> >>>>>>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> >>>>>>> Bspec: 12247 >>>>>>> Reported-by: tony.ye@intel.com >>>>>>> Suggested-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> >>>>>>> Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com> >>>>>>> Cc: tony.ye@intel.com >>>>>>> --- >>>>>>> drivers/gpu/drm/i915/i915_reg.h | 2 ++ >>>>>>> drivers/gpu/drm/i915/intel_lrc.c | 12 ++++++++---- >>>>>>> 2 files changed, 10 insertions(+), 4 deletions(-) >>>>>>> >>>>>>> diff --git a/drivers/gpu/drm/i915/i915_reg.h >>>>>>> b/drivers/gpu/drm/i915/i915_reg.h >>>>>>> index 59d06d0055bb..640f7b774a26 100644 >>>>>>> --- a/drivers/gpu/drm/i915/i915_reg.h >>>>>>> +++ b/drivers/gpu/drm/i915/i915_reg.h >>>>>>> @@ -344,6 +344,8 @@ static inline bool >>>>>>> i915_mmio_reg_valid(i915_reg_t reg) >>>>>>> #define GEN8_RPCS_S_CNT_ENABLE (1 << 18) >>>>>>> #define GEN8_RPCS_S_CNT_SHIFT 15 >>>>>>> #define GEN8_RPCS_S_CNT_MASK (0x7 << >>>>>>> GEN8_RPCS_S_CNT_SHIFT) >>>>>>> +#define GEN11_RPCS_S_CNT_SHIFT 12 >>>>>>> +#define GEN11_RPCS_S_CNT_MASK (0x3f << >>>>>>> GEN11_RPCS_S_CNT_SHIFT) >>>>>>> #define GEN8_RPCS_SS_CNT_ENABLE (1 << 11) >>>>>>> #define GEN8_RPCS_SS_CNT_SHIFT 8 >>>>>>> #define GEN8_RPCS_SS_CNT_MASK (0x7 << >>>>>>> GEN8_RPCS_SS_CNT_SHIFT) >>>>>>> diff --git a/drivers/gpu/drm/i915/intel_lrc.c >>>>>>> b/drivers/gpu/drm/i915/intel_lrc.c >>>>>>> index 36050f085071..43b8b0675ba0 100644 >>>>>>> --- a/drivers/gpu/drm/i915/intel_lrc.c >>>>>>> +++ b/drivers/gpu/drm/i915/intel_lrc.c >>>>>>> @@ -2501,10 +2501,14 @@ make_rpcs(struct drm_i915_private *dev_priv) >>>>>>> * enablement. >>>>>>> */ >>>>>>> if (INTEL_INFO(dev_priv)->sseu.has_slice_pg) { >>>>>>> - rpcs |= GEN8_RPCS_S_CNT_ENABLE; >>>>>>> - rpcs |= hweight8(INTEL_INFO(dev_priv)->sseu.slice_mask) << >>>>>>> - GEN8_RPCS_S_CNT_SHIFT; >>>>>>> - rpcs |= GEN8_RPCS_ENABLE; >>>>>>> + rpcs = hweight8(INTEL_INFO(dev_priv)->sseu.slice_mask); >>>>>>> + >>>>>>> + if (INTEL_GEN(dev_priv) >= 11) >>>>>>> + rpcs <<= GEN11_RPCS_S_CNT_SHIFT; >>>>>>> + else >>>>>>> + rpcs <<= GEN8_RPCS_S_CNT_SHIFT; >>>>>>> + >>>>>>> + rpcs |= GEN8_RPCS_ENABLE | GEN8_RPCS_S_CNT_ENABLE; >>>>>> >>>>>> >>>>>> I don't know if you saw that wording in the documentation : >>>>>> >>>>>> " >>>>>> >>>>>> Note: In ICL, software programs this register as if GT consists of >>>>>> 2 slices with 4 subslices in each slice. Hardware maps this to the >>>>>> 1 slice/8-subslice physical layout. >>>>>> >>>>>> " >>>>>> >>>>>> >>>>>> My understanding is that it would make this function a bit more >>>>>> complicated ;) >>>>>> >>>>>> It also implies that on ICL you cannot select 3 subslices, which >>>>>> is unfortunately what Tony was trying to do. >>>>>> >>>>>> Maybe some opens need to be raised as to what's possible on ICL. >>>>> >>>>> Happy to r-b this one since we clarified it is fine? >>>>> >>>>> Regards, >>>>> >>>>> Tvrtko >>>>> >>>>> >>>> I think there is still an issue here with regard to the subslice >>>> programming. >>>> You can only program values in [1, 4]. >>>> But because we expose up to 8 subslices, you need to take that mask >>>> and alter the slice mask based on its value. >>> >>> Hmm true, thanks. I think we need to somehow express this restriction >>> on the uAPI level rather than silently mask it. Shall we reject >>> attempts to configure subslice mask if hweight(mask) > max_subslices >>> / 2 for ICL-LP? >> >> No wait, I was in the dynamic sseu world, but this also applies to the >> current code base.. >> >> When we program SScount in drm-tip with the hweight of 8 we overflow >> the bitfield already and splat over SScountEn. Luckily with no ill >> effect since we set it anyway. >> >> So the fix needs to be in make_rpcs today.. >> >> if (GEN11_LP) { >> if (hweight(slice_mask) == 1) > > && hweight(subslice_mask) < max_subslices > > So to leave the default full config alone. Which means that drm-tip configures the 1x6x8 part as SScount=0b110, which is another undocumented value according to BSpec. Furthermore the allowed configuration table suggests only three subslices can be enabled when SScountEn is set, so like the situation on BXT/GLK, now I need to figure out if we are inadvertently only enabling half of the subslices on ICL-LP. Tvrtko >> subslice_count /= 2; >> else >> subslice_en = 0; >> } >> >> I think.. >> >> Regards, >> >> Tvrtko >> _______________________________________________ >> Intel-gfx mailing list >> Intel-gfx@lists.freedesktop.org >> https://lists.freedesktop.org/mailman/listinfo/intel-gfx > _______________________________________________ > Intel-gfx mailing list > Intel-gfx@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/intel-gfx
On 29/08/2018 13:29, Tvrtko Ursulin wrote: > > On 29/08/2018 13:11, Tvrtko Ursulin wrote: >> >> >> On 29/08/2018 13:09, Tvrtko Ursulin wrote: >>> >>> On 29/08/2018 13:02, Tvrtko Ursulin wrote: >>>> >>>> On 29/08/2018 12:07, Lionel Landwerlin wrote: >>>>> On 29/08/2018 11:54, Tvrtko Ursulin wrote: >>>>>> >>>>>> On 22/08/2018 17:33, Lionel Landwerlin wrote: >>>>>>> On 22/08/2018 17:18, Tvrtko Ursulin wrote: >>>>>>>> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com> >>>>>>>> >>>>>>>> Bitfield width for configuring the active slice count has grown >>>>>>>> in Gen11 >>>>>>>> so we need to program the GEN8_R_PWR_CLK_STATE accordingly. >>>>>>>> >>>>>>>> Current code was always requesting eight times the number of >>>>>>>> slices (due >>>>>>>> writting to a bitfield starting three bits higher than it >>>>>>>> should). These >>>>>>>> requests were luckily a) capped by the hardware to the >>>>>>>> available number of >>>>>>>> slices, and b) we haven't yet exported the code to ask for >>>>>>>> reduced slice >>>>>>>> configurations. >>>>>>>> >>>>>>>> Due both of the above there was no impact from this incorrect >>>>>>>> programming >>>>>>>> but we should still fix it. >>>>>>>> >>>>>>>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> >>>>>>>> Bspec: 12247 >>>>>>>> Reported-by: tony.ye@intel.com >>>>>>>> Suggested-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> >>>>>>>> Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com> >>>>>>>> Cc: tony.ye@intel.com >>>>>>>> --- >>>>>>>> drivers/gpu/drm/i915/i915_reg.h | 2 ++ >>>>>>>> drivers/gpu/drm/i915/intel_lrc.c | 12 ++++++++---- >>>>>>>> 2 files changed, 10 insertions(+), 4 deletions(-) >>>>>>>> >>>>>>>> diff --git a/drivers/gpu/drm/i915/i915_reg.h >>>>>>>> b/drivers/gpu/drm/i915/i915_reg.h >>>>>>>> index 59d06d0055bb..640f7b774a26 100644 >>>>>>>> --- a/drivers/gpu/drm/i915/i915_reg.h >>>>>>>> +++ b/drivers/gpu/drm/i915/i915_reg.h >>>>>>>> @@ -344,6 +344,8 @@ static inline bool >>>>>>>> i915_mmio_reg_valid(i915_reg_t reg) >>>>>>>> #define GEN8_RPCS_S_CNT_ENABLE (1 << 18) >>>>>>>> #define GEN8_RPCS_S_CNT_SHIFT 15 >>>>>>>> #define GEN8_RPCS_S_CNT_MASK (0x7 << >>>>>>>> GEN8_RPCS_S_CNT_SHIFT) >>>>>>>> +#define GEN11_RPCS_S_CNT_SHIFT 12 >>>>>>>> +#define GEN11_RPCS_S_CNT_MASK (0x3f << >>>>>>>> GEN11_RPCS_S_CNT_SHIFT) >>>>>>>> #define GEN8_RPCS_SS_CNT_ENABLE (1 << 11) >>>>>>>> #define GEN8_RPCS_SS_CNT_SHIFT 8 >>>>>>>> #define GEN8_RPCS_SS_CNT_MASK (0x7 << >>>>>>>> GEN8_RPCS_SS_CNT_SHIFT) >>>>>>>> diff --git a/drivers/gpu/drm/i915/intel_lrc.c >>>>>>>> b/drivers/gpu/drm/i915/intel_lrc.c >>>>>>>> index 36050f085071..43b8b0675ba0 100644 >>>>>>>> --- a/drivers/gpu/drm/i915/intel_lrc.c >>>>>>>> +++ b/drivers/gpu/drm/i915/intel_lrc.c >>>>>>>> @@ -2501,10 +2501,14 @@ make_rpcs(struct drm_i915_private >>>>>>>> *dev_priv) >>>>>>>> * enablement. >>>>>>>> */ >>>>>>>> if (INTEL_INFO(dev_priv)->sseu.has_slice_pg) { >>>>>>>> - rpcs |= GEN8_RPCS_S_CNT_ENABLE; >>>>>>>> - rpcs |= >>>>>>>> hweight8(INTEL_INFO(dev_priv)->sseu.slice_mask) << >>>>>>>> - GEN8_RPCS_S_CNT_SHIFT; >>>>>>>> - rpcs |= GEN8_RPCS_ENABLE; >>>>>>>> + rpcs = hweight8(INTEL_INFO(dev_priv)->sseu.slice_mask); >>>>>>>> + >>>>>>>> + if (INTEL_GEN(dev_priv) >= 11) >>>>>>>> + rpcs <<= GEN11_RPCS_S_CNT_SHIFT; >>>>>>>> + else >>>>>>>> + rpcs <<= GEN8_RPCS_S_CNT_SHIFT; >>>>>>>> + >>>>>>>> + rpcs |= GEN8_RPCS_ENABLE | GEN8_RPCS_S_CNT_ENABLE; >>>>>>> >>>>>>> >>>>>>> I don't know if you saw that wording in the documentation : >>>>>>> >>>>>>> " >>>>>>> >>>>>>> Note: In ICL, software programs this register as if GT consists >>>>>>> of 2 slices with 4 subslices in each slice. Hardware maps this >>>>>>> to the 1 slice/8-subslice physical layout. >>>>>>> >>>>>>> " >>>>>>> >>>>>>> >>>>>>> My understanding is that it would make this function a bit more >>>>>>> complicated ;) >>>>>>> >>>>>>> It also implies that on ICL you cannot select 3 subslices, which >>>>>>> is unfortunately what Tony was trying to do. >>>>>>> >>>>>>> Maybe some opens need to be raised as to what's possible on ICL. >>>>>> >>>>>> Happy to r-b this one since we clarified it is fine? >>>>>> >>>>>> Regards, >>>>>> >>>>>> Tvrtko >>>>>> >>>>>> >>>>> I think there is still an issue here with regard to the subslice >>>>> programming. >>>>> You can only program values in [1, 4]. >>>>> But because we expose up to 8 subslices, you need to take that >>>>> mask and alter the slice mask based on its value. >>>> >>>> Hmm true, thanks. I think we need to somehow express this >>>> restriction on the uAPI level rather than silently mask it. Shall >>>> we reject attempts to configure subslice mask if hweight(mask) > >>>> max_subslices / 2 for ICL-LP? >>> >>> No wait, I was in the dynamic sseu world, but this also applies to >>> the current code base.. >>> >>> When we program SScount in drm-tip with the hweight of 8 we overflow >>> the bitfield already and splat over SScountEn. Luckily with no ill >>> effect since we set it anyway. >>> >>> So the fix needs to be in make_rpcs today.. >>> >>> if (GEN11_LP) { >>> if (hweight(slice_mask) == 1) >> >> && hweight(subslice_mask) < max_subslices >> >> So to leave the default full config alone. > > Which means that drm-tip configures the 1x6x8 part as SScount=0b110, > which is another undocumented value according to BSpec. Furthermore > the allowed configuration table suggests only three subslices can be > enabled when SScountEn is set, so like the situation on BXT/GLK, now I > need to figure out if we are inadvertently only enabling half of the > subslices on ICL-LP. Or maybe make the upper 4 bits of the subslice map to "big subslices" and lower bottom 4 to "small subslices". Then have the restrictions checked in ioctl() context_set_param. - Lionel > > Tvrtko > >>> subslice_count /= 2; >>> else >>> subslice_en = 0; >>> } >>> >>> I think.. >>> >>> Regards, >>> >>> Tvrtko >>> _______________________________________________ >>> Intel-gfx mailing list >>> Intel-gfx@lists.freedesktop.org >>> https://lists.freedesktop.org/mailman/listinfo/intel-gfx >> _______________________________________________ >> Intel-gfx mailing list >> Intel-gfx@lists.freedesktop.org >> https://lists.freedesktop.org/mailman/listinfo/intel-gfx >
diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h index 59d06d0055bb..640f7b774a26 100644 --- a/drivers/gpu/drm/i915/i915_reg.h +++ b/drivers/gpu/drm/i915/i915_reg.h @@ -344,6 +344,8 @@ static inline bool i915_mmio_reg_valid(i915_reg_t reg) #define GEN8_RPCS_S_CNT_ENABLE (1 << 18) #define GEN8_RPCS_S_CNT_SHIFT 15 #define GEN8_RPCS_S_CNT_MASK (0x7 << GEN8_RPCS_S_CNT_SHIFT) +#define GEN11_RPCS_S_CNT_SHIFT 12 +#define GEN11_RPCS_S_CNT_MASK (0x3f << GEN11_RPCS_S_CNT_SHIFT) #define GEN8_RPCS_SS_CNT_ENABLE (1 << 11) #define GEN8_RPCS_SS_CNT_SHIFT 8 #define GEN8_RPCS_SS_CNT_MASK (0x7 << GEN8_RPCS_SS_CNT_SHIFT) diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c index 36050f085071..43b8b0675ba0 100644 --- a/drivers/gpu/drm/i915/intel_lrc.c +++ b/drivers/gpu/drm/i915/intel_lrc.c @@ -2501,10 +2501,14 @@ make_rpcs(struct drm_i915_private *dev_priv) * enablement. */ if (INTEL_INFO(dev_priv)->sseu.has_slice_pg) { - rpcs |= GEN8_RPCS_S_CNT_ENABLE; - rpcs |= hweight8(INTEL_INFO(dev_priv)->sseu.slice_mask) << - GEN8_RPCS_S_CNT_SHIFT; - rpcs |= GEN8_RPCS_ENABLE; + rpcs = hweight8(INTEL_INFO(dev_priv)->sseu.slice_mask); + + if (INTEL_GEN(dev_priv) >= 11) + rpcs <<= GEN11_RPCS_S_CNT_SHIFT; + else + rpcs <<= GEN8_RPCS_S_CNT_SHIFT; + + rpcs |= GEN8_RPCS_ENABLE | GEN8_RPCS_S_CNT_ENABLE; } if (INTEL_INFO(dev_priv)->sseu.has_subslice_pg) {