diff mbox series

drm/i915/gvt: Fix cached atomics setting for Windows VM

Message ID 20210721062607.512307-1-zhenyuw@linux.intel.com (mailing list archive)
State New, archived
Headers show
Series drm/i915/gvt: Fix cached atomics setting for Windows VM | expand

Commit Message

Zhenyu Wang July 21, 2021, 6:26 a.m. UTC
We've seen recent regression with host and windows VM running
simultaneously that cause gpu hang or even crash. Finally bisect to
58586680ffad ("drm/i915: Disable atomics in L3 for gen9"), which seems
cached atomics behavior difference caused regression issue.

This trys to add new scratch register handler and add those in mmio
save/restore list for context switch. No gpu hang produced with this one.

Cc: stable@vger.kernel.org # 5.12+
Cc: "Xu, Terrence" <terrence.xu@intel.com>
Fixes: 58586680ffad ("drm/i915: Disable atomics in L3 for gen9")
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
---
 drivers/gpu/drm/i915/gvt/handlers.c     | 1 +
 drivers/gpu/drm/i915/gvt/mmio_context.c | 2 ++
 2 files changed, 3 insertions(+)

Comments

Daniel Vetter July 21, 2021, 9:08 a.m. UTC | #1
On Wed, Jul 21, 2021 at 8:21 AM Zhenyu Wang <zhenyuw@linux.intel.com> wrote:
> We've seen recent regression with host and windows VM running
> simultaneously that cause gpu hang or even crash. Finally bisect to
> 58586680ffad ("drm/i915: Disable atomics in L3 for gen9"), which seems
> cached atomics behavior difference caused regression issue.
>
> This trys to add new scratch register handler and add those in mmio
> save/restore list for context switch. No gpu hang produced with this one.
>
> Cc: stable@vger.kernel.org # 5.12+
> Cc: "Xu, Terrence" <terrence.xu@intel.com>
> Fixes: 58586680ffad ("drm/i915: Disable atomics in L3 for gen9")
> Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>

Adding Jon Bloomfield, since different settings between linux and
windows for something that can hard-hang the machine on gen9 sounds
... not good.
-Daniel

> ---
>  drivers/gpu/drm/i915/gvt/handlers.c     | 1 +
>  drivers/gpu/drm/i915/gvt/mmio_context.c | 2 ++
>  2 files changed, 3 insertions(+)
>
> diff --git a/drivers/gpu/drm/i915/gvt/handlers.c b/drivers/gpu/drm/i915/gvt/handlers.c
> index 98eb48c24c46..345b4be5ebad 100644
> --- a/drivers/gpu/drm/i915/gvt/handlers.c
> +++ b/drivers/gpu/drm/i915/gvt/handlers.c
> @@ -3134,6 +3134,7 @@ static int init_bdw_mmio_info(struct intel_gvt *gvt)
>         MMIO_DFH(_MMIO(0xb100), D_BDW, F_CMD_ACCESS, NULL, NULL);
>         MMIO_DFH(_MMIO(0xb10c), D_BDW, F_CMD_ACCESS, NULL, NULL);
>         MMIO_D(_MMIO(0xb110), D_BDW);
> +       MMIO_D(GEN9_SCRATCH_LNCF1, D_BDW_PLUS);
>
>         MMIO_F(_MMIO(0x24d0), 48, F_CMD_ACCESS | F_CMD_WRITE_PATCH, 0, 0,
>                 D_BDW_PLUS, NULL, force_nonpriv_write);
> diff --git a/drivers/gpu/drm/i915/gvt/mmio_context.c b/drivers/gpu/drm/i915/gvt/mmio_context.c
> index b8ac80765461..f776c470914d 100644
> --- a/drivers/gpu/drm/i915/gvt/mmio_context.c
> +++ b/drivers/gpu/drm/i915/gvt/mmio_context.c
> @@ -105,6 +105,8 @@ static struct engine_mmio gen9_engine_mmio_list[] __cacheline_aligned = {
>         {RCS0, COMMON_SLICE_CHICKEN2, 0xffff, true}, /* 0x7014 */
>         {RCS0, GEN9_CS_DEBUG_MODE1, 0xffff, false}, /* 0x20ec */
>         {RCS0, GEN8_L3SQCREG4, 0, false}, /* 0xb118 */
> +       {RCS0, GEN9_SCRATCH1, 0, false}, /* 0xb11c */
> +       {RCS0, GEN9_SCRATCH_LNCF1, 0, false}, /* 0xb008 */
>         {RCS0, GEN7_HALF_SLICE_CHICKEN1, 0xffff, true}, /* 0xe100 */
>         {RCS0, HALF_SLICE_CHICKEN2, 0xffff, true}, /* 0xe180 */
>         {RCS0, HALF_SLICE_CHICKEN3, 0xffff, true}, /* 0xe184 */
> --
> 2.32.0.rc2
>
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/intel-gfx
Jason Ekstrand July 21, 2021, 7:41 p.m. UTC | #2
On Wed, Jul 21, 2021 at 4:08 AM Daniel Vetter <daniel@ffwll.ch> wrote:
>
> On Wed, Jul 21, 2021 at 8:21 AM Zhenyu Wang <zhenyuw@linux.intel.com> wrote:
> > We've seen recent regression with host and windows VM running
> > simultaneously that cause gpu hang or even crash. Finally bisect to
> > 58586680ffad ("drm/i915: Disable atomics in L3 for gen9"), which seems
> > cached atomics behavior difference caused regression issue.
> >
> > This trys to add new scratch register handler and add those in mmio
> > save/restore list for context switch. No gpu hang produced with this one.
> >
> > Cc: stable@vger.kernel.org # 5.12+
> > Cc: "Xu, Terrence" <terrence.xu@intel.com>
> > Fixes: 58586680ffad ("drm/i915: Disable atomics in L3 for gen9")
> > Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
>
> Adding Jon Bloomfield, since different settings between linux and
> windows for something that can hard-hang the machine on gen9 sounds
> ... not good.

The difference there is legit and intentional.

As far as what we do about it for GVT, if we can safely smash L3
atomics off underneath Windows without causing problems for the VM, we
should do that.  If not, we need to discuss this internally before
proceeding.

--Jason

> -Daniel
>
> > ---
> >  drivers/gpu/drm/i915/gvt/handlers.c     | 1 +
> >  drivers/gpu/drm/i915/gvt/mmio_context.c | 2 ++
> >  2 files changed, 3 insertions(+)
> >
> > diff --git a/drivers/gpu/drm/i915/gvt/handlers.c b/drivers/gpu/drm/i915/gvt/handlers.c
> > index 98eb48c24c46..345b4be5ebad 100644
> > --- a/drivers/gpu/drm/i915/gvt/handlers.c
> > +++ b/drivers/gpu/drm/i915/gvt/handlers.c
> > @@ -3134,6 +3134,7 @@ static int init_bdw_mmio_info(struct intel_gvt *gvt)
> >         MMIO_DFH(_MMIO(0xb100), D_BDW, F_CMD_ACCESS, NULL, NULL);
> >         MMIO_DFH(_MMIO(0xb10c), D_BDW, F_CMD_ACCESS, NULL, NULL);
> >         MMIO_D(_MMIO(0xb110), D_BDW);
> > +       MMIO_D(GEN9_SCRATCH_LNCF1, D_BDW_PLUS);
> >
> >         MMIO_F(_MMIO(0x24d0), 48, F_CMD_ACCESS | F_CMD_WRITE_PATCH, 0, 0,
> >                 D_BDW_PLUS, NULL, force_nonpriv_write);
> > diff --git a/drivers/gpu/drm/i915/gvt/mmio_context.c b/drivers/gpu/drm/i915/gvt/mmio_context.c
> > index b8ac80765461..f776c470914d 100644
> > --- a/drivers/gpu/drm/i915/gvt/mmio_context.c
> > +++ b/drivers/gpu/drm/i915/gvt/mmio_context.c
> > @@ -105,6 +105,8 @@ static struct engine_mmio gen9_engine_mmio_list[] __cacheline_aligned = {
> >         {RCS0, COMMON_SLICE_CHICKEN2, 0xffff, true}, /* 0x7014 */
> >         {RCS0, GEN9_CS_DEBUG_MODE1, 0xffff, false}, /* 0x20ec */
> >         {RCS0, GEN8_L3SQCREG4, 0, false}, /* 0xb118 */
> > +       {RCS0, GEN9_SCRATCH1, 0, false}, /* 0xb11c */
> > +       {RCS0, GEN9_SCRATCH_LNCF1, 0, false}, /* 0xb008 */
> >         {RCS0, GEN7_HALF_SLICE_CHICKEN1, 0xffff, true}, /* 0xe100 */
> >         {RCS0, HALF_SLICE_CHICKEN2, 0xffff, true}, /* 0xe180 */
> >         {RCS0, HALF_SLICE_CHICKEN3, 0xffff, true}, /* 0xe184 */
> > --
> > 2.32.0.rc2
> >
> > _______________________________________________
> > Intel-gfx mailing list
> > Intel-gfx@lists.freedesktop.org
> > https://lists.freedesktop.org/mailman/listinfo/intel-gfx
>
>
>
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> http://blog.ffwll.ch
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/intel-gfx
diff mbox series

Patch

diff --git a/drivers/gpu/drm/i915/gvt/handlers.c b/drivers/gpu/drm/i915/gvt/handlers.c
index 98eb48c24c46..345b4be5ebad 100644
--- a/drivers/gpu/drm/i915/gvt/handlers.c
+++ b/drivers/gpu/drm/i915/gvt/handlers.c
@@ -3134,6 +3134,7 @@  static int init_bdw_mmio_info(struct intel_gvt *gvt)
 	MMIO_DFH(_MMIO(0xb100), D_BDW, F_CMD_ACCESS, NULL, NULL);
 	MMIO_DFH(_MMIO(0xb10c), D_BDW, F_CMD_ACCESS, NULL, NULL);
 	MMIO_D(_MMIO(0xb110), D_BDW);
+	MMIO_D(GEN9_SCRATCH_LNCF1, D_BDW_PLUS);
 
 	MMIO_F(_MMIO(0x24d0), 48, F_CMD_ACCESS | F_CMD_WRITE_PATCH, 0, 0,
 		D_BDW_PLUS, NULL, force_nonpriv_write);
diff --git a/drivers/gpu/drm/i915/gvt/mmio_context.c b/drivers/gpu/drm/i915/gvt/mmio_context.c
index b8ac80765461..f776c470914d 100644
--- a/drivers/gpu/drm/i915/gvt/mmio_context.c
+++ b/drivers/gpu/drm/i915/gvt/mmio_context.c
@@ -105,6 +105,8 @@  static struct engine_mmio gen9_engine_mmio_list[] __cacheline_aligned = {
 	{RCS0, COMMON_SLICE_CHICKEN2, 0xffff, true}, /* 0x7014 */
 	{RCS0, GEN9_CS_DEBUG_MODE1, 0xffff, false}, /* 0x20ec */
 	{RCS0, GEN8_L3SQCREG4, 0, false}, /* 0xb118 */
+	{RCS0, GEN9_SCRATCH1, 0, false}, /* 0xb11c */
+	{RCS0, GEN9_SCRATCH_LNCF1, 0, false}, /* 0xb008 */
 	{RCS0, GEN7_HALF_SLICE_CHICKEN1, 0xffff, true}, /* 0xe100 */
 	{RCS0, HALF_SLICE_CHICKEN2, 0xffff, true}, /* 0xe180 */
 	{RCS0, HALF_SLICE_CHICKEN3, 0xffff, true}, /* 0xe184 */