drm/i915/hsw: enable atomic in L3 for some steppings.

Message ID	1420419950-3135-1-git-send-email-zhigang.gong@intel.com (mailing list archive)
State	New, archived
Headers	show Return-Path: <intel-gfx-bounces@lists.freedesktop.org> From: Zhigang Gong <zhigang.gong@intel.com> To: intel-gfx@lists.freedesktop.org Date: Mon, 5 Jan 2015 09:05:50 +0800 Message-Id: <1420419950-3135-1-git-send-email-zhigang.gong@intel.com> Cc: Zhigang Gong <zhigang.gong@intel.com> Subject: [Intel-gfx] [PATCH] drm/i915/hsw: enable atomic in L3 for some steppings. Precedence: list MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: base64 Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" <intel-gfx-bounces@lists.freedesktop.org>

Zhigang Gong Jan. 5, 2015, 1:05 a.m. UTC

According to the bspec, bit 6 of ROW_CHICKEN3, which disables the
move of cacheable global atomics to L3, is needed for the GT3 D
stepping.

I enabled L3 atomics and tested on HSW GT2 D stepping and GT3 E stepping.
Atomics work fine in beignet, and some workloads get more than a 10x
performance improvement. For example, without this patch the "splat" kernel
in darktable takes 50 seconds to process one large raw picture; with this
patch, the same processing takes less than 1 second.

Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
---
 drivers/gpu/drm/i915/intel_pm.c | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)
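
The stepping gate this patch introduces can be sketched as a small predicate. This is only an illustration of the condition in the diff (GT3 and PCI revision 6 or lower); the struct and function names are hypothetical and are not part of i915.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the device fields the patch inspects. */
struct hsw_info {
	bool is_gt3;            /* plays the role of IS_HSW_GT3(dev) */
	unsigned int revision;  /* plays the role of dev->pdev->revision */
};

/*
 * Mirrors the condition in the patch: the L3 atomics disable bits are
 * written only on GT3 parts whose PCI revision is 6 or lower.
 */
static bool l3_atomics_must_stay_disabled(const struct hsw_info *info)
{
	assert(info != NULL);
	return info->is_gt3 && info->revision <= 6;
}
```

With this predicate a GT2 part never gets the disable bits, and a GT3 part gets them only up to revision 6.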

zhigang.gong@linux.intel.com Jan. 5, 2015, 2:36 a.m. UTC | #1

On Mon, Jan 05, 2015 at 05:03:16AM +0200, Francisco Jerez wrote:
> Zhigang Gong <zhigang.gong@intel.com> writes:
> 
> > According to the bspec, bit 6 of ROW_CHICKEN3, which disables the
> > move of cacheable global atomics to L3, is needed for the GT3 D
> > stepping.
> >
> > I enabled L3 atomics and tested on HSW GT2 D stepping and GT3 E stepping.
> > Atomics work fine in beignet, and some workloads get more than a 10x
> > performance improvement. For example, without this patch the "splat" kernel
> > in darktable takes 50 seconds to process one large raw picture; with this
> > patch, the same processing takes less than 1 second.
> >
> 
> I tried this already (on HSW GT2 D as well) and I don't think it's
> enough to get L3 atomics working reliably.  Even though they did seem to
> work OK at first glance I observed some corruption issues (e.g. atomic
> writes not landing in system memory) when doing atomic writes to
> contiguous (as in within the same cache-line) locations in memory.  The
> "unused" ARB_shader_image_load_store test [1] I sent to the Piglit
> mailing list some time ago exposes this IIRC, and probably a couple of
> other tests too.
OK, I will find that test and try it on my systems. So far I have run all
of the atomic-related OpenCL conformance test cases without any issues.

> 
> Also this change is going to cause an instant lock-up anytime Mesa uses
> atomics because Mesa doesn't change the default L3 way allocation for
> the DC, which turns out to be 0 on HSW.

That is a separate issue. IMHO, if an application wants to use atomics,
it had better allocate some L3 space for the DC; otherwise it can
never take advantage of the "atomics in L3" feature. Based on my tests,
the performance impact is 10x or more for some workloads.

Thanks,
Zhigang Gong. 

> 
> [1] http://lists.freedesktop.org/archives/piglit/2014-December/013571.html
> 
> > Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
> > ---
> >  drivers/gpu/drm/i915/intel_pm.c | 10 ++++++----
> >  1 file changed, 6 insertions(+), 4 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
> > index 7d99a9c..8a27802 100644
> > --- a/drivers/gpu/drm/i915/intel_pm.c
> > +++ b/drivers/gpu/drm/i915/intel_pm.c
> > @@ -5938,10 +5938,12 @@ static void haswell_init_clock_gating(struct drm_device *dev)
> >  
> >  	ilk_init_lp_watermarks(dev);
> >  
> > -	/* L3 caching of data atomics doesn't work -- disable it. */
> > -	I915_WRITE(HSW_SCRATCH1, HSW_SCRATCH1_L3_DATA_ATOMICS_DISABLE);
> > -	I915_WRITE(HSW_ROW_CHICKEN3,
> > -		   _MASKED_BIT_ENABLE(HSW_ROW_CHICKEN3_L3_GLOBAL_ATOMICS_DISABLE));
> > +	if (IS_HSW_GT3(dev) && dev->pdev->revision <= 6) {
> > +		/* L3 caching of data atomics doesn't work -- disable it. */
> > +		I915_WRITE(HSW_SCRATCH1, HSW_SCRATCH1_L3_DATA_ATOMICS_DISABLE);
> > +		I915_WRITE(HSW_ROW_CHICKEN3,
> > +			   _MASKED_BIT_ENABLE(HSW_ROW_CHICKEN3_L3_GLOBAL_ATOMICS_DISABLE));
> > +	}
> >  
> >  	/* This is required by WaCatErrorRejectionIssue:hsw */
> >  	I915_WRITE(GEN7_SQ_CHICKEN_MBCUNIT_CONFIG,
> > -- 
> > 1.8.3.2
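
For readers unfamiliar with the masked-register convention used in the diff, here is a minimal software model. It assumes the usual i915 layout, in which the high 16 bits of the written value select which of the low 16 bits may change; the macro bodies follow the i915 helpers as I understand them, while masked_write() is a hypothetical model and not kernel code.

```c
#include <assert.h>
#include <stdint.h>

/* High 16 bits: write mask. Low 16 bits: new values for the masked bits. */
#define MASKED_BIT_ENABLE(a)   ((uint32_t)(((a) << 16) | (a)))
#define MASKED_BIT_DISABLE(a)  ((uint32_t)((a) << 16))

/* Bit 6 of ROW_CHICKEN3, as stated in the commit message. */
#define L3_GLOBAL_ATOMICS_DISABLE (1u << 6)

/* Software model of a write to a masked register (illustrative only). */
static uint16_t masked_write(uint16_t current, uint32_t value)
{
	uint16_t mask = (uint16_t)(value >> 16);
	uint16_t bits = (uint16_t)(value & 0xffffu);

	/* Bits outside the mask keep their old value. */
	return (uint16_t)((current & ~mask) | (bits & mask));
}

/* Small self-check showing that unrelated bits are preserved. */
static void masked_write_self_check(void)
{
	uint16_t reg = 0x0001;

	reg = masked_write(reg, MASKED_BIT_ENABLE(L3_GLOBAL_ATOMICS_DISABLE));
	assert(reg == 0x0041);
	reg = masked_write(reg, MASKED_BIT_DISABLE(L3_GLOBAL_ATOMICS_DISABLE));
	assert(reg == 0x0001);
}
```

The point of the convention is that a single write can set or clear one chicken bit without a read-modify-write cycle and without disturbing the other bits in the register.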





Francisco Jerez Jan. 5, 2015, 3:03 a.m. UTC | #2

Zhigang Gong <zhigang.gong@intel.com> writes:

> According to the bspec, bit 6 of ROW_CHICKEN3, which disables the
> move of cacheable global atomics to L3, is needed for the GT3 D
> stepping.
>
> I enabled L3 atomics and tested on HSW GT2 D stepping and GT3 E stepping.
> Atomics work fine in beignet, and some workloads get more than a 10x
> performance improvement. For example, without this patch the "splat" kernel
> in darktable takes 50 seconds to process one large raw picture; with this
> patch, the same processing takes less than 1 second.
>

I tried this already (on HSW GT2 D as well) and I don't think it's
enough to get L3 atomics working reliably.  Even though they did seem to
work OK at first glance I observed some corruption issues (e.g. atomic
writes not landing in system memory) when doing atomic writes to
contiguous (as in within the same cache-line) locations in memory.  The
"unused" ARB_shader_image_load_store test [1] I sent to the Piglit
mailing list some time ago exposes this IIRC, and probably a couple of
other tests too.
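
The kind of check that exposes lost atomic writes can be sketched on the host as follows. This is not the Piglit test and does not run on the GPU; it only models the access pattern (atomic increments to adjacent 32-bit words inside one cache line) and the verification step. The 64-byte line size and all names are assumptions made for illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE_BYTES 64u  /* assumed cache-line size */
#define WORDS_PER_LINE   (CACHE_LINE_BYTES / sizeof(uint32_t))

/* One cache line worth of adjacent 32-bit counters. */
struct line_counters {
	_Alignas(64) _Atomic uint32_t word[WORDS_PER_LINE];
};

static void reset_counters(struct line_counters *c)
{
	for (size_t i = 0; i < WORDS_PER_LINE; i++)
		atomic_store(&c->word[i], 0);
}

/* Issue `total` atomic increments, walking the words round-robin. */
static void hammer_line(struct line_counters *c, uint32_t total)
{
	for (uint32_t i = 0; i < total; i++)
		atomic_fetch_add(&c->word[i % WORDS_PER_LINE], 1);
}

/* True if every word holds exactly its share, i.e. no write was lost. */
static bool line_is_consistent(struct line_counters *c, uint32_t total)
{
	assert(total % WORDS_PER_LINE == 0);
	for (size_t i = 0; i < WORDS_PER_LINE; i++)
		if (atomic_load(&c->word[i]) != total / WORDS_PER_LINE)
			return false;
	return true;
}
```

A lost write of the kind described above would show up as one word holding a short count, which line_is_consistent() reports as a failure.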

Also this change is going to cause an instant lock-up anytime Mesa uses
atomics because Mesa doesn't change the default L3 way allocation for
the DC, which turns out to be 0 on HSW.

[1] http://lists.freedesktop.org/archives/piglit/2014-December/013571.html

> Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
> ---
>  drivers/gpu/drm/i915/intel_pm.c | 10 ++++++----
>  1 file changed, 6 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
> index 7d99a9c..8a27802 100644
> --- a/drivers/gpu/drm/i915/intel_pm.c
> +++ b/drivers/gpu/drm/i915/intel_pm.c
> @@ -5938,10 +5938,12 @@ static void haswell_init_clock_gating(struct drm_device *dev)
>  
>  	ilk_init_lp_watermarks(dev);
>  
> -	/* L3 caching of data atomics doesn't work -- disable it. */
> -	I915_WRITE(HSW_SCRATCH1, HSW_SCRATCH1_L3_DATA_ATOMICS_DISABLE);
> -	I915_WRITE(HSW_ROW_CHICKEN3,
> -		   _MASKED_BIT_ENABLE(HSW_ROW_CHICKEN3_L3_GLOBAL_ATOMICS_DISABLE));
> +	if (IS_HSW_GT3(dev) && dev->pdev->revision <= 6) {
> +		/* L3 caching of data atomics doesn't work -- disable it. */
> +		I915_WRITE(HSW_SCRATCH1, HSW_SCRATCH1_L3_DATA_ATOMICS_DISABLE);
> +		I915_WRITE(HSW_ROW_CHICKEN3,
> +			   _MASKED_BIT_ENABLE(HSW_ROW_CHICKEN3_L3_GLOBAL_ATOMICS_DISABLE));
> +	}
>  
>  	/* This is required by WaCatErrorRejectionIssue:hsw */
>  	I915_WRITE(GEN7_SQ_CHICKEN_MBCUNIT_CONFIG,
> -- 
> 1.8.3.2

Shuang He Jan. 5, 2015, 7:05 a.m. UTC | #3

Tested-By: PRC QA PRTS (Patch Regression Test System Contact: shuang.he@intel.com)
-------------------------------------Summary-------------------------------------
Platform  Delta  drm-intel-nightly  Series Applied
PNV              363/364            363/364
ILK              364/366            364/366
SNB       +2     443/450            445/450
IVB              496/498            496/498
BYT              288/289            288/289
HSW       +3-1   542/564            544/564
BDW              415/417            415/417
-------------------------------------Detailed-------------------------------------
Platform  Test                                                     drm-intel-nightly                             Series Applied
*SNB      igt_kms_flip_modeset-vs-vblank-race                      DMESG_WARN(3, M35)                            PASS(1, M35)
 SNB      igt_kms_plane_plane-position-hole-pipe-B-plane-1         DMESG_WARN(1, M35) PASS(3, M35M22)            PASS(1, M35)
*HSW      igt_kms_flip_dpms-vs-vblank-race-interruptible           DMESG_WARN(2, M40)                            PASS(1, M40)
 HSW      igt_kms_flip_single-buffer-flip-vs-dpms-off-vs-modeset   DMESG_WARN(1, M40) PASS(3, M40M19)            DMESG_WARN(1, M40)
 HSW      igt_kms_plane_plane-panning-bottom-right-pipe-C-plane-1  TIMEOUT(2, M40) PASS(2, M19M40)               PASS(1, M40)
 HSW      igt_pm_rpm_modeset-non-lpsp-stress-no-wait               NSPT(1, M19) DMESG_WARN(1, M40) PASS(2, M40)  PASS(1, M40)
Note: Pay particular attention to lines starting with '*'.

Francisco Jerez Jan. 5, 2015, 5:27 p.m. UTC | #4

Zhigang Gong <zhigang.gong@linux.intel.com> writes:

> On Mon, Jan 05, 2015 at 05:03:16AM +0200, Francisco Jerez wrote:
>> Zhigang Gong <zhigang.gong@intel.com> writes:
>> 
>> > According to the bspec, bit 6 of ROW_CHICKEN3, which disables the
>> > move of cacheable global atomics to L3, is needed for the GT3 D
>> > stepping.
>> >
>> > I enabled L3 atomics and tested on HSW GT2 D stepping and GT3 E stepping.
>> > Atomics work fine in beignet, and some workloads get more than a 10x
>> > performance improvement. For example, without this patch the "splat" kernel
>> > in darktable takes 50 seconds to process one large raw picture; with this
>> > patch, the same processing takes less than 1 second.
>> >
>> 
>> I tried this already (on HSW GT2 D as well) and I don't think it's
>> enough to get L3 atomics working reliably.  Even though they did seem to
>> work OK at first glance I observed some corruption issues (e.g. atomic
>> writes not landing in system memory) when doing atomic writes to
>> contiguous (as in within the same cache-line) locations in memory.  The
>> "unused" ARB_shader_image_load_store test [1] I sent to the Piglit
>> mailing list some time ago exposes this IIRC, and probably a couple of
>> other tests too.
> OK, I will find that test and try it on my systems. So far I have run all
> of the atomic-related OpenCL conformance test cases without any issues.
>
>> 
>> Also this change is going to cause an instant lock-up anytime Mesa uses
>> atomics because Mesa doesn't change the default L3 way allocation for
>> the DC, which turns out to be 0 on HSW.
>
> That is a separate issue. IMHO, if an application wants to use atomics,
> it had better allocate some L3 space for the DC; otherwise it can
> never take advantage of the "atomics in L3" feature. Based on my tests,
> the performance impact is 10x or more for some workloads.
>
Sure, but this change alone will cause a regression (an irrecoverable
system hang) with current releases of Mesa. 

I agree that Mesa should be fixed eventually to assign L3 space to the
DC for some workloads, but it doesn't seem like we have a satisfactory
API to do that right now -- The current mechanism used by Beignet
(poking the L3 control registers from the batch buffer) is slightly
concerning from a security point of view because it allows an arbitrary
userspace application to cause misrendering or impact the performance of
other clients (since the L3 config registers are not being saved and
restored as part of the context) and even crash the whole system.  It
also doesn't look like it could be made to work with the hardware
command checker.
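
To make the checker concern concrete, here is a toy model of an allow-list check on register writes found in a user batch buffer. It is not the i915 command parser or the hardware checker; the register offsets are placeholders rather than real addresses, and every name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder offsets for illustration, NOT real register addresses. */
#define REG_HARMLESS_A 0x1000u
#define REG_HARMLESS_B 0x1004u
#define REG_L3_CONFIG  0x2000u  /* stands in for an L3 control register */

/* Registers that an unprivileged batch may write in this model. */
static const uint32_t user_writable_regs[] = {
	REG_HARMLESS_A,
	REG_HARMLESS_B,
};

/* True if a register write from a user batch buffer may be executed. */
static bool user_reg_write_allowed(uint32_t offset)
{
	size_t n = sizeof(user_writable_regs) / sizeof(user_writable_regs[0]);

	assert((offset & 3u) == 0);  /* registers are dword aligned in this model */
	for (size_t i = 0; i < n; i++)
		if (user_writable_regs[i] == offset)
			return true;
	return false;
}
```

Under such a scheme a batch that pokes the L3 control registers is rejected unless those registers are explicitly listed, which is why the Beignet approach conflicts with a checker of this kind.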

> Thanks,
> Zhigang Gong. 
>
>> 
>> [1] http://lists.freedesktop.org/archives/piglit/2014-December/013571.html
>> 
>> > Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
>> > ---
>> >  drivers/gpu/drm/i915/intel_pm.c | 10 ++++++----
>> >  1 file changed, 6 insertions(+), 4 deletions(-)
>> >
>> > diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
>> > index 7d99a9c..8a27802 100644
>> > --- a/drivers/gpu/drm/i915/intel_pm.c
>> > +++ b/drivers/gpu/drm/i915/intel_pm.c
>> > @@ -5938,10 +5938,12 @@ static void haswell_init_clock_gating(struct drm_device *dev)
>> >  
>> >  	ilk_init_lp_watermarks(dev);
>> >  
>> > -	/* L3 caching of data atomics doesn't work -- disable it. */
>> > -	I915_WRITE(HSW_SCRATCH1, HSW_SCRATCH1_L3_DATA_ATOMICS_DISABLE);
>> > -	I915_WRITE(HSW_ROW_CHICKEN3,
>> > -		   _MASKED_BIT_ENABLE(HSW_ROW_CHICKEN3_L3_GLOBAL_ATOMICS_DISABLE));
>> > +	if (IS_HSW_GT3(dev) && dev->pdev->revision <= 6) {
>> > +		/* L3 caching of data atomics doesn't work -- disable it. */
>> > +		I915_WRITE(HSW_SCRATCH1, HSW_SCRATCH1_L3_DATA_ATOMICS_DISABLE);
>> > +		I915_WRITE(HSW_ROW_CHICKEN3,
>> > +			   _MASKED_BIT_ENABLE(HSW_ROW_CHICKEN3_L3_GLOBAL_ATOMICS_DISABLE));
>> > +	}
>> >  
>> >  	/* This is required by WaCatErrorRejectionIssue:hsw */
>> >  	I915_WRITE(GEN7_SQ_CHICKEN_MBCUNIT_CONFIG,
>> > -- 
>> > 1.8.3.2

zhigang.gong@linux.intel.com Jan. 6, 2015, 8:01 a.m. UTC | #5

On Mon, Jan 05, 2015 at 07:27:26PM +0200, Francisco Jerez wrote:
> Zhigang Gong <zhigang.gong@linux.intel.com> writes:
> 
> > On Mon, Jan 05, 2015 at 05:03:16AM +0200, Francisco Jerez wrote:
> >> Zhigang Gong <zhigang.gong@intel.com> writes:
> >> 
> >> > According to the bspec, bit 6 of ROW_CHICKEN3, which disables the
> >> > move of cacheable global atomics to L3, is needed for the GT3 D
> >> > stepping.
> >> >
> >> > I enabled L3 atomics and tested on HSW GT2 D stepping and GT3 E stepping.
> >> > Atomics work fine in beignet, and some workloads get more than a 10x
> >> > performance improvement. For example, without this patch the "splat" kernel
> >> > in darktable takes 50 seconds to process one large raw picture; with this
> >> > patch, the same processing takes less than 1 second.
> >> >
> >> 
> >> I tried this already (on HSW GT2 D as well) and I don't think it's
> >> enough to get L3 atomics working reliably.  Even though they did seem to
> >> work OK at first glance I observed some corruption issues (e.g. atomic
> >> writes not landing in system memory) when doing atomic writes to
> >> contiguous (as in within the same cache-line) locations in memory.  The
> >> "unused" ARB_shader_image_load_store test [1] I sent to the Piglit
> >> mailing list some time ago exposes this IIRC, and probably a couple of
> >> other tests too.
> > OK, I will find that test and try it on my systems. So far I have run all
> > of the atomic-related OpenCL conformance test cases without any issues.

I just found that the patch set hasn't been accepted by piglit yet. I took a
look at the source code and realized that the shader will not work correctly if
Mesa doesn't allocate some L3 space for the DC. I don't know whether you had
already allocated some when you ran that test.

> >
> >> 
> >> Also this change is going to cause an instant lock-up anytime Mesa uses
> >> atomics because Mesa doesn't change the default L3 way allocation for
> >> the DC, which turns out to be 0 on HSW.
> >
> > That is a separate issue. IMHO, if an application wants to use atomics,
> > it had better allocate some L3 space for the DC; otherwise it can
> > never take advantage of the "atomics in L3" feature. Based on my tests,
> > the performance impact is 10x or more for some workloads.
> >
> Sure, but this change alone will cause a regression (an irrecoverable
> system hang) with current releases of Mesa. 
> 
> I agree that Mesa should be fixed eventually to assign L3 space to the
> DC for some workloads, but it doesn't seem like we have a satisfactory
> API to do that right now -- The current mechanism used by Beignet
> (poking the L3 control registers from the batch buffer) is slightly
> concerning from a security point of view because it allows an arbitrary
> userspace application to cause misrendering or impact the performance of
I agree with you that the L3 configuration is very dangerous. It would be better
to do more checking in kernel space and make sure a user space
application cannot crash the system via the L3 configuration.
And if the kernel provides a new API, we will use it in beignet ASAP.

> other clients (since the L3 config registers are not being saved and
> restored as part of the context) and even crash the whole system.  It
I found that the L3 config registers are part of the IVB context, but not part
of the HSW context. So if two user applications want to use different L3 configs,
each application needs to save and restore the config registers manually.
Is there a way to do this gracefully in the KMD? The new API
may need to handle context switching safely with regard to L3 config changes.

> also doesn't look like it could be made to work with the hardware
> command checker.

As to the hardware command checker:
I just found that with the current nightly kernel, if we boot with
"i915.enable_ppgtt=2", we can configure the L3-related registers
from user space. But by default the command checker still doesn't allow user
space to change the L3 config.

Thanks,
Zhigang Gong.

> 
> > Thanks,
> > Zhigang Gong. 
> >
> >> 
> >> [1] http://lists.freedesktop.org/archives/piglit/2014-December/013571.html
> >> 
> >> > Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
> >> > ---
> >> >  drivers/gpu/drm/i915/intel_pm.c | 10 ++++++----
> >> >  1 file changed, 6 insertions(+), 4 deletions(-)
> >> >
> >> > diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
> >> > index 7d99a9c..8a27802 100644
> >> > --- a/drivers/gpu/drm/i915/intel_pm.c
> >> > +++ b/drivers/gpu/drm/i915/intel_pm.c
> >> > @@ -5938,10 +5938,12 @@ static void haswell_init_clock_gating(struct drm_device *dev)
> >> >  
> >> >  	ilk_init_lp_watermarks(dev);
> >> >  
> >> > -	/* L3 caching of data atomics doesn't work -- disable it. */
> >> > -	I915_WRITE(HSW_SCRATCH1, HSW_SCRATCH1_L3_DATA_ATOMICS_DISABLE);
> >> > -	I915_WRITE(HSW_ROW_CHICKEN3,
> >> > -		   _MASKED_BIT_ENABLE(HSW_ROW_CHICKEN3_L3_GLOBAL_ATOMICS_DISABLE));
> >> > +	if (IS_HSW_GT3(dev) && dev->pdev->revision <= 6) {
> >> > +		/* L3 caching of data atomics doesn't work -- disable it. */
> >> > +		I915_WRITE(HSW_SCRATCH1, HSW_SCRATCH1_L3_DATA_ATOMICS_DISABLE);
> >> > +		I915_WRITE(HSW_ROW_CHICKEN3,
> >> > +			   _MASKED_BIT_ENABLE(HSW_ROW_CHICKEN3_L3_GLOBAL_ATOMICS_DISABLE));
> >> > +	}
> >> >  
> >> >  	/* This is required by WaCatErrorRejectionIssue:hsw */
> >> >  	I915_WRITE(GEN7_SQ_CHICKEN_MBCUNIT_CONFIG,
> >> > -- 
> >> > 1.8.3.2

Francisco Jerez Jan. 6, 2015, 2:19 p.m. UTC | #6

Zhigang Gong <zhigang.gong@linux.intel.com> writes:

> On Mon, Jan 05, 2015 at 07:27:26PM +0200, Francisco Jerez wrote:
>> Zhigang Gong <zhigang.gong@linux.intel.com> writes:
>> 
>> > On Mon, Jan 05, 2015 at 05:03:16AM +0200, Francisco Jerez wrote:
>> >> Zhigang Gong <zhigang.gong@intel.com> writes:
>> >> 
>> >> > According to the bspec, bit 6 of ROW_CHICKEN3, which disables the
>> >> > move of cacheable global atomics to L3, is needed for the GT3 D
>> >> > stepping.
>> >> >
>> >> > I enabled L3 atomics and tested on HSW GT2 D stepping and GT3 E stepping.
>> >> > Atomics work fine in beignet, and some workloads get more than a 10x
>> >> > performance improvement. For example, without this patch the "splat" kernel
>> >> > in darktable takes 50 seconds to process one large raw picture; with this
>> >> > patch, the same processing takes less than 1 second.
>> >> >
>> >> 
>> >> I tried this already (on HSW GT2 D as well) and I don't think it's
>> >> enough to get L3 atomics working reliably.  Even though they did seem to
>> >> work OK at first glance I observed some corruption issues (e.g. atomic
>> >> writes not landing in system memory) when doing atomic writes to
>> >> contiguous (as in within the same cache-line) locations in memory.  The
>> >> "unused" ARB_shader_image_load_store test [1] I sent to the Piglit
>> >> mailing list some time ago exposes this IIRC, and probably a couple of
>> >> other tests too.
>> > OK, I will find that test and try it on my systems. So far I have run all
>> > of the atomic-related OpenCL conformance test cases without any issues.
>
> I just found that the patch set hasn't been accepted by piglit yet. I took a
> look at the source code and realized that the shader will not work correctly if
> Mesa doesn't allocate some L3 space for the DC. I don't know whether you had
> already allocated some when you ran that test.
>

Of course, and it still fails no matter how you allocate the L3.

>> >
>> >> 
>> >> Also this change is going to cause an instant lock-up anytime Mesa uses
>> >> atomics because Mesa doesn't change the default L3 way allocation for
>> >> the DC, which turns out to be 0 on HSW.
>> >
>> > That is a separate issue. IMHO, if an application wants to use atomics,
>> > it had better allocate some L3 space for the DC; otherwise it can
>> > never take advantage of the "atomics in L3" feature. Based on my tests,
>> > the performance impact is 10x or more for some workloads.
>> >
>> Sure, but this change alone will cause a regression (an irrecoverable
>> system hang) with current releases of Mesa. 
>> 
>> I agree that Mesa should be fixed eventually to assign L3 space to the
>> DC for some workloads, but it doesn't seem like we have a satisfactory
>> API to do that right now -- The current mechanism used by Beignet
>> (poking the L3 control registers from the batch buffer) is slightly
>> concerning from a security point of view because it allows an arbitrary
>> userspace application to cause misrendering or impact the performance of
> I agree with you that the L3 configuration is very dangerous. It would be better
> to do more checking in kernel space and make sure a user space
> application cannot crash the system via the L3 configuration.
> And if the kernel provides a new API, we will use it in beignet ASAP.
>
>> other clients (since the L3 config registers are not being saved and
>> restored as part of the context) and even crash the whole system.  It
> I found that the L3 config registers are part of the IVB context, but not part
> of the HSW context. So if two user applications want to use different L3 configs,
> each application needs to save and restore the config registers manually.
> Is there a way to do this gracefully in the KMD? The new API
> may need to handle context switching safely with regard to L3 config changes.
>
Yeah, relying on userspace saving and restoring the L3 config values
sounds rather fragile to me, and may not work if e.g. a context hangs
and gets kicked out.  IMHO it would be preferable to keep track of the
L3 config and save/restore it from the kernel on context switch.
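
A rough sketch of what kernel-side tracking could look like is below. It is a user-space model under stated assumptions, not i915 code: the number of L3 config registers, the fake_hw register file, and all struct and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_L3_REGS 3  /* assumed number of L3 config registers */

/* Stand-in for the global (not context-saved) L3 config registers. */
struct fake_hw {
	uint32_t l3_regs[NUM_L3_REGS];
};

/* Hypothetical per-context L3 state tracked by the kernel. */
struct gpu_context {
	uint32_t l3_cfg[NUM_L3_REGS];
};

/*
 * On a context switch, save the outgoing context's L3 config (if there
 * is an outgoing context) and program the incoming context's config.
 */
static void switch_context(struct fake_hw *hw, struct gpu_context *from,
			   const struct gpu_context *to)
{
	assert(hw != NULL && to != NULL);

	if (from != NULL)
		for (size_t i = 0; i < NUM_L3_REGS; i++)
			from->l3_cfg[i] = hw->l3_regs[i];

	for (size_t i = 0; i < NUM_L3_REGS; i++)
		hw->l3_regs[i] = to->l3_cfg[i];
}
```

Because the save and restore happen in one place, a context that hangs and gets kicked out cannot leave its L3 config behind for the next client.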

>> also doesn't look like it could be made to work with the hardware
>> command checker.
>
> As to the hardware command checker:
> I just found that with the current nightly kernel, if we boot with
> "i915.enable_ppgtt=2", we can configure the L3-related registers
> from user space. But by default the command checker still doesn't allow user
> space to change the L3 config.
>

Ugh, apparently that's because the hardware command checker is not
enabled at all in that case...

> Thanks,
> Zhigang Gong.
>
>> 
>> > Thanks,
>> > Zhigang Gong. 
>> >
>> >> 
>> >> [1] http://lists.freedesktop.org/archives/piglit/2014-December/013571.html
>> >> 
>> >> > Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
>> >> > ---
>> >> >  drivers/gpu/drm/i915/intel_pm.c | 10 ++++++----
>> >> >  1 file changed, 6 insertions(+), 4 deletions(-)
>> >> >
>> >> > diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
>> >> > index 7d99a9c..8a27802 100644
>> >> > --- a/drivers/gpu/drm/i915/intel_pm.c
>> >> > +++ b/drivers/gpu/drm/i915/intel_pm.c
>> >> > @@ -5938,10 +5938,12 @@ static void haswell_init_clock_gating(struct drm_device *dev)
>> >> >  
>> >> >  	ilk_init_lp_watermarks(dev);
>> >> >  
>> >> > -	/* L3 caching of data atomics doesn't work -- disable it. */
>> >> > -	I915_WRITE(HSW_SCRATCH1, HSW_SCRATCH1_L3_DATA_ATOMICS_DISABLE);
>> >> > -	I915_WRITE(HSW_ROW_CHICKEN3,
>> >> > -		   _MASKED_BIT_ENABLE(HSW_ROW_CHICKEN3_L3_GLOBAL_ATOMICS_DISABLE));
>> >> > +	if (IS_HSW_GT3(dev) && dev->pdev->revision <= 6) {
>> >> > +		/* L3 caching of data atomics doesn't work -- disable it. */
>> >> > +		I915_WRITE(HSW_SCRATCH1, HSW_SCRATCH1_L3_DATA_ATOMICS_DISABLE);
>> >> > +		I915_WRITE(HSW_ROW_CHICKEN3,
>> >> > +			   _MASKED_BIT_ENABLE(HSW_ROW_CHICKEN3_L3_GLOBAL_ATOMICS_DISABLE));
>> >> > +	}
>> >> >  
>> >> >  	/* This is required by WaCatErrorRejectionIssue:hsw */
>> >> >  	I915_WRITE(GEN7_SQ_CHICKEN_MBCUNIT_CONFIG,
>> >> > -- 
>> >> > 1.8.3.2
