diff mbox

drm/i915: Expose LLC size to user space

Message ID 1373425083-1276-1-git-send-email-ben@bwidawsk.net (mailing list archive)
State New, archived
Headers show

Commit Message

Ben Widawsky July 10, 2013, 2:58 a.m. UTC
The algorithm/information was originally written by Chad, though I
changed the control flow, and I think his original code had a couple of
bugs, though I didn't look very hard before rewriting. That could have
also been different interpretations of the spec.

The excellent comments remain entirely copied from Chad's code.

I've tested this on two platforms, and it seems to perform how I want.

CC: Chad Versace <chad.versace@linux.intel.com>
CC: Bryan Bell <bryan.j.bell@intel.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
---
 drivers/gpu/drm/i915/i915_dma.c |  2 +-
 drivers/gpu/drm/i915/i915_drv.h |  2 ++
 drivers/gpu/drm/i915/i915_gem.c | 53 +++++++++++++++++++++++++++++++++++++++++
 3 files changed, 56 insertions(+), 1 deletion(-)

Comments

Chris Wilson July 10, 2013, 8:59 a.m. UTC | #1
On Tue, Jul 09, 2013 at 07:58:02PM -0700, Ben Widawsky wrote:
> The algorithm/information was originally written by Chad, though I
> changed the control flow, and I think his original code had a couple of
> bugs, though I didn't look very hard before rewriting. That could have
> also been different interpretations of the spec.

Just a cpuid query that can already be done more simply from userspace
(i.e. with no syscalls)? I was expecting a lot more magic.
-Chris
Ben Widawsky July 10, 2013, 4:40 p.m. UTC | #2
On Wed, Jul 10, 2013 at 09:59:01AM +0100, Chris Wilson wrote:
> On Tue, Jul 09, 2013 at 07:58:02PM -0700, Ben Widawsky wrote:
> > The algorithm/information was originally written by Chad, though I
> > changed the control flow, and I think his original code had a couple of
> > bugs, though I didn't look very hard before rewriting. That could have
> > also been different interpretations of the spec.
> 
> Just a cpuid query that can already be done more simply from userspace
> (i.e. with no syscalls)? I was expecting a lot more magic.
> -Chris
> 
> -- 
> Chris Wilson, Intel Open Source Technology Centre

Chad wrote this originally for mesa. And yes, it's doable from
userspace. Chatting with Daniel, we thought maybe other GPU clients
might want this info, and so a central place to put the code would be
nice, in case there are quirks or anything like that (I've had a
particularly hard time figuring out if Xeon really has L3 or not).

So what's a central place? Libdrm, everybody uses that right? You can
read /proc/cpuinfo as far as I know, but then you still need to query
the HAS_LLC getparam to figure out what kind of L3 (or do your own
chipset ID check).

In addition to the centrality argument, I noticed while poking around
cpuid in the kernel that it is a vitalized function. I'm not sure what
the purpose of this is, but it's something you can't fake in userspace.
Ben Widawsky July 10, 2013, 5:11 p.m. UTC | #3
On Wed, Jul 10, 2013 at 09:40:18AM -0700, Ben Widawsky wrote:
> On Wed, Jul 10, 2013 at 09:59:01AM +0100, Chris Wilson wrote:
> > On Tue, Jul 09, 2013 at 07:58:02PM -0700, Ben Widawsky wrote:
> > > The algorithm/information was originally written by Chad, though I
> > > changed the control flow, and I think his original code had a couple of
> > > bugs, though I didn't look very hard before rewriting. That could have
> > > also been different interpretations of the spec.
> > 
> > Just a cpuid query that can already be done more simply from userspace
> > (i.e. with no syscalls)? I was expecting a lot more magic.
> > -Chris
> > 
> > -- 
> > Chris Wilson, Intel Open Source Technology Centre
> 
> Chad wrote this originally for mesa. And yes, it's doable from
> userspace. Chatting with Daniel, we thought maybe other GPU clients
> might want this info, and so a central place to put the code would be
> nice, in case there are quirks or anything like that (I've had a
> particularly hard time figuring out if Xeon really has L3 or not).
> 
> So what's a central place? Libdrm, everybody uses that right? You can
> read /proc/cpuinfo as far as I know, but then you still need to query
> the HAS_LLC getparam to figure out what kind of L3 (or do your own
> chipset ID check).
> 
> In addition to the centrality argument, I noticed while poking around
> cpuid in the kernel that it is a vitalized function. I'm not sure what
That's the last time I let vim tell me I spelled "virtualized" wrong:

cpuid in the kernel that it is a virtualized function
>
> the purpose of this is, but it's something you can't fake in userspace.
> 
> 
> -- 
> Ben Widawsky, Intel Open Source Technology Center
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx
Chris Wilson July 10, 2013, 5:40 p.m. UTC | #4
On Wed, Jul 10, 2013 at 09:40:18AM -0700, Ben Widawsky wrote:
> On Wed, Jul 10, 2013 at 09:59:01AM +0100, Chris Wilson wrote:
> > On Tue, Jul 09, 2013 at 07:58:02PM -0700, Ben Widawsky wrote:
> > > The algorithm/information was originally written by Chad, though I
> > > changed the control flow, and I think his original code had a couple of
> > > bugs, though I didn't look very hard before rewriting. That could have
> > > also been different interpretations of the spec.
> > 
> > Just a cpuid query that can already be done more simply from userspace
> > (i.e. with no syscalls)? I was expecting a lot more magic.
> > -Chris
> > 
> > -- 
> > Chris Wilson, Intel Open Source Technology Centre
> 
> Chad wrote this originally for mesa. And yes, it's doable from
> userspace. Chatting with Daniel, we thought maybe other GPU clients
> might want this info, and so a central place to put the code would be
> nice, in case there are quirks or anything like that (I've had a
> particularly hard time figuring out if Xeon really has L3 or not).
> 
> So what's a central place? Libdrm, everybody uses that right? You can
> read /proc/cpuinfo as far as I know, but then you still need to query
> the HAS_LLC getparam to figure out what kind of L3 (or do your own
> chipset ID check).
> 
> In addition to the centrality argument, I noticed while poking around
> cpuid in the kernel that it is a vitalized function. I'm not sure what
> the purpose of this is, but it's something you can't fake in userspace.

In fact, isn't this functionality already part of arch/x86, and isn't the
value already exposed via /proc/cpuinfo?
-Chris
Ben Widawsky July 10, 2013, 5:46 p.m. UTC | #5
On Wed, Jul 10, 2013 at 06:40:02PM +0100, Chris Wilson wrote:
> On Wed, Jul 10, 2013 at 09:40:18AM -0700, Ben Widawsky wrote:
> > On Wed, Jul 10, 2013 at 09:59:01AM +0100, Chris Wilson wrote:
> > > On Tue, Jul 09, 2013 at 07:58:02PM -0700, Ben Widawsky wrote:
> > > > The algorithm/information was originally written by Chad, though I
> > > > changed the control flow, and I think his original code had a couple of
> > > > bugs, though I didn't look very hard before rewriting. That could have
> > > > also been different interpretations of the spec.
> > > 
> > > Just a cpuid query that can already be done more simply from userspace
> > > (i.e. with no syscalls)? I was expecting a lot more magic.
> > > -Chris
> > > 
> > > -- 
> > > Chris Wilson, Intel Open Source Technology Centre
> > 
> > Chad wrote this originally for mesa. And yes, it's doable from
> > userspace. Chatting with Daniel, we thought maybe other GPU clients
> > might want this info, and so a central place to put the code would be
> > nice, in case there are quirks or anything like that (I've had a
> > particularly hard time figuring out if Xeon really has L3 or not).
> > 
> > So what's a central place? Libdrm, everybody uses that right? You can
> > read /proc/cpuinfo as far as I know, but then you still need to query
> > the HAS_LLC getparam to figure out what kind of L3 (or do your own
> > chipset ID check).
> > 
> > In addition to the centrality argument, I noticed while poking around
> > cpuid in the kernel that it is a vitalized function. I'm not sure what
> > the purpose of this is, but it's something you can't fake in userspace.
> 
> In fact, isn't this functionality already part of arch/x86, and isn't the
> value already exposed via /proc/cpuinfo?
> -Chris
> 
> -- 
> Chris Wilson, Intel Open Source Technology Centre
>
Yes to the /proc/cpuinfo bit (I said that). If the implication is just
to use that code, I did a quick search and couldn't find it. I can go
back and check again. Assuming it's externally exposed to drivers, I
would rather use that.

We still need the HAS_LLC check to differentiate the L3 though.
Chris Wilson July 10, 2013, 6 p.m. UTC | #6
On Wed, Jul 10, 2013 at 10:46:47AM -0700, Ben Widawsky wrote:
> On Wed, Jul 10, 2013 at 06:40:02PM +0100, Chris Wilson wrote:
> > On Wed, Jul 10, 2013 at 09:40:18AM -0700, Ben Widawsky wrote:
> > > On Wed, Jul 10, 2013 at 09:59:01AM +0100, Chris Wilson wrote:
> > > > On Tue, Jul 09, 2013 at 07:58:02PM -0700, Ben Widawsky wrote:
> > > > > The algorithm/information was originally written by Chad, though I
> > > > > changed the control flow, and I think his original code had a couple of
> > > > > bugs, though I didn't look very hard before rewriting. That could have
> > > > > also been different interpretations of the spec.
> > > > 
> > > > Just a cpuid query that can already be done more simply from userspace
> > > > (i.e. with no syscalls)? I was expecting a lot more magic.
> > > > -Chris
> > > > 
> > > > -- 
> > > > Chris Wilson, Intel Open Source Technology Centre
> > > 
> > > Chad wrote this originally for mesa. And yes, it's doable from
> > > userspace. Chatting with Daniel, we thought maybe other GPU clients
> > > might want this info, and so a central place to put the code would be
> > > nice, in case there are quirks or anything like that (I've had a
> > > particularly hard time figuring out if Xeon really has L3 or not).
> > > 
> > > So what's a central place? Libdrm, everybody uses that right? You can
> > > read /proc/cpuinfo as far as I know, but then you still need to query
> > > the HAS_LLC getparam to figure out what kind of L3 (or do your own
> > > chipset ID check).
> > > 
> > > In addition to the centrality argument, I noticed while poking around
> > > cpuid in the kernel that it is a vitalized function. I'm not sure what
> > > the purpose of this is, but it's something you can't fake in userspace.
> > 
> > In fact, isn't this functionality already part of arch/x86, and isn't the
> > value already exposed via /proc/cpuinfo?
> > -Chris
> > 
> > -- 
> > Chris Wilson, Intel Open Source Technology Centre
> >
> Yes to the /proc/cpuinfo bit (I said that). If the implication is just
> to use that code, I did a quick search and couldn't find it. I can go
> back and check again. Assuming it's externally exposed to drivers, I
> would rather use that.
> 
> We still need the HAS_LLC check to differentiate the L3 though.

You have to be careful about repurposing PARAM_HAS_LLC though. At the
moment it has a dual meaning that the kernel/gpu supports LLC and that
we default to placing objects in LLC.

If you must expose cache_size through a param, make it a new one and
make it explicit. I would also rather just use a param than parse
/proc/cpuinfo (especially as some paranoid setups prevent access to
/proc/cpuinfo).

But we really should not be calling cpuid from our driver, that just
looks very very ugly, and given cpuid's history will be a maintenance
burden.
-Chris
Ben Widawsky July 10, 2013, 6:44 p.m. UTC | #7
On Wed, Jul 10, 2013 at 07:00:53PM +0100, Chris Wilson wrote:
> On Wed, Jul 10, 2013 at 10:46:47AM -0700, Ben Widawsky wrote:
> > On Wed, Jul 10, 2013 at 06:40:02PM +0100, Chris Wilson wrote:
> > > On Wed, Jul 10, 2013 at 09:40:18AM -0700, Ben Widawsky wrote:
> > > > On Wed, Jul 10, 2013 at 09:59:01AM +0100, Chris Wilson wrote:
> > > > > On Tue, Jul 09, 2013 at 07:58:02PM -0700, Ben Widawsky wrote:
> > > > > > The algorithm/information was originally written by Chad, though I
> > > > > > changed the control flow, and I think his original code had a couple of
> > > > > > bugs, though I didn't look very hard before rewriting. That could have
> > > > > > also been different interpretations of the spec.
> > > > > 
> > > > > Just a cpuid query that can already be done more simply from userspace
> > > > > (i.e. with no syscalls)? I was expecting a lot more magic.
> > > > > -Chris
> > > > > 
> > > > > -- 
> > > > > Chris Wilson, Intel Open Source Technology Centre
> > > > 
> > > > Chad wrote this originally for mesa. And yes, it's doable from
> > > > userspace. Chatting with Daniel, we thought maybe other GPU clients
> > > > might want this info, and so a central place to put the code would be
> > > > nice, in case there are quirks or anything like that (I've had a
> > > > particularly hard time figuring out if Xeon really has L3 or not).
> > > > 
> > > > So what's a central place? Libdrm, everybody uses that right? You can
> > > > read /proc/cpuinfo as far as I know, but then you still need to query
> > > > the HAS_LLC getparam to figure out what kind of L3 (or do your own
> > > > chipset ID check).
> > > > 
> > > > In addition to the centrality argument, I noticed while poking around
> > > > cpuid in the kernel that it is a vitalized function. I'm not sure what
> > > > the purpose of this is, but it's something you can't fake in userspace.
> > > 
> > > In fact, isn't this functionality already part of arch/x86, and isn't the
> > > value already exposed via /proc/cpuinfo?
> > > -Chris
> > > 
> > > -- 
> > > Chris Wilson, Intel Open Source Technology Centre
> > >
> > Yes to the /proc/cpuinfo bit (I said that). If the implication is just
> > to use that code, I did a quick search and couldn't find it. I can go
> > back and check again. Assuming it's externally exposed to drivers, I
> > would rather use that.
> > 
> > We still need the HAS_LLC check to differentiate the L3 though.
> 
> You have to be careful about repurposing PARAM_HAS_LLC though. At the
> moment it has a dual meaning that the kernel/gpu supports LLC and that
> we default to placing objects in LLC.
> 
> If you must expose cache_size through a param, make it a new one and
> make it explicit. I would also rather just use a param than parse
> /proc/cpuinfo (especially as some paranoid setups prevent access to
> /proc/cpuinfo).

I did a search, and I think my change is backward compatible, but if you
want a new param, I don't care.

> 
> But we really should not be calling cpuid from our driver, that just
> looks very very ugly, and given cpuid's history will be a maintenance
> burden.
> -Chris

Okay, let me try again to find where it is parsed by the arch/x86 code.

> 
> -- 
> Chris Wilson, Intel Open Source Technology Centre
Chad Versace July 11, 2013, 12:16 a.m. UTC | #8
On 07/09/2013 07:58 PM, Ben Widawsky wrote:
> The algorithm/information was originally written by Chad, though I
> changed the control flow, and I think his original code had a couple of
> bugs, though I didn't look very hard before rewriting. That could have
> also been different interpretations of the spec.
>
> The excellent comments remain entirely copied from Chad's code.
>
> I've tested this on two platforms, and it seems to perform how I want.
>
> CC: Chad Versace <chad.versace@linux.intel.com>
> CC: Bryan Bell <bryan.j.bell@intel.com>
> Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
> ---
>   drivers/gpu/drm/i915/i915_dma.c |  2 +-
>   drivers/gpu/drm/i915/i915_drv.h |  2 ++
>   drivers/gpu/drm/i915/i915_gem.c | 53 +++++++++++++++++++++++++++++++++++++++++
>   3 files changed, 56 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c
> index 0e22142..377949e 100644
> --- a/drivers/gpu/drm/i915/i915_dma.c
> +++ b/drivers/gpu/drm/i915/i915_dma.c
> @@ -974,7 +974,7 @@ static int i915_getparam(struct drm_device *dev, void *data,
>   		value = 1;
>   		break;
>   	case I915_PARAM_HAS_LLC:
> -		value = HAS_LLC(dev);
> +		value = dev_priv->llc_size;
>   		break;
>   	case I915_PARAM_HAS_ALIASING_PPGTT:
>   		value = dev_priv->mm.aliasing_ppgtt ? 1 : 0;

I would like to see a dedicated param for the llc size. 'has' is always
boolean valued, right?

> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index c8d6104..43a549d 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -1187,6 +1187,8 @@ typedef struct drm_i915_private {
>   	/* Old dri1 support infrastructure, beware the dragons ya fools entering
>   	 * here! */
>   	struct i915_dri1_state dri1;
> +
> +	size_t llc_size;
>   } drm_i915_private_t;
>
>   /* Iterate over initialised rings */
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index af61be8..a070686 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -4282,6 +4282,57 @@ i915_gem_lastclose(struct drm_device *dev)
>   		DRM_ERROR("failed to idle hardware: %d\n", ret);
>   }
>
> +/**
> + * Return the size, in bytes, of the CPU L3 cache size. If the CPU has no L3
> + * cache, or if an error occurs in obtaining the cache size, then return 0.
> + * From "Intel Processor Identification and the CPUID Instruction > 5.15
> + * Deterministic Cache Parmaeters (Function 04h)":
> + *    When EAX is initialized to a value of 4, the CPUID instruction returns
> + *    deterministic cache information in the EAX, EBX, ECX and EDX registers.
> + *    This function requires ECX be initialized with an index which indicates
> + *    which cache to return information about. The OS is expected to call this
> + *    function (CPUID.4) with ECX = 0, 1, 2, until EAX[4:0] == 0, indicating no
> + *    more caches. The order in which the caches are returned is not specified
> + *    and may change at Intel's discretion.
> + *
> + * Equation 5-4. Calculating the Cache Size in bytes:
> + *          = (Ways +1) ? (Partitions +1) ? (Line Size +1) ? (Sets +1)
> + *          = (EBX[31:22] +1) ? (EBX[21:12] +1) ? (EBX[11:0] +1 ? (ECX + 1)
> + */
> +static size_t get_llc_size(struct drm_device *dev)
> +{
> +	u8 cnt = 0;
> +	unsigned int eax, ebx, ecx, edx;
> +
> +	if (!HAS_LLC(dev))
> +		return 0;
> +
> +	do {
> +		uint32_t cache_level;
> +		uint32_t associativity, line_partitions, line_size, sets;
> +
> +		eax = 4;
> +		ecx = cnt;
> +		__cpuid(&eax, &ebx, &ecx, &edx);
> +
> +		cache_level = (eax >> 5) & 0x7;
> +		if (cache_level != 3)
> +			continue;
> +
> +		associativity = ((ebx >> 22) & 0x3ff) + 1;
> +		line_partitions = ((ebx >> 12) & 0x3ff) + 1;
> +		line_size = (ebx & 0xfff) + 1;
> +		sets = ecx + 1;
> +
> +		return associativity * line_partitions * line_size * sets;
> +	} while (eax & 0x1f && ++cnt);
> +
> +	/* Let user space know we have LLC, but we can't figure it out */
> +	DRM_DEBUG_DRIVER("Couldn't find LLC size. Bug?\n");
> +	return 1;
> +}

This function looks good to me, since I wrote most of it ;)
Ben Widawsky July 11, 2013, 5:06 a.m. UTC | #9
On Wed, Jul 10, 2013 at 05:16:54PM -0700, Chad Versace wrote:
> On 07/09/2013 07:58 PM, Ben Widawsky wrote:
> >The algorithm/information was originally written by Chad, though I
> >changed the control flow, and I think his original code had a couple of
> >bugs, though I didn't look very hard before rewriting. That could have
> >also been different interpretations of the spec.
> >
> >The excellent comments remain entirely copied from Chad's code.
> >
> >I've tested this on two platforms, and it seems to perform how I want.
> >
> >CC: Chad Versace <chad.versace@linux.intel.com>
> >CC: Bryan Bell <bryan.j.bell@intel.com>
> >Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
> >---
> >  drivers/gpu/drm/i915/i915_dma.c |  2 +-
> >  drivers/gpu/drm/i915/i915_drv.h |  2 ++
> >  drivers/gpu/drm/i915/i915_gem.c | 53 +++++++++++++++++++++++++++++++++++++++++
> >  3 files changed, 56 insertions(+), 1 deletion(-)
> >
> >diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c
> >index 0e22142..377949e 100644
> >--- a/drivers/gpu/drm/i915/i915_dma.c
> >+++ b/drivers/gpu/drm/i915/i915_dma.c
> >@@ -974,7 +974,7 @@ static int i915_getparam(struct drm_device *dev, void *data,
> >  		value = 1;
> >  		break;
> >  	case I915_PARAM_HAS_LLC:
> >-		value = HAS_LLC(dev);
> >+		value = dev_priv->llc_size;
> >  		break;
> >  	case I915_PARAM_HAS_ALIASING_PPGTT:
> >  		value = dev_priv->mm.aliasing_ppgtt ? 1 : 0;
> 
> I would like to see a dedicated param for the llc size. 'has' is always
> boolean valued, right?

Sounds like Chris wants the same anyway, but no. Getparam is always an
int. Just the name is, 'has'

> 
> >diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> >index c8d6104..43a549d 100644
> >--- a/drivers/gpu/drm/i915/i915_drv.h
> >+++ b/drivers/gpu/drm/i915/i915_drv.h
> >@@ -1187,6 +1187,8 @@ typedef struct drm_i915_private {
> >  	/* Old dri1 support infrastructure, beware the dragons ya fools entering
> >  	 * here! */
> >  	struct i915_dri1_state dri1;
> >+
> >+	size_t llc_size;
> >  } drm_i915_private_t;
> >
> >  /* Iterate over initialised rings */
> >diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> >index af61be8..a070686 100644
> >--- a/drivers/gpu/drm/i915/i915_gem.c
> >+++ b/drivers/gpu/drm/i915/i915_gem.c
> >@@ -4282,6 +4282,57 @@ i915_gem_lastclose(struct drm_device *dev)
> >  		DRM_ERROR("failed to idle hardware: %d\n", ret);
> >  }
> >
> >+/**
> >+ * Return the size, in bytes, of the CPU L3 cache size. If the CPU has no L3
> >+ * cache, or if an error occurs in obtaining the cache size, then return 0.
> >+ * From "Intel Processor Identification and the CPUID Instruction > 5.15
> >+ * Deterministic Cache Parmaeters (Function 04h)":
> >+ *    When EAX is initialized to a value of 4, the CPUID instruction returns
> >+ *    deterministic cache information in the EAX, EBX, ECX and EDX registers.
> >+ *    This function requires ECX be initialized with an index which indicates
> >+ *    which cache to return information about. The OS is expected to call this
> >+ *    function (CPUID.4) with ECX = 0, 1, 2, until EAX[4:0] == 0, indicating no
> >+ *    more caches. The order in which the caches are returned is not specified
> >+ *    and may change at Intel's discretion.
> >+ *
> >+ * Equation 5-4. Calculating the Cache Size in bytes:
> >+ *          = (Ways +1) ? (Partitions +1) ? (Line Size +1) ? (Sets +1)
> >+ *          = (EBX[31:22] +1) ? (EBX[21:12] +1) ? (EBX[11:0] +1 ? (ECX + 1)
> >+ */
> >+static size_t get_llc_size(struct drm_device *dev)
> >+{
> >+	u8 cnt = 0;
> >+	unsigned int eax, ebx, ecx, edx;
> >+
> >+	if (!HAS_LLC(dev))
> >+		return 0;
> >+
> >+	do {
> >+		uint32_t cache_level;
> >+		uint32_t associativity, line_partitions, line_size, sets;
> >+
> >+		eax = 4;
> >+		ecx = cnt;
> >+		__cpuid(&eax, &ebx, &ecx, &edx);
> >+
> >+		cache_level = (eax >> 5) & 0x7;
> >+		if (cache_level != 3)
> >+			continue;
> >+
> >+		associativity = ((ebx >> 22) & 0x3ff) + 1;
> >+		line_partitions = ((ebx >> 12) & 0x3ff) + 1;
> >+		line_size = (ebx & 0xfff) + 1;
> >+		sets = ecx + 1;
> >+
> >+		return associativity * line_partitions * line_size * sets;
> >+	} while (eax & 0x1f && ++cnt);
> >+
> >+	/* Let user space know we have LLC, but we can't figure it out */
> >+	DRM_DEBUG_DRIVER("Couldn't find LLC size. Bug?\n");
> >+	return 1;
> >+}
> 
> This function looks good to me, since I wrote most of it ;)
> 

Still arguing over whether or not to try to pull it from the x86 core. I
tried to get some comment from hpa today , but haven't heard anything
yet.

I do agree if we can just take the code from the x86 core it would be
best, but AFAICT I'd need to change some code, or do an almost equal
amount of logic to determine l2 vs. l3. Thinking about it now though, I
suppose I can assume if HAS_LLC, then whatever cache size was obtained
during boot is l3... dunno.
diff mbox

Patch

diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c
index 0e22142..377949e 100644
--- a/drivers/gpu/drm/i915/i915_dma.c
+++ b/drivers/gpu/drm/i915/i915_dma.c
@@ -974,7 +974,7 @@  static int i915_getparam(struct drm_device *dev, void *data,
 		value = 1;
 		break;
 	case I915_PARAM_HAS_LLC:
-		value = HAS_LLC(dev);
+		value = dev_priv->llc_size;
 		break;
 	case I915_PARAM_HAS_ALIASING_PPGTT:
 		value = dev_priv->mm.aliasing_ppgtt ? 1 : 0;
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index c8d6104..43a549d 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -1187,6 +1187,8 @@  typedef struct drm_i915_private {
 	/* Old dri1 support infrastructure, beware the dragons ya fools entering
 	 * here! */
 	struct i915_dri1_state dri1;
+
+	size_t llc_size;
 } drm_i915_private_t;
 
 /* Iterate over initialised rings */
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index af61be8..a070686 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -4282,6 +4282,57 @@  i915_gem_lastclose(struct drm_device *dev)
 		DRM_ERROR("failed to idle hardware: %d\n", ret);
 }
 
+/**
+ * Return the size, in bytes, of the CPU L3 cache size. If the CPU has no L3
+ * cache, or if an error occurs in obtaining the cache size, then return 0.
+ * From "Intel Processor Identification and the CPUID Instruction > 5.15
+ * Deterministic Cache Parmaeters (Function 04h)":
+ *    When EAX is initialized to a value of 4, the CPUID instruction returns
+ *    deterministic cache information in the EAX, EBX, ECX and EDX registers.
+ *    This function requires ECX be initialized with an index which indicates
+ *    which cache to return information about. The OS is expected to call this
+ *    function (CPUID.4) with ECX = 0, 1, 2, until EAX[4:0] == 0, indicating no
+ *    more caches. The order in which the caches are returned is not specified
+ *    and may change at Intel's discretion.
+ *
+ * Equation 5-4. Calculating the Cache Size in bytes:
+ *          = (Ways +1) ? (Partitions +1) ? (Line Size +1) ? (Sets +1)
+ *          = (EBX[31:22] +1) ? (EBX[21:12] +1) ? (EBX[11:0] +1 ? (ECX + 1)
+ */
+static size_t get_llc_size(struct drm_device *dev)
+{
+	u8 cnt = 0;
+	unsigned int eax, ebx, ecx, edx;
+
+	if (!HAS_LLC(dev))
+		return 0;
+
+	do {
+		uint32_t cache_level;
+		uint32_t associativity, line_partitions, line_size, sets;
+
+		eax = 4;
+		ecx = cnt;
+		__cpuid(&eax, &ebx, &ecx, &edx);
+
+		cache_level = (eax >> 5) & 0x7;
+		if (cache_level != 3)
+			continue;
+
+		associativity = ((ebx >> 22) & 0x3ff) + 1;
+		line_partitions = ((ebx >> 12) & 0x3ff) + 1;
+		line_size = (ebx & 0xfff) + 1;
+		sets = ecx + 1;
+
+		return associativity * line_partitions * line_size * sets;
+	} while (eax & 0x1f && ++cnt);
+
+	/* Let user space know we have LLC, but we can't figure it out */
+	DRM_DEBUG_DRIVER("Couldn't find LLC size. Bug?\n");
+	return 1;
+}
+
+
 static void
 init_ring_lists(struct intel_ring_buffer *ring)
 {
@@ -4333,6 +4384,8 @@  i915_gem_load(struct drm_device *dev)
 	else
 		dev_priv->num_fence_regs = 8;
 
+	dev_priv->llc_size = get_llc_size(dev);
+
 	/* Initialize fence registers to zero */
 	INIT_LIST_HEAD(&dev_priv->mm.fence_list);
 	i915_gem_restore_fences(dev);