Message ID | 1373425083-1276-1-git-send-email-ben@bwidawsk.net (mailing list archive) |
---|---|
State | New, archived |
On Tue, Jul 09, 2013 at 07:58:02PM -0700, Ben Widawsky wrote:
> The algorithm/information was originally written by Chad, though I
> changed the control flow, and I think his original code had a couple of
> bugs, though I didn't look very hard before rewriting. That could have
> also been different interpretations of the spec.

Just a cpuid query that can already be done more simply from userspace
(i.e. with no syscalls)? I was expecting a lot more magic.
-Chris
On Wed, Jul 10, 2013 at 09:59:01AM +0100, Chris Wilson wrote:
> [...]
> Just a cpuid query that can already be done more simply from userspace
> (i.e. with no syscalls)? I was expecting a lot more magic.
> -Chris

Chad wrote this originally for mesa. And yes, it's doable from
userspace. Chatting with Daniel, we thought maybe other GPU clients
might want this info, and so a central place to put the code would be
nice, in case there are quirks or anything like that (I've had a
particularly hard time figuring out if Xeon really has L3 or not).

So what's a central place? Libdrm? Everybody uses that, right? You can
read /proc/cpuinfo as far as I know, but then you still need to query
the HAS_LLC getparam to figure out what kind of L3 you have (or do your
own chipset ID check).

In addition to the centrality argument, I noticed while poking around
cpuid in the kernel that it is a vitalized function. I'm not sure what
the purpose of this is, but it's something you can't fake in userspace.
On Wed, Jul 10, 2013 at 09:40:18AM -0700, Ben Widawsky wrote:
> [...]
> In addition to the centrality argument, I noticed while poking around
> cpuid in the kernel that it is a vitalized function. I'm not sure what

That's the last time I let vim tell me I spelled "virtualized" wrong:

	cpuid in the kernel that it is a virtualized function

> the purpose of this is, but it's something you can't fake in userspace.
On Wed, Jul 10, 2013 at 09:40:18AM -0700, Ben Widawsky wrote:
> [...]
> In addition to the centrality argument, I noticed while poking around
> cpuid in the kernel that it is a vitalized function. I'm not sure what
> the purpose of this is, but it's something you can't fake in userspace.

In fact, isn't this functionality already part of arch/x86, and isn't the
value already exposed via /proc/cpuinfo?
-Chris
On Wed, Jul 10, 2013 at 06:40:02PM +0100, Chris Wilson wrote:
> [...]
> In fact, isn't this functionality already part of arch/x86, and isn't the
> value already exposed via /proc/cpuinfo?
> -Chris

Yes to the /proc/cpuinfo bit (I said that). If the implication is just
to use that code, I did a quick search and couldn't find it. I can go
back and check again. Assuming it's externally exposed to drivers, I
would rather use that.

We still need the HAS_LLC check to differentiate the L3 though.
On Wed, Jul 10, 2013 at 10:46:47AM -0700, Ben Widawsky wrote:
> [...]
> Yes to the /proc/cpuinfo bit (I said that). If the implication is just
> to use that code, I did a quick search and couldn't find it. I can go
> back and check again. Assuming it's externally exposed to drivers, I
> would rather use that.
>
> We still need the HAS_LLC check to differentiate the L3 though.

You have to be careful about repurposing PARAM_HAS_LLC though. At the
moment it has a dual meaning: that the kernel/gpu supports LLC, and that
we default to placing objects in LLC.

If you must expose cache_size through a param, make it a new one and
make it explicit. I would also rather just use a param than parse
/proc/cpuinfo (especially as some paranoid setups prevent access to
/proc/cpuinfo).

But we really should not be calling cpuid from our driver; that just
looks very, very ugly, and given cpuid's history it will be a
maintenance burden.
-Chris
On Wed, Jul 10, 2013 at 07:00:53PM +0100, Chris Wilson wrote:
> [...]
> You have to be careful about repurposing PARAM_HAS_LLC though. At the
> moment it has a dual meaning that the kernel/gpu supports LLC and that
> we default to placing objects in LLC.
>
> If you must expose cache_size through a param, make it a new one and
> make it explicit. I would also rather just use a param than parse
> /proc/cpuinfo (especially as some paranoid setups prevent access to
> /proc/cpuinfo).

I did a search, and I think my change is backward compatible, but if you
want a new param, I don't care.

> But we really should not be calling cpuid from our driver, that just
> looks very very ugly, and given cpuid's history will be a maintenance
> burden.

Okay, let me try again to find where it is parsed by the arch/x86 code.
On 07/09/2013 07:58 PM, Ben Widawsky wrote:
> The algorithm/information was originally written by Chad, though I
> changed the control flow, and I think his original code had a couple of
> bugs, though I didn't look very hard before rewriting. That could have
> also been different interpretations of the spec.
>
> The excellent comments remain entirely copied from Chad's code.
>
> I've tested this on two platforms, and it seems to perform how I want.
>
> CC: Chad Versace <chad.versace@linux.intel.com>
> CC: Bryan Bell <bryan.j.bell@intel.com>
> Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
> ---
> [...]
> @@ -974,7 +974,7 @@ static int i915_getparam(struct drm_device *dev, void *data,
>  		value = 1;
>  		break;
>  	case I915_PARAM_HAS_LLC:
> -		value = HAS_LLC(dev);
> +		value = dev_priv->llc_size;
>  		break;
>  	case I915_PARAM_HAS_ALIASING_PPGTT:
>  		value = dev_priv->mm.aliasing_ppgtt ? 1 : 0;

I would like to see a dedicated param for the llc size. 'has' is always
boolean valued, right?

> [...]
> +static size_t get_llc_size(struct drm_device *dev)
> +{
> [...]
> +	/* Let user space know we have LLC, but we can't figure it out */
> +	DRM_DEBUG_DRIVER("Couldn't find LLC size. Bug?\n");
> +	return 1;
> +}

This function looks good to me, since I wrote most of it ;)
On Wed, Jul 10, 2013 at 05:16:54PM -0700, Chad Versace wrote:
> [...]
> I would like to see a dedicated param for the llc size. 'has' is always
> boolean valued, right?

Sounds like Chris wants the same anyway, but no, getparam is always an
int; just the name is 'has'.

> [...]
> This function looks good to me, since I wrote most of it ;)

Still arguing over whether or not to try to pull it from the x86 core. I
tried to get some comment from hpa today, but haven't heard anything
yet. I do agree that if we can just take the code from the x86 core it
would be best, but AFAICT I'd need to change some code, or do an almost
equal amount of logic to determine L2 vs. L3. Thinking about it now
though, I suppose I can assume that if HAS_LLC, then whatever cache size
was obtained during boot is L3... dunno.
diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c
index 0e22142..377949e 100644
--- a/drivers/gpu/drm/i915/i915_dma.c
+++ b/drivers/gpu/drm/i915/i915_dma.c
@@ -974,7 +974,7 @@ static int i915_getparam(struct drm_device *dev, void *data,
 		value = 1;
 		break;
 	case I915_PARAM_HAS_LLC:
-		value = HAS_LLC(dev);
+		value = dev_priv->llc_size;
 		break;
 	case I915_PARAM_HAS_ALIASING_PPGTT:
 		value = dev_priv->mm.aliasing_ppgtt ? 1 : 0;
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index c8d6104..43a549d 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -1187,6 +1187,8 @@ typedef struct drm_i915_private {
 	/* Old dri1 support infrastructure, beware the dragons ya fools entering
 	 * here!
 	 */
 	struct i915_dri1_state dri1;
+
+	size_t llc_size;
 } drm_i915_private_t;
 
 /* Iterate over initialised rings */
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index af61be8..a070686 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -4282,6 +4282,57 @@ i915_gem_lastclose(struct drm_device *dev)
 		DRM_ERROR("failed to idle hardware: %d\n", ret);
 }
 
+/**
+ * Return the size, in bytes, of the CPU L3 cache. If the CPU has no L3
+ * cache, or if an error occurs in obtaining the cache size, then return 0.
+ *
+ * From "Intel Processor Identification and the CPUID Instruction", 5.15
+ * Deterministic Cache Parameters (Function 04h):
+ *	When EAX is initialized to a value of 4, the CPUID instruction returns
+ *	deterministic cache information in the EAX, EBX, ECX and EDX registers.
+ *	This function requires ECX be initialized with an index which indicates
+ *	which cache to return information about. The OS is expected to call this
+ *	function (CPUID.4) with ECX = 0, 1, 2, until EAX[4:0] == 0, indicating no
+ *	more caches. The order in which the caches are returned is not specified
+ *	and may change at Intel's discretion.
+ *
+ * Equation 5-4. Calculating the Cache Size in bytes:
+ *	= (Ways + 1) * (Partitions + 1) * (Line Size + 1) * (Sets + 1)
+ *	= (EBX[31:22] + 1) * (EBX[21:12] + 1) * (EBX[11:0] + 1) * (ECX + 1)
+ */
+static size_t get_llc_size(struct drm_device *dev)
+{
+	u8 cnt = 0;
+	unsigned int eax, ebx, ecx, edx;
+
+	if (!HAS_LLC(dev))
+		return 0;
+
+	do {
+		uint32_t cache_level;
+		uint32_t associativity, line_partitions, line_size, sets;
+
+		eax = 4;
+		ecx = cnt;
+		__cpuid(&eax, &ebx, &ecx, &edx);
+
+		cache_level = (eax >> 5) & 0x7;
+		if (cache_level != 3)
+			continue;
+
+		associativity = ((ebx >> 22) & 0x3ff) + 1;
+		line_partitions = ((ebx >> 12) & 0x3ff) + 1;
+		line_size = (ebx & 0xfff) + 1;
+		sets = ecx + 1;
+
+		return associativity * line_partitions * line_size * sets;
+	} while (eax & 0x1f && ++cnt);
+
+	/* Let user space know we have LLC, but we can't figure it out */
+	DRM_DEBUG_DRIVER("Couldn't find LLC size. Bug?\n");
+	return 1;
+}
+
+
 static void
 init_ring_lists(struct intel_ring_buffer *ring)
 {
@@ -4333,6 +4384,8 @@ i915_gem_load(struct drm_device *dev)
 	else
 		dev_priv->num_fence_regs = 8;
 
+	dev_priv->llc_size = get_llc_size(dev);
+
 	/* Initialize fence registers to zero */
 	INIT_LIST_HEAD(&dev_priv->mm.fence_list);
 	i915_gem_restore_fences(dev);
The algorithm/information was originally written by Chad, though I
changed the control flow, and I think his original code had a couple of
bugs, though I didn't look very hard before rewriting. That could have
also been different interpretations of the spec.

The excellent comments remain entirely copied from Chad's code.

I've tested this on two platforms, and it seems to perform how I want.

CC: Chad Versace <chad.versace@linux.intel.com>
CC: Bryan Bell <bryan.j.bell@intel.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
---
 drivers/gpu/drm/i915/i915_dma.c |  2 +-
 drivers/gpu/drm/i915/i915_drv.h |  2 ++
 drivers/gpu/drm/i915/i915_gem.c | 53 +++++++++++++++++++++++++++++++++++++++++
 3 files changed, 56 insertions(+), 1 deletion(-)