diff mbox

[1/6] drm/i915: Implement L3 partitioning set-up from the workaround list.

Message ID 1444218245-8430-1-git-send-email-currojerez@riseup.net (mailing list archive)
State New, archived
Headers show

Commit Message

Francisco Jerez Oct. 7, 2015, 11:44 a.m. UTC
This programs the L3 configuration based on the sizes given for each
partition as arguments.  The relevant register writes are added to the
workaround list so that they are re-applied to each context while it's
initialized, preventing state leaks from other userspace processes
which may have modified the L3 partitioning from its boot-up state,
since all relevant registers are part of the software and hardware
command checker whitelists.

Some userspace clients (DDX and current versions of Mesa not patched
with my L3 partitioning series [1]) assume that the L3 configuration,
in particular the URB size, comes up in certain state when a context
is created, but nothing in the kernel guarantees this assumption, the
registers that control the partitioning of the L3 cache were being
left untouched.

Note that the VLV_L3SQCREG1_SQGHPCI_DEFAULT macro defined here has the
same value as the previously defined VLV_B0_WA_L3SQCREG1_VALUE, but
the latter will be removed in a future commit.

[1] http://lists.freedesktop.org/archives/mesa-dev/2015-September/093550.html

Signed-off-by: Francisco Jerez <currojerez@riseup.net>
---
 drivers/gpu/drm/i915/i915_reg.h         | 13 ++++++
 drivers/gpu/drm/i915/intel_ringbuffer.c | 80 +++++++++++++++++++++++++++++++++
 2 files changed, 93 insertions(+)

Comments

Chris Wilson Oct. 7, 2015, 12:42 p.m. UTC | #1
On Wed, Oct 07, 2015 at 02:44:00PM +0300, Francisco Jerez wrote:
> This programs the L3 configuration based on the sizes given for each
> partition as arguments.  The relevant register writes are added to the
> workaround list so that they are re-applied to each context while it's
> initialized, preventing state leaks from other userspace processes
> which may have modified the L3 partitioning from its boot-up state,
> since all relevant registers are part of the software and hardware
> command checker whitelists.
> 
> Some userspace clients (DDX and current versions of Mesa not patched
> with my L3 partitioning series [1]) assume that the L3 configuration,
> in particular the URB size, comes up in certain state when a context
> is created, but nothing in the kernel guarantees this assumption, the
> registers that control the partitioning of the L3 cache were being
> left untouched.
> 
> Note that the VLV_L3SQCREG1_SQGHPCI_DEFAULT macro defined here has the
> same value as the previously defined VLV_B0_WA_L3SQCREG1_VALUE, but
> the latter will be removed in a future commit.
> 
> [1] http://lists.freedesktop.org/archives/mesa-dev/2015-September/093550.html

Merge this with 1 + 5 (or was it 4?)- not only does it introduce a
temporary unused function, but we have to jump between patches to see
whether the function is safe (especially given those BUGs), and you add
all w/a in the same patch so bisection is not improved.

Looking at the function itself, I am not convinced that it actually adds
anything over calling setting up the WA from the vfuncs - at least the
bdw/gen7 split is redundant (we split at the vfunc then call one
function where we replit again, but with nasty constraints on the
interface of the function for different gen, it's not a function for the
faint of heart).

> +static int init_l3_partitioning_workarounds(struct intel_engine_cs *ring,
> +					    unsigned int n_urb,
> +					    unsigned int n_ro,
> +					    unsigned int n_dc,
> +					    unsigned int n_all)
> +{
> +	struct drm_device *dev = ring->dev;
> +	struct drm_i915_private *dev_priv = dev->dev_private;
> +
> +	if (INTEL_INFO(dev)->gen >= 8) {

Why use dev here? You already have dev_priv, so why chase the pointer
again?
-Chris
Francisco Jerez Oct. 7, 2015, 1:30 p.m. UTC | #2
Chris Wilson <chris@chris-wilson.co.uk> writes:

> On Wed, Oct 07, 2015 at 02:44:00PM +0300, Francisco Jerez wrote:
>> This programs the L3 configuration based on the sizes given for each
>> partition as arguments.  The relevant register writes are added to the
>> workaround list so that they are re-applied to each context while it's
>> initialized, preventing state leaks from other userspace processes
>> which may have modified the L3 partitioning from its boot-up state,
>> since all relevant registers are part of the software and hardware
>> command checker whitelists.
>> 
>> Some userspace clients (DDX and current versions of Mesa not patched
>> with my L3 partitioning series [1]) assume that the L3 configuration,
>> in particular the URB size, comes up in certain state when a context
>> is created, but nothing in the kernel guarantees this assumption, the
>> registers that control the partitioning of the L3 cache were being
>> left untouched.
>> 
>> Note that the VLV_L3SQCREG1_SQGHPCI_DEFAULT macro defined here has the
>> same value as the previously defined VLV_B0_WA_L3SQCREG1_VALUE, but
>> the latter will be removed in a future commit.
>> 
>> [1] http://lists.freedesktop.org/archives/mesa-dev/2015-September/093550.html
>
> Merge this with 1 + 5 (or was it 4?)- not only does it introduce a
> temporary unused function, but we have to jump between patches to see
> whether the function is safe (especially given those BUGs), and you add
> all w/a in the same patch so bisection is not improved.
>
What do you want me to merge it with?  This *is* PATCH 1, and PATCH 5 is
largely unrelated.  I wouldn't have any objection to squashing PATCH 4
into this commit though.

> Looking at the function itself, I am not convinced that it actually adds
> anything over calling setting up the WA from the vfuncs - at least the
> bdw/gen7 split is redundant (we split at the vfunc then call one
> function where we replit again, but with nasty constraints on the
> interface of the function for different gen, it's not a function for the
> faint of heart).
>
The constraints are just the hardware's constraints on the L3
partitioning.  The function is meant to implement the details of
programming the L3 configuration, which is different for different gens
even though the overall structure of the L3 partitioning is the same.
Of course the constraints set by specific hardware on the partition
sizes cannot be abstracted out.

I guess that splitting this out into two functions (one for gen7 and
another for gen8) wouldn't hurt much, but open-coding the function in
all its uses (5) would involve duplicating quite some code.

Assuming I split the function into gen7 and gen8 variants, would you
like them to implement a common consistent interface (i.e. the same
prototype that my init_l3_partitioning_workarounds() implements), or
would you like the variant for each gen to implement an ad-hoc interface
according to the partition configs actually needed on each gen?  I
suspect the former would be cleaner.

>> +static int init_l3_partitioning_workarounds(struct intel_engine_cs *ring,
>> +					    unsigned int n_urb,
>> +					    unsigned int n_ro,
>> +					    unsigned int n_dc,
>> +					    unsigned int n_all)
>> +{
>> +	struct drm_device *dev = ring->dev;
>> +	struct drm_i915_private *dev_priv = dev->dev_private;
>> +
>> +	if (INTEL_INFO(dev)->gen >= 8) {
>
> Why use dev here? You already have dev_priv, so why chase the pointer
> again?

Sorry?  Does INTEL_INFO() take a drm_device or a drm_i915_private
pointer as argument?  The two types don't seem to be related by an
inheritance relationship or anything similar AFAICT, and most other uses
in this file seem to pass a drm_device as argument even where a
drm_i915_private is available.

> -Chris
>
> -- 
> Chris Wilson, Intel Open Source Technology Centre
Chris Wilson Oct. 8, 2015, 9:17 a.m. UTC | #3
On Wed, Oct 07, 2015 at 04:30:35PM +0300, Francisco Jerez wrote:
> Chris Wilson <chris@chris-wilson.co.uk> writes:
> 
> > On Wed, Oct 07, 2015 at 02:44:00PM +0300, Francisco Jerez wrote:
> >> This programs the L3 configuration based on the sizes given for each
> >> partition as arguments.  The relevant register writes are added to the
> >> workaround list so that they are re-applied to each context while it's
> >> initialized, preventing state leaks from other userspace processes
> >> which may have modified the L3 partitioning from its boot-up state,
> >> since all relevant registers are part of the software and hardware
> >> command checker whitelists.
> >> 
> >> Some userspace clients (DDX and current versions of Mesa not patched
> >> with my L3 partitioning series [1]) assume that the L3 configuration,
> >> in particular the URB size, comes up in certain state when a context
> >> is created, but nothing in the kernel guarantees this assumption, the
> >> registers that control the partitioning of the L3 cache were being
> >> left untouched.
> >> 
> >> Note that the VLV_L3SQCREG1_SQGHPCI_DEFAULT macro defined here has the
> >> same value as the previously defined VLV_B0_WA_L3SQCREG1_VALUE, but
> >> the latter will be removed in a future commit.
> >> 
> >> [1] http://lists.freedesktop.org/archives/mesa-dev/2015-September/093550.html
> >
> > Merge this with 1 + 5 (or was it 4?)- not only does it introduce a
> > temporary unused function, but we have to jump between patches to see
> > whether the function is safe (especially given those BUGs), and you add
> > all w/a in the same patch so bisection is not improved.
> >
> What do you want me to merge it with?  This *is* PATCH 1, and PATCH 5 is
> largely unrelated.  I wouldn't have any objection to squashing PATCH 4
> into this commit though.

1+4. That just highlights the issue of having two very tightly coupled
patches arbitrary split - the split here doesn't even improve bisection.
 
> > Looking at the function itself, I am not convinced that it actually adds
> > anything over calling setting up the WA from the vfuncs - at least the
> > bdw/gen7 split is redundant (we split at the vfunc then call one
> > function where we replit again, but with nasty constraints on the
> > interface of the function for different gen, it's not a function for the
> > faint of heart).
> >
> The constraints are just the hardware's constraints on the L3
> partitioning.  The function is meant to implement the details of
> programming the L3 configuration, which is different for different gens
> even though the overall structure of the L3 partitioning is the same.
> Of course the constraints set by specific hardware on the partition
> sizes cannot be abstracted out.
> 
> I guess that splitting this out into two functions (one for gen7 and
> another for gen8) wouldn't hurt much, but open-coding the function in
> all its uses (5) would involve duplicating quite some code.

Having slept on it, and I think a gen7/gen8 split is best.

My issue with the current combined function is that the interface is
different based on generation, which says to me that it is multiple
functions. It scores very low on the hard-to-misuse ranking

http://ozlabs.org/~rusty/index.cgi/tech/2008-03-30.html

And instead of the multiline multiple ternary operators, just have
an if-else-chain for hsw/vlv/ivb (for setting the cmd value).
 
> Assuming I split the function into gen7 and gen8 variants, would you
> like them to implement a common consistent interface (i.e. the same
> prototype that my init_l3_partitioning_workarounds() implements), or
> would you like the variant for each gen to implement an ad-hoc interface
> according to the partition configs actually needed on each gen?  I
> suspect the former would be cleaner.

No, the BUGs tell me that the registers do not have a consistent
interface.
 
> >> +static int init_l3_partitioning_workarounds(struct intel_engine_cs *ring,
> >> +					    unsigned int n_urb,
> >> +					    unsigned int n_ro,
> >> +					    unsigned int n_dc,
> >> +					    unsigned int n_all)
> >> +{
> >> +	struct drm_device *dev = ring->dev;
> >> +	struct drm_i915_private *dev_priv = dev->dev_private;
> >> +
> >> +	if (INTEL_INFO(dev)->gen >= 8) {
> >
> > Why use dev here? You already have dev_priv, so why chase the pointer
> > again?
> 
> Sorry?  Does INTEL_INFO() take a drm_device or a drm_i915_private
> pointer as argument?  The two types don't seem to be related by an
> inheritance relationship or anything similar AFAICT, and most other uses
> in this file seem to pass a drm_device as argument even where a
> drm_i915_private is available.

drm_i915_private is our primary internal pointer. We can shave over 8KiB
of object code by removing the pointer dance we have from still passing
around drm_device and jumping to drm_i915_private. New code is supposed
to be using drm_i915_private consistently for internal interfaces (we
can then clearly see the external entry points).
-Chris
diff mbox

Patch

diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
index a7c9e8c..663bc8f 100644
--- a/drivers/gpu/drm/i915/i915_reg.h
+++ b/drivers/gpu/drm/i915/i915_reg.h
@@ -5920,11 +5920,21 @@  enum skl_disp_power_wells {
 # define CHV_HZ_8X8_MODE_IN_1X				(1<<15)
 # define BDW_HIZ_POWER_COMPILER_CLOCK_GATING_DISABLE	(1<<3)
 
+#define GEN8_L3CNTLREG				0x7034
+#define  GEN8_L3CNTLREG_URB_ALLOC(n)		((n) << 1)
+#define  GEN8_L3CNTLREG_RO_ALLOC(n)		((n) << 11)
+#define  GEN8_L3CNTLREG_DC_ALLOC(n)		((n) << 18)
+#define  GEN8_L3CNTLREG_ALL_ALLOC(n)		((n) << 25)
+
 #define GEN9_SLICE_COMMON_ECO_CHICKEN0		0x7308
 #define  DISABLE_PIXEL_MASK_CAMMING		(1<<14)
 
 #define GEN7_L3SQCREG1				0xB010
 #define  VLV_B0_WA_L3SQCREG1_VALUE		0x00D30000
+#define  IVB_L3SQCREG1_SQGHPCI_DEFAULT		0x00730000
+#define  VLV_L3SQCREG1_SQGHPCI_DEFAULT		0x00D30000
+#define  HSW_L3SQCREG1_SQGHPCI_DEFAULT		0x00610000
+#define  GEN7_L3SQCREG1_CONV_DC_UC		(1 << 24)
 
 #define GEN8_L3SQCREG1				0xB100
 #define  BDW_WA_L3SQCREG1_DEFAULT		0x784000
@@ -5933,6 +5943,9 @@  enum skl_disp_power_wells {
 #define  GEN7_WA_FOR_GEN7_L3_CONTROL			0x3C47FF8C
 #define  GEN7_L3AGDIS				(1<<19)
 #define GEN7_L3CNTLREG2				0xB020
+#define  GEN7_L3CNTLREG2_URB_ALLOC(n)		((n) << 1)
+#define  GEN7_L3CNTLREG2_RO_ALLOC(n)		((n) << 14)
+#define  GEN7_L3CNTLREG2_DC_ALLOC(n)		((n) << 21)
 #define GEN7_L3CNTLREG3				0xB024
 
 #define GEN7_L3_CHICKEN_MODE_REGISTER		0xB030
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 9035f8c..54ca344 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -800,6 +800,86 @@  static int wa_add(struct drm_i915_private *dev_priv,
 
 #define WA_WRITE(addr, val) WA_REG(addr, 0xffffffff, val)
 
+/**
+ * init_l3_partitioning_workarounds - Add L3 partitioning set-up to the WA list.
+ *
+ * @ring - Ring to program the L3 partitioning for.
+ * @n_urb - Number of ways to allocate for the URB.
+ * @n_ro - Number of ways to allocate for read-only L3 clients.
+ * @n_dc - Number of ways to allocate for the DC read-write L3 client.
+ * @n_all - Number of ways to allocate for the common pool shared
+ *          among all L3 clients.
+ *
+ * Note that for this to work correctly the L3 cache must be
+ * completely flushed whenever the workaround list is applied to a
+ * context.
+ */
+static int init_l3_partitioning_workarounds(struct intel_engine_cs *ring,
+					    unsigned int n_urb,
+					    unsigned int n_ro,
+					    unsigned int n_dc,
+					    unsigned int n_all)
+{
+	struct drm_device *dev = ring->dev;
+	struct drm_i915_private *dev_priv = dev->dev_private;
+
+	if (INTEL_INFO(dev)->gen >= 8) {
+		/*
+		 * The ALL partition may not be used simultaneously
+		 * with RO and DC.
+		 */
+		BUG_ON(n_all && (n_ro || n_dc));
+
+		/* Just need to set up the L3 partitioning. */
+		WA_WRITE(GEN8_L3CNTLREG,
+			 GEN8_L3CNTLREG_URB_ALLOC(n_urb) |
+			 GEN8_L3CNTLREG_RO_ALLOC(n_ro) |
+			 GEN8_L3CNTLREG_DC_ALLOC(n_dc) |
+			 GEN8_L3CNTLREG_ALL_ALLOC(n_all));
+
+	} else if (INTEL_INFO(dev)->gen >= 7) {
+		/*
+		 * Offset applied by the hardware to the number of
+		 * ways allocated to the URB, which is also the
+		 * minimum legal URB allocation.
+		 */
+		const unsigned int n0_urb = (IS_VALLEYVIEW(dev) ? 32 : 0);
+		BUG_ON(n_urb < n0_urb);
+
+		/* The ALL partition is not supported on Gen7. */
+		BUG_ON(n_all);
+
+		/*
+		 * Init the L3SQ General and high priority credit
+		 * initialization value to the hardware defaults
+		 * (except for VLV B0 which supposedly defaults to a
+		 * value different to the one we set here), and demote
+		 * the DC to LLC if it has no ways assigned.
+		 *
+		 * WaIncreaseL3CreditsForVLVB0:vlv
+		 */
+		WA_WRITE(GEN7_L3SQCREG1,
+			 (IS_HASWELL(dev) ? HSW_L3SQCREG1_SQGHPCI_DEFAULT :
+			  IS_VALLEYVIEW(dev) ? VLV_L3SQCREG1_SQGHPCI_DEFAULT :
+			  IVB_L3SQCREG1_SQGHPCI_DEFAULT) |
+			 (n_dc ? 0 : GEN7_L3SQCREG1_CONV_DC_UC));
+
+		/* Set up the L3 partitioning. */
+		WA_WRITE(GEN7_L3CNTLREG2,
+			 GEN7_L3CNTLREG2_URB_ALLOC(n_urb - n0_urb) |
+			 GEN7_L3CNTLREG2_RO_ALLOC(n_ro) |
+			 GEN7_L3CNTLREG2_DC_ALLOC(n_dc));
+
+		WA_WRITE(GEN7_L3CNTLREG3, 0);
+
+	} else {
+		/* No L3 on pre-Gen7 hardware. */
+		BUG();
+	}
+
+	return 0;
+}
+
 static int gen8_init_workarounds(struct intel_engine_cs *ring)
 {
 	struct drm_device *dev = ring->dev;