Message ID | 1444218245-8430-1-git-send-email-currojerez@riseup.net (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Wed, Oct 07, 2015 at 02:44:00PM +0300, Francisco Jerez wrote: > This programs the L3 configuration based on the sizes given for each > partition as arguments. The relevant register writes are added to the > workaround list so that they are re-applied to each context while it's > initialized, preventing state leaks from other userspace processes > which may have modified the L3 partitioning from its boot-up state, > since all relevant registers are part of the software and hardware > command checker whitelists. > > Some userspace clients (DDX and current versions of Mesa not patched > with my L3 partitioning series [1]) assume that the L3 configuration, > in particular the URB size, comes up in certain state when a context > is created, but nothing in the kernel guarantees this assumption, the > registers that control the partitioning of the L3 cache were being > left untouched. > > Note that the VLV_L3SQCREG1_SQGHPCI_DEFAULT macro defined here has the > same value as the previously defined VLV_B0_WA_L3SQCREG1_VALUE, but > the latter will be removed in a future commit. > > [1] http://lists.freedesktop.org/archives/mesa-dev/2015-September/093550.html Merge this with 1 + 5 (or was it 4?)- not only does it introduce a temporary unused function, but we have to jump between patches to see whether the function is safe (especially given those BUGs), and you add all w/a in the same patch so bisection is not improved. Looking at the function itself, I am not convinced that it actually adds anything over calling setting up the WA from the vfuncs - at least the bdw/gen7 split is redundant (we split at the vfunc then call one function where we replit again, but with nasty constraints on the interface of the function for different gen, it's not a function for the faint of heart). > +static int init_l3_partitioning_workarounds(struct intel_engine_cs *ring, > + unsigned int n_urb, > + unsigned int n_ro, > + unsigned int n_dc, > + unsigned int n_all) > +{ > + struct drm_device *dev = ring->dev; > + struct drm_i915_private *dev_priv = dev->dev_private; > + > + if (INTEL_INFO(dev)->gen >= 8) { Why use dev here? You already have dev_priv, so why chase the pointer again? -Chris
Chris Wilson <chris@chris-wilson.co.uk> writes: > On Wed, Oct 07, 2015 at 02:44:00PM +0300, Francisco Jerez wrote: >> This programs the L3 configuration based on the sizes given for each >> partition as arguments. The relevant register writes are added to the >> workaround list so that they are re-applied to each context while it's >> initialized, preventing state leaks from other userspace processes >> which may have modified the L3 partitioning from its boot-up state, >> since all relevant registers are part of the software and hardware >> command checker whitelists. >> >> Some userspace clients (DDX and current versions of Mesa not patched >> with my L3 partitioning series [1]) assume that the L3 configuration, >> in particular the URB size, comes up in certain state when a context >> is created, but nothing in the kernel guarantees this assumption, the >> registers that control the partitioning of the L3 cache were being >> left untouched. >> >> Note that the VLV_L3SQCREG1_SQGHPCI_DEFAULT macro defined here has the >> same value as the previously defined VLV_B0_WA_L3SQCREG1_VALUE, but >> the latter will be removed in a future commit. >> >> [1] http://lists.freedesktop.org/archives/mesa-dev/2015-September/093550.html > > Merge this with 1 + 5 (or was it 4?)- not only does it introduce a > temporary unused function, but we have to jump between patches to see > whether the function is safe (especially given those BUGs), and you add > all w/a in the same patch so bisection is not improved. > What do you want me to merge it with? This *is* PATCH 1, and PATCH 5 is largely unrelated. I wouldn't have any objection to squashing PATCH 4 into this commit though. > Looking at the function itself, I am not convinced that it actually adds > anything over calling setting up the WA from the vfuncs - at least the > bdw/gen7 split is redundant (we split at the vfunc then call one > function where we replit again, but with nasty constraints on the > interface of the function for different gen, it's not a function for the > faint of heart). > The constraints are just the hardware's constraints on the L3 partitioning. The function is meant to implement the details of programming the L3 configuration, which is different for different gens even though the overall structure of the L3 partitioning is the same. Of course the constraints set by specific hardware on the partition sizes cannot be abstracted out. I guess that splitting this out into two functions (one for gen7 and another for gen8) wouldn't hurt much, but open-coding the function in all its uses (5) would involve duplicating quite some code. Assuming I split the function into gen7 and gen8 variants, would you like them to implement a common consistent interface (i.e. the same prototype that my init_l3_partitioning_workarounds() implements), or would you like the variant for each gen to implement an ad-hoc interface according to the partition configs actually needed on each gen? I suspect the former would be cleaner. >> +static int init_l3_partitioning_workarounds(struct intel_engine_cs *ring, >> + unsigned int n_urb, >> + unsigned int n_ro, >> + unsigned int n_dc, >> + unsigned int n_all) >> +{ >> + struct drm_device *dev = ring->dev; >> + struct drm_i915_private *dev_priv = dev->dev_private; >> + >> + if (INTEL_INFO(dev)->gen >= 8) { > > Why use dev here? You already have dev_priv, so why chase the pointer > again? Sorry? Does INTEL_INFO() take a drm_device or a drm_i915_private pointer as argument? The two types don't seem to be related by an inheritance relationship or anything similar AFAICT, and most other uses in this file seem to pass a drm_device as argument even where a drm_i915_private is available. > -Chris > > -- > Chris Wilson, Intel Open Source Technology Centre
On Wed, Oct 07, 2015 at 04:30:35PM +0300, Francisco Jerez wrote: > Chris Wilson <chris@chris-wilson.co.uk> writes: > > > On Wed, Oct 07, 2015 at 02:44:00PM +0300, Francisco Jerez wrote: > >> This programs the L3 configuration based on the sizes given for each > >> partition as arguments. The relevant register writes are added to the > >> workaround list so that they are re-applied to each context while it's > >> initialized, preventing state leaks from other userspace processes > >> which may have modified the L3 partitioning from its boot-up state, > >> since all relevant registers are part of the software and hardware > >> command checker whitelists. > >> > >> Some userspace clients (DDX and current versions of Mesa not patched > >> with my L3 partitioning series [1]) assume that the L3 configuration, > >> in particular the URB size, comes up in certain state when a context > >> is created, but nothing in the kernel guarantees this assumption, the > >> registers that control the partitioning of the L3 cache were being > >> left untouched. > >> > >> Note that the VLV_L3SQCREG1_SQGHPCI_DEFAULT macro defined here has the > >> same value as the previously defined VLV_B0_WA_L3SQCREG1_VALUE, but > >> the latter will be removed in a future commit. > >> > >> [1] http://lists.freedesktop.org/archives/mesa-dev/2015-September/093550.html > > > > Merge this with 1 + 5 (or was it 4?)- not only does it introduce a > > temporary unused function, but we have to jump between patches to see > > whether the function is safe (especially given those BUGs), and you add > > all w/a in the same patch so bisection is not improved. > > > What do you want me to merge it with? This *is* PATCH 1, and PATCH 5 is > largely unrelated. I wouldn't have any objection to squashing PATCH 4 > into this commit though. 1+4. That just highlights the issue of having two very tightly coupled patches arbitrary split - the split here doesn't even improve bisection. > > Looking at the function itself, I am not convinced that it actually adds > > anything over calling setting up the WA from the vfuncs - at least the > > bdw/gen7 split is redundant (we split at the vfunc then call one > > function where we replit again, but with nasty constraints on the > > interface of the function for different gen, it's not a function for the > > faint of heart). > > > The constraints are just the hardware's constraints on the L3 > partitioning. The function is meant to implement the details of > programming the L3 configuration, which is different for different gens > even though the overall structure of the L3 partitioning is the same. > Of course the constraints set by specific hardware on the partition > sizes cannot be abstracted out. > > I guess that splitting this out into two functions (one for gen7 and > another for gen8) wouldn't hurt much, but open-coding the function in > all its uses (5) would involve duplicating quite some code. Having slept on it, and I think a gen7/gen8 split is best. My issue with the current combined function is that the interface is different based on generation, which says to me that it is multiple functions. It scores very low on the hard-to-misuse ranking http://ozlabs.org/~rusty/index.cgi/tech/2008-03-30.html And instead of the multiline multiple ternary operators, just have an if-else-chain for hsw/vlv/ivb (for setting the cmd value). > Assuming I split the function into gen7 and gen8 variants, would you > like them to implement a common consistent interface (i.e. the same > prototype that my init_l3_partitioning_workarounds() implements), or > would you like the variant for each gen to implement an ad-hoc interface > according to the partition configs actually needed on each gen? I > suspect the former would be cleaner. No, the BUGs tell me that the registers do not have a consistent interface. > >> +static int init_l3_partitioning_workarounds(struct intel_engine_cs *ring, > >> + unsigned int n_urb, > >> + unsigned int n_ro, > >> + unsigned int n_dc, > >> + unsigned int n_all) > >> +{ > >> + struct drm_device *dev = ring->dev; > >> + struct drm_i915_private *dev_priv = dev->dev_private; > >> + > >> + if (INTEL_INFO(dev)->gen >= 8) { > > > > Why use dev here? You already have dev_priv, so why chase the pointer > > again? > > Sorry? Does INTEL_INFO() take a drm_device or a drm_i915_private > pointer as argument? The two types don't seem to be related by an > inheritance relationship or anything similar AFAICT, and most other uses > in this file seem to pass a drm_device as argument even where a > drm_i915_private is available. drm_i915_private is our primary internal pointer. We can shave over 8KiB of object code by removing the pointer dance we have from still passing around drm_device and jumping to drm_i915_private. New code is supposed to be using drm_i915_private consistently for internal interfaces (we can then clearly see the external entry points). -Chris
diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h index a7c9e8c..663bc8f 100644 --- a/drivers/gpu/drm/i915/i915_reg.h +++ b/drivers/gpu/drm/i915/i915_reg.h @@ -5920,11 +5920,21 @@ enum skl_disp_power_wells { # define CHV_HZ_8X8_MODE_IN_1X (1<<15) # define BDW_HIZ_POWER_COMPILER_CLOCK_GATING_DISABLE (1<<3) +#define GEN8_L3CNTLREG 0x7034 +#define GEN8_L3CNTLREG_URB_ALLOC(n) ((n) << 1) +#define GEN8_L3CNTLREG_RO_ALLOC(n) ((n) << 11) +#define GEN8_L3CNTLREG_DC_ALLOC(n) ((n) << 18) +#define GEN8_L3CNTLREG_ALL_ALLOC(n) ((n) << 25) + #define GEN9_SLICE_COMMON_ECO_CHICKEN0 0x7308 #define DISABLE_PIXEL_MASK_CAMMING (1<<14) #define GEN7_L3SQCREG1 0xB010 #define VLV_B0_WA_L3SQCREG1_VALUE 0x00D30000 +#define IVB_L3SQCREG1_SQGHPCI_DEFAULT 0x00730000 +#define VLV_L3SQCREG1_SQGHPCI_DEFAULT 0x00D30000 +#define HSW_L3SQCREG1_SQGHPCI_DEFAULT 0x00610000 +#define GEN7_L3SQCREG1_CONV_DC_UC (1 << 24) #define GEN8_L3SQCREG1 0xB100 #define BDW_WA_L3SQCREG1_DEFAULT 0x784000 @@ -5933,6 +5943,9 @@ enum skl_disp_power_wells { #define GEN7_WA_FOR_GEN7_L3_CONTROL 0x3C47FF8C #define GEN7_L3AGDIS (1<<19) #define GEN7_L3CNTLREG2 0xB020 +#define GEN7_L3CNTLREG2_URB_ALLOC(n) ((n) << 1) +#define GEN7_L3CNTLREG2_RO_ALLOC(n) ((n) << 14) +#define GEN7_L3CNTLREG2_DC_ALLOC(n) ((n) << 21) #define GEN7_L3CNTLREG3 0xB024 #define GEN7_L3_CHICKEN_MODE_REGISTER 0xB030 diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c index 9035f8c..54ca344 100644 --- a/drivers/gpu/drm/i915/intel_ringbuffer.c +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c @@ -800,6 +800,86 @@ static int wa_add(struct drm_i915_private *dev_priv, #define WA_WRITE(addr, val) WA_REG(addr, 0xffffffff, val) +/** + * init_l3_partitioning_workarounds - Add L3 partitioning set-up to the WA list. + * + * @ring - Ring to program the L3 partitioning for. + * @n_urb - Number of ways to allocate for the URB. + * @n_ro - Number of ways to allocate for read-only L3 clients. + * @n_dc - Number of ways to allocate for the DC read-write L3 client. + * @n_all - Number of ways to allocate for the common pool shared + * among all L3 clients. + * + * Note that for this to work correctly the L3 cache must be + * completely flushed whenever the workaround list is applied to a + * context. + */ +static int init_l3_partitioning_workarounds(struct intel_engine_cs *ring, + unsigned int n_urb, + unsigned int n_ro, + unsigned int n_dc, + unsigned int n_all) +{ + struct drm_device *dev = ring->dev; + struct drm_i915_private *dev_priv = dev->dev_private; + + if (INTEL_INFO(dev)->gen >= 8) { + /* + * The ALL partition may not be used simultaneously + * with RO and DC. + */ + BUG_ON(n_all && (n_ro || n_dc)); + + /* Just need to set up the L3 partitioning. */ + WA_WRITE(GEN8_L3CNTLREG, + GEN8_L3CNTLREG_URB_ALLOC(n_urb) | + GEN8_L3CNTLREG_RO_ALLOC(n_ro) | + GEN8_L3CNTLREG_DC_ALLOC(n_dc) | + GEN8_L3CNTLREG_ALL_ALLOC(n_all)); + + } else if (INTEL_INFO(dev)->gen >= 7) { + /* + * Offset applied by the hardware to the number of + * ways allocated to the URB, which is also the + * minimum legal URB allocation. + */ + const unsigned int n0_urb = (IS_VALLEYVIEW(dev) ? 32 : 0); + BUG_ON(n_urb < n0_urb); + + /* The ALL partition is not supported on Gen7. */ + BUG_ON(n_all); + + /* + * Init the L3SQ General and high priority credit + * initialization value to the hardware defaults + * (except for VLV B0 which supposedly defaults to a + * value different to the one we set here), and demote + * the DC to LLC if it has no ways assigned. + * + * WaIncreaseL3CreditsForVLVB0:vlv + */ + WA_WRITE(GEN7_L3SQCREG1, + (IS_HASWELL(dev) ? HSW_L3SQCREG1_SQGHPCI_DEFAULT : + IS_VALLEYVIEW(dev) ? VLV_L3SQCREG1_SQGHPCI_DEFAULT : + IVB_L3SQCREG1_SQGHPCI_DEFAULT) | + (n_dc ? 0 : GEN7_L3SQCREG1_CONV_DC_UC)); + + /* Set up the L3 partitioning. */ + WA_WRITE(GEN7_L3CNTLREG2, + GEN7_L3CNTLREG2_URB_ALLOC(n_urb - n0_urb) | + GEN7_L3CNTLREG2_RO_ALLOC(n_ro) | + GEN7_L3CNTLREG2_DC_ALLOC(n_dc)); + + WA_WRITE(GEN7_L3CNTLREG3, 0); + + } else { + /* No L3 on pre-Gen7 hardware. */ + BUG(); + } + + return 0; +} + static int gen8_init_workarounds(struct intel_engine_cs *ring) { struct drm_device *dev = ring->dev;
This programs the L3 configuration based on the sizes given for each partition as arguments. The relevant register writes are added to the workaround list so that they are re-applied to each context while it's initialized, preventing state leaks from other userspace processes which may have modified the L3 partitioning from its boot-up state, since all relevant registers are part of the software and hardware command checker whitelists. Some userspace clients (DDX and current versions of Mesa not patched with my L3 partitioning series [1]) assume that the L3 configuration, in particular the URB size, comes up in certain state when a context is created, but nothing in the kernel guarantees this assumption, the registers that control the partitioning of the L3 cache were being left untouched. Note that the VLV_L3SQCREG1_SQGHPCI_DEFAULT macro defined here has the same value as the previously defined VLV_B0_WA_L3SQCREG1_VALUE, but the latter will be removed in a future commit. [1] http://lists.freedesktop.org/archives/mesa-dev/2015-September/093550.html Signed-off-by: Francisco Jerez <currojerez@riseup.net> --- drivers/gpu/drm/i915/i915_reg.h | 13 ++++++ drivers/gpu/drm/i915/intel_ringbuffer.c | 80 +++++++++++++++++++++++++++++++++ 2 files changed, 93 insertions(+)