Message ID | 20190107115509.12523-42-chris@chris-wilson.co.uk (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [01/46] drm/i915: Return immediately if trylock fails for direct-reclaim | expand |
On 1/7/2019 03:55, Chris Wilson wrote: > Previously we only accommodated having a vma pinned by a small number of > users, with the maximum being pinned for use by the display engine. As > such, we used a small bitfield only large enough to allow the vma to > be pinned twice (for back/front buffers) in each scanout plane. Keeping > the maximum permissible pin_count small allows us to quickly catch a > potential leak. However, as we want to split a 4096B page into 64 > different cachelines and pin each cacheline for use by a different > timeline, we will exceed the current maximum permissible vma->pin_count > and so time has come to enlarge it. > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> > --- > drivers/gpu/drm/i915/i915_gem_gtt.h | 26 +++++++++++++------------- > drivers/gpu/drm/i915/i915_vma.h | 28 +++++++++------------------- > 2 files changed, 22 insertions(+), 32 deletions(-) > > diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h > index bd679c8c56dd..03ade71b8d9a 100644 > --- a/drivers/gpu/drm/i915/i915_gem_gtt.h > +++ b/drivers/gpu/drm/i915/i915_gem_gtt.h > @@ -642,19 +642,19 @@ int i915_gem_gtt_insert(struct i915_address_space *vm, > > /* Flags used by pin/bind&friends. */ > #define PIN_NONBLOCK BIT_ULL(0) > -#define PIN_MAPPABLE BIT_ULL(1) > -#define PIN_ZONE_4G BIT_ULL(2) > -#define PIN_NONFAULT BIT_ULL(3) > -#define PIN_NOEVICT BIT_ULL(4) > - > -#define PIN_MBZ BIT_ULL(5) /* I915_VMA_PIN_OVERFLOW */ > -#define PIN_GLOBAL BIT_ULL(6) /* I915_VMA_GLOBAL_BIND */ > -#define PIN_USER BIT_ULL(7) /* I915_VMA_LOCAL_BIND */ > -#define PIN_UPDATE BIT_ULL(8) > - > -#define PIN_HIGH BIT_ULL(9) > -#define PIN_OFFSET_BIAS BIT_ULL(10) > -#define PIN_OFFSET_FIXED BIT_ULL(11) > +#define PIN_NONFAULT BIT_ULL(1) > +#define PIN_NOEVICT BIT_ULL(2) > +#define PIN_MAPPABLE BIT_ULL(3) > +#define PIN_ZONE_4G BIT_ULL(4) > +#define PIN_HIGH BIT_ULL(5) > +#define PIN_OFFSET_BIAS BIT_ULL(6) > +#define PIN_OFFSET_FIXED BIT_ULL(7) > + > +#define PIN_MBZ BIT_ULL(8) /* I915_VMA_PIN_OVERFLOW */ > +#define PIN_GLOBAL BIT_ULL(9) /* I915_VMA_GLOBAL_BIND */ > +#define PIN_USER BIT_ULL(10) /* I915_VMA_LOCAL_BIND */ > +#define PIN_UPDATE BIT_ULL(11) > + The upper bits need moving to accommodate the larger count. And the HIGH/OFFSET_* fields are not shared with vma-flags so can be moved down with the other pin only flags. But I don't see a reason to shuffle the lower bits around? MAPPABLE to NOEVICT were 1,2,3,4 but are now 3,4,1,2. Is there some semantic meaning to the new order? > #define PIN_OFFSET_MASK (-I915_GTT_PAGE_SIZE) > > #endif > diff --git a/drivers/gpu/drm/i915/i915_vma.h b/drivers/gpu/drm/i915/i915_vma.h > index 7252abc73d3e..266b226ebef2 100644 > --- a/drivers/gpu/drm/i915/i915_vma.h > +++ b/drivers/gpu/drm/i915/i915_vma.h > @@ -70,30 +70,20 @@ struct i915_vma { > */ > unsigned int open_count; > unsigned long flags; > - /** > - * How many users have pinned this object in GTT space. The following > - * users can each hold at most one reference: pwrite/pread, execbuffer > - * (objects are not allowed multiple times for the same batchbuffer), > - * and the framebuffer code. When switching/pageflipping, the > - * framebuffer code has at most two buffers pinned per crtc. > - * > - * In the worst case this is 1 + 1 + 1 + 2*2 = 7. That would fit into 3 > - * bits with absolutely no headroom. So use 4 bits. > - */ Is it not worth keeping some comment about the maximum pin count being bounded so 8-bits is guaranteed to be sufficient? Also, is the old comment actually valid? Surely modern hardware has more than two CRTCs so the limit of 7 was wrong anyway? Maybe even have a compile time assert that the mask size is greater than max(1 + 1 + 1 + 1 + 2*MAX_CRTC, PAGE_SIZE/CACHELINE_SIZE)? > -#define I915_VMA_PIN_MASK 0xf > -#define I915_VMA_PIN_OVERFLOW BIT(5) > +#define I915_VMA_PIN_MASK 0xff > +#define I915_VMA_PIN_OVERFLOW BIT(8) > > /** Flags and address space this VMA is bound to */ > -#define I915_VMA_GLOBAL_BIND BIT(6) > -#define I915_VMA_LOCAL_BIND BIT(7) > +#define I915_VMA_GLOBAL_BIND BIT(9) > +#define I915_VMA_LOCAL_BIND BIT(10) > #define I915_VMA_BIND_MASK (I915_VMA_GLOBAL_BIND | I915_VMA_LOCAL_BIND | I915_VMA_PIN_OVERFLOW) > > -#define I915_VMA_GGTT BIT(8) > -#define I915_VMA_CAN_FENCE BIT(9) > -#define I915_VMA_CLOSED BIT(10) > -#define I915_VMA_USERFAULT_BIT 11 > +#define I915_VMA_GGTT BIT(11) > +#define I915_VMA_CAN_FENCE BIT(12) > +#define I915_VMA_CLOSED BIT(13) > +#define I915_VMA_USERFAULT_BIT 14 > #define I915_VMA_USERFAULT BIT(I915_VMA_USERFAULT_BIT) > -#define I915_VMA_GGTT_WRITE BIT(12) > +#define I915_VMA_GGTT_WRITE BIT(15) > > unsigned int active_count; > struct rb_root active;
Quoting John Harrison (2019-01-15 19:57:19) > On 1/7/2019 03:55, Chris Wilson wrote: > > Previously we only accommodated having a vma pinned by a small number of > > users, with the maximum being pinned for use by the display engine. As > > such, we used a small bitfield only large enough to allow the vma to > > be pinned twice (for back/front buffers) in each scanout plane. Keeping > > the maximum permissible pin_count small allows us to quickly catch a > > potential leak. However, as we want to split a 4096B page into 64 > > different cachelines and pin each cacheline for use by a different > > timeline, we will exceed the current maximum permissible vma->pin_count > > and so time has come to enlarge it. > > > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> > > --- > > drivers/gpu/drm/i915/i915_gem_gtt.h | 26 +++++++++++++------------- > > drivers/gpu/drm/i915/i915_vma.h | 28 +++++++++------------------- > > 2 files changed, 22 insertions(+), 32 deletions(-) > > > > diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h > > index bd679c8c56dd..03ade71b8d9a 100644 > > --- a/drivers/gpu/drm/i915/i915_gem_gtt.h > > +++ b/drivers/gpu/drm/i915/i915_gem_gtt.h > > @@ -642,19 +642,19 @@ int i915_gem_gtt_insert(struct i915_address_space *vm, > > > > /* Flags used by pin/bind&friends. */ > > #define PIN_NONBLOCK BIT_ULL(0) > > -#define PIN_MAPPABLE BIT_ULL(1) > > -#define PIN_ZONE_4G BIT_ULL(2) > > -#define PIN_NONFAULT BIT_ULL(3) > > -#define PIN_NOEVICT BIT_ULL(4) > > - > > -#define PIN_MBZ BIT_ULL(5) /* I915_VMA_PIN_OVERFLOW */ > > -#define PIN_GLOBAL BIT_ULL(6) /* I915_VMA_GLOBAL_BIND */ > > -#define PIN_USER BIT_ULL(7) /* I915_VMA_LOCAL_BIND */ > > -#define PIN_UPDATE BIT_ULL(8) > > - > > -#define PIN_HIGH BIT_ULL(9) > > -#define PIN_OFFSET_BIAS BIT_ULL(10) > > -#define PIN_OFFSET_FIXED BIT_ULL(11) > > +#define PIN_NONFAULT BIT_ULL(1) > > +#define PIN_NOEVICT BIT_ULL(2) > > +#define PIN_MAPPABLE BIT_ULL(3) > > +#define PIN_ZONE_4G BIT_ULL(4) > > +#define PIN_HIGH BIT_ULL(5) > > +#define PIN_OFFSET_BIAS BIT_ULL(6) > > +#define PIN_OFFSET_FIXED BIT_ULL(7) > > + > > +#define PIN_MBZ BIT_ULL(8) /* I915_VMA_PIN_OVERFLOW */ > > +#define PIN_GLOBAL BIT_ULL(9) /* I915_VMA_GLOBAL_BIND */ > > +#define PIN_USER BIT_ULL(10) /* I915_VMA_LOCAL_BIND */ > > +#define PIN_UPDATE BIT_ULL(11) > > + > The upper bits need moving to accommodate the larger count. And the > HIGH/OFFSET_* fields are not shared with vma-flags so can be moved down > with the other pin only flags. But I don't see a reason to shuffle the > lower bits around? MAPPABLE to NOEVICT were 1,2,3,4 but are now 3,4,1,2. > Is there some semantic meaning to the new order? Just that: - bias, mappable, zone_4g: address limit specifiers + high: not strictly an address limit, but an address direction to search + fixed: address override, limits still apply though - nonblock, nonfault, noevict: search specifiers I just hadn't had an excuse to reorder them for a while. > > #define PIN_OFFSET_MASK (-I915_GTT_PAGE_SIZE) > > > > #endif > > diff --git a/drivers/gpu/drm/i915/i915_vma.h b/drivers/gpu/drm/i915/i915_vma.h > > index 7252abc73d3e..266b226ebef2 100644 > > --- a/drivers/gpu/drm/i915/i915_vma.h > > +++ b/drivers/gpu/drm/i915/i915_vma.h > > @@ -70,30 +70,20 @@ struct i915_vma { > > */ > > unsigned int open_count; > > unsigned long flags; > > - /** > > - * How many users have pinned this object in GTT space. The following > > - * users can each hold at most one reference: pwrite/pread, execbuffer > > - * (objects are not allowed multiple times for the same batchbuffer), > > - * and the framebuffer code. When switching/pageflipping, the > > - * framebuffer code has at most two buffers pinned per crtc. > > - * > > - * In the worst case this is 1 + 1 + 1 + 2*2 = 7. That would fit into 3 > > - * bits with absolutely no headroom. So use 4 bits. > > - */ > Is it not worth keeping some comment about the maximum pin count being > bounded so 8-bits is guaranteed to be sufficient? Also, is the old > comment actually valid? Surely modern hardware has more than two CRTCs > so the limit of 7 was wrong anyway? Maybe even have a compile time > assert that the mask size is greater than max(1 + 1 + 1 + 1 + > 2*MAX_CRTC, PAGE_SIZE/CACHELINE_SIZE)? Is a comment accurate? rotfl Also I think we are up to 3*NUM_PLANES*NUM_CRTCS, but can't be quite sure with the atomic state tracking, so it might just be still 2 (but just wait until we have an actual flip queue). I was also wondering if we used 7b + negative byte for the overflow would generate better code. Still a comment towards that this should be bounded to a small number hence the "validity" in checking for overflow as part of the flags might be in order. -Chris
On 1/15/2019 12:17, Chris Wilson wrote: > Quoting John Harrison (2019-01-15 19:57:19) >> On 1/7/2019 03:55, Chris Wilson wrote: >>> Previously we only accommodated having a vma pinned by a small number of >>> users, with the maximum being pinned for use by the display engine. As >>> such, we used a small bitfield only large enough to allow the vma to >>> be pinned twice (for back/front buffers) in each scanout plane. Keeping >>> the maximum permissible pin_count small allows us to quickly catch a >>> potential leak. However, as we want to split a 4096B page into 64 >>> different cachelines and pin each cacheline for use by a different >>> timeline, we will exceed the current maximum permissible vma->pin_count >>> and so time has come to enlarge it. >>> >>> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> >>> --- >>> drivers/gpu/drm/i915/i915_gem_gtt.h | 26 +++++++++++++------------- >>> drivers/gpu/drm/i915/i915_vma.h | 28 +++++++++------------------- >>> 2 files changed, 22 insertions(+), 32 deletions(-) >>> >>> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h >>> index bd679c8c56dd..03ade71b8d9a 100644 >>> --- a/drivers/gpu/drm/i915/i915_gem_gtt.h >>> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.h >>> @@ -642,19 +642,19 @@ int i915_gem_gtt_insert(struct i915_address_space *vm, >>> >>> /* Flags used by pin/bind&friends. */ >>> #define PIN_NONBLOCK BIT_ULL(0) >>> -#define PIN_MAPPABLE BIT_ULL(1) >>> -#define PIN_ZONE_4G BIT_ULL(2) >>> -#define PIN_NONFAULT BIT_ULL(3) >>> -#define PIN_NOEVICT BIT_ULL(4) >>> - >>> -#define PIN_MBZ BIT_ULL(5) /* I915_VMA_PIN_OVERFLOW */ >>> -#define PIN_GLOBAL BIT_ULL(6) /* I915_VMA_GLOBAL_BIND */ >>> -#define PIN_USER BIT_ULL(7) /* I915_VMA_LOCAL_BIND */ >>> -#define PIN_UPDATE BIT_ULL(8) >>> - >>> -#define PIN_HIGH BIT_ULL(9) >>> -#define PIN_OFFSET_BIAS BIT_ULL(10) >>> -#define PIN_OFFSET_FIXED BIT_ULL(11) >>> +#define PIN_NONFAULT BIT_ULL(1) >>> +#define PIN_NOEVICT BIT_ULL(2) >>> +#define PIN_MAPPABLE BIT_ULL(3) >>> +#define PIN_ZONE_4G BIT_ULL(4) >>> +#define PIN_HIGH BIT_ULL(5) >>> +#define PIN_OFFSET_BIAS BIT_ULL(6) >>> +#define PIN_OFFSET_FIXED BIT_ULL(7) >>> + >>> +#define PIN_MBZ BIT_ULL(8) /* I915_VMA_PIN_OVERFLOW */ >>> +#define PIN_GLOBAL BIT_ULL(9) /* I915_VMA_GLOBAL_BIND */ >>> +#define PIN_USER BIT_ULL(10) /* I915_VMA_LOCAL_BIND */ >>> +#define PIN_UPDATE BIT_ULL(11) >>> + >> The upper bits need moving to accommodate the larger count. And the >> HIGH/OFFSET_* fields are not shared with vma-flags so can be moved down >> with the other pin only flags. But I don't see a reason to shuffle the >> lower bits around? MAPPABLE to NOEVICT were 1,2,3,4 but are now 3,4,1,2. >> Is there some semantic meaning to the new order? > Just that: > - bias, mappable, zone_4g: address limit specifiers > + high: not strictly an address limit, but an address direction to search > + fixed: address override, limits still apply though > - nonblock, nonfault, noevict: search specifiers > > I just hadn't had an excuse to reorder them for a while. Fair enough. I just wanted to check it was deliberate and not some accidental remnant of some other change that was dropped along the way. > >>> #define PIN_OFFSET_MASK (-I915_GTT_PAGE_SIZE) >>> >>> #endif >>> diff --git a/drivers/gpu/drm/i915/i915_vma.h b/drivers/gpu/drm/i915/i915_vma.h >>> index 7252abc73d3e..266b226ebef2 100644 >>> --- a/drivers/gpu/drm/i915/i915_vma.h >>> +++ b/drivers/gpu/drm/i915/i915_vma.h >>> @@ -70,30 +70,20 @@ struct i915_vma { >>> */ >>> unsigned int open_count; >>> unsigned long flags; >>> - /** >>> - * How many users have pinned this object in GTT space. The following >>> - * users can each hold at most one reference: pwrite/pread, execbuffer >>> - * (objects are not allowed multiple times for the same batchbuffer), >>> - * and the framebuffer code. When switching/pageflipping, the >>> - * framebuffer code has at most two buffers pinned per crtc. >>> - * >>> - * In the worst case this is 1 + 1 + 1 + 2*2 = 7. That would fit into 3 >>> - * bits with absolutely no headroom. So use 4 bits. >>> - */ >> Is it not worth keeping some comment about the maximum pin count being >> bounded so 8-bits is guaranteed to be sufficient? Also, is the old >> comment actually valid? Surely modern hardware has more than two CRTCs >> so the limit of 7 was wrong anyway? Maybe even have a compile time >> assert that the mask size is greater than max(1 + 1 + 1 + 1 + >> 2*MAX_CRTC, PAGE_SIZE/CACHELINE_SIZE)? > Is a comment accurate? rotfl One can but dream... > Also I think we are up to 3*NUM_PLANES*NUM_CRTCS, but can't be quite sure > with the atomic state tracking, so it might just be still 2 (but just > wait until we have an actual flip queue). That sounds like we should already be overflowing the current limit of 16?! > > I was also wondering if we used 7b + negative byte for the overflow > would generate better code. Meaning limit the count to 127 and use a signed char cast? Thus test for -ve rather than BIT(x)? Maybe. Although the above comment makes me nervous that even 127 might not be sufficient for very much longer. New hardware always brings more planes and heads! > > Still a comment towards that this should be bounded to a small number > hence the "validity" in checking for overflow as part of the flags might > be in order. > -Chris I think some kind of comment along those lines would be worth having. With that comment added: Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h index bd679c8c56dd..03ade71b8d9a 100644 --- a/drivers/gpu/drm/i915/i915_gem_gtt.h +++ b/drivers/gpu/drm/i915/i915_gem_gtt.h @@ -642,19 +642,19 @@ int i915_gem_gtt_insert(struct i915_address_space *vm, /* Flags used by pin/bind&friends. */ #define PIN_NONBLOCK BIT_ULL(0) -#define PIN_MAPPABLE BIT_ULL(1) -#define PIN_ZONE_4G BIT_ULL(2) -#define PIN_NONFAULT BIT_ULL(3) -#define PIN_NOEVICT BIT_ULL(4) - -#define PIN_MBZ BIT_ULL(5) /* I915_VMA_PIN_OVERFLOW */ -#define PIN_GLOBAL BIT_ULL(6) /* I915_VMA_GLOBAL_BIND */ -#define PIN_USER BIT_ULL(7) /* I915_VMA_LOCAL_BIND */ -#define PIN_UPDATE BIT_ULL(8) - -#define PIN_HIGH BIT_ULL(9) -#define PIN_OFFSET_BIAS BIT_ULL(10) -#define PIN_OFFSET_FIXED BIT_ULL(11) +#define PIN_NONFAULT BIT_ULL(1) +#define PIN_NOEVICT BIT_ULL(2) +#define PIN_MAPPABLE BIT_ULL(3) +#define PIN_ZONE_4G BIT_ULL(4) +#define PIN_HIGH BIT_ULL(5) +#define PIN_OFFSET_BIAS BIT_ULL(6) +#define PIN_OFFSET_FIXED BIT_ULL(7) + +#define PIN_MBZ BIT_ULL(8) /* I915_VMA_PIN_OVERFLOW */ +#define PIN_GLOBAL BIT_ULL(9) /* I915_VMA_GLOBAL_BIND */ +#define PIN_USER BIT_ULL(10) /* I915_VMA_LOCAL_BIND */ +#define PIN_UPDATE BIT_ULL(11) + #define PIN_OFFSET_MASK (-I915_GTT_PAGE_SIZE) #endif diff --git a/drivers/gpu/drm/i915/i915_vma.h b/drivers/gpu/drm/i915/i915_vma.h index 7252abc73d3e..266b226ebef2 100644 --- a/drivers/gpu/drm/i915/i915_vma.h +++ b/drivers/gpu/drm/i915/i915_vma.h @@ -70,30 +70,20 @@ struct i915_vma { */ unsigned int open_count; unsigned long flags; - /** - * How many users have pinned this object in GTT space. The following - * users can each hold at most one reference: pwrite/pread, execbuffer - * (objects are not allowed multiple times for the same batchbuffer), - * and the framebuffer code. When switching/pageflipping, the - * framebuffer code has at most two buffers pinned per crtc. - * - * In the worst case this is 1 + 1 + 1 + 2*2 = 7. That would fit into 3 - * bits with absolutely no headroom. So use 4 bits. - */ -#define I915_VMA_PIN_MASK 0xf -#define I915_VMA_PIN_OVERFLOW BIT(5) +#define I915_VMA_PIN_MASK 0xff +#define I915_VMA_PIN_OVERFLOW BIT(8) /** Flags and address space this VMA is bound to */ -#define I915_VMA_GLOBAL_BIND BIT(6) -#define I915_VMA_LOCAL_BIND BIT(7) +#define I915_VMA_GLOBAL_BIND BIT(9) +#define I915_VMA_LOCAL_BIND BIT(10) #define I915_VMA_BIND_MASK (I915_VMA_GLOBAL_BIND | I915_VMA_LOCAL_BIND | I915_VMA_PIN_OVERFLOW) -#define I915_VMA_GGTT BIT(8) -#define I915_VMA_CAN_FENCE BIT(9) -#define I915_VMA_CLOSED BIT(10) -#define I915_VMA_USERFAULT_BIT 11 +#define I915_VMA_GGTT BIT(11) +#define I915_VMA_CAN_FENCE BIT(12) +#define I915_VMA_CLOSED BIT(13) +#define I915_VMA_USERFAULT_BIT 14 #define I915_VMA_USERFAULT BIT(I915_VMA_USERFAULT_BIT) -#define I915_VMA_GGTT_WRITE BIT(12) +#define I915_VMA_GGTT_WRITE BIT(15) unsigned int active_count; struct rb_root active;
Previously we only accommodated having a vma pinned by a small number of users, with the maximum being pinned for use by the display engine. As such, we used a small bitfield only large enough to allow the vma to be pinned twice (for back/front buffers) in each scanout plane. Keeping the maximum permissible pin_count small allows us to quickly catch a potential leak. However, as we want to split a 4096B page into 64 different cachelines and pin each cacheline for use by a different timeline, we will exceed the current maximum permissible vma->pin_count and so time has come to enlarge it. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> --- drivers/gpu/drm/i915/i915_gem_gtt.h | 26 +++++++++++++------------- drivers/gpu/drm/i915/i915_vma.h | 28 +++++++++------------------- 2 files changed, 22 insertions(+), 32 deletions(-)