Message ID | 20210506191451.77768-40-matthew.brost@intel.com (mailing list archive)
---|---
State | New, archived
Series | Basic GuC submission support in the i915
On 06.05.2021 21:13, Matthew Brost wrote:
> With the introduction of non-blocking CTBs more than one CTB can be in
> flight at a time. Increasing the size of the CTBs should reduce how
> often software hits the case where no space is available in the CTB
> buffer.
>
> Cc: John Harrison <john.c.harrison@intel.com>
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> ---
>  drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 11 ++++++++---
>  1 file changed, 8 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> index 77dfbc94dcc3..d6895d29ed2d 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> @@ -63,11 +63,16 @@ static inline struct drm_device *ct_to_drm(struct intel_guc_ct *ct)
>   * +--------+-----------------------------------------------+------+
>   *
>   * Size of each `CT Buffer`_ must be multiple of 4K.
> - * As we don't expect too many messages, for now use minimum sizes.
> + * We don't expect too many messages in flight at any time, unless we are
> + * using the GuC submission. In that case each request requires a minimum
> + * 16 bytes which gives us a maximum 256 queue'd requests. Hopefully this

nit: all our CTB calculations are in dwords now, not bytes

> + * enough space to avoid backpressure on the driver. We increase the size
> + * of the receive buffer (relative to the send) to ensure a G2H response
> + * CTB has a landing spot.

hmm, but we are not checking G2H CTB yet
will start doing it around patch 54/97
so maybe this other patch should be introduced earlier?

> + */
>  #define CTB_DESC_SIZE		ALIGN(sizeof(struct guc_ct_buffer_desc), SZ_2K)
>  #define CTB_H2G_BUFFER_SIZE	(SZ_4K)
> -#define CTB_G2H_BUFFER_SIZE	(SZ_4K)
> +#define CTB_G2H_BUFFER_SIZE	(4 * CTB_H2G_BUFFER_SIZE)

in theory, we (host) should be faster than GuC, so G2H CTB shall be
almost always empty, if this is not a case, maybe we should start
monitoring what is happening and report some warnings if G2H is half full?

>
>  #define MAX_US_STALL_CTB	1000000
>
> @@ -753,7 +758,7 @@ static int ct_read(struct intel_guc_ct *ct, struct ct_incoming_msg **msg)
>  	/* beware of buffer wrap case */
>  	if (unlikely(available < 0))
>  		available += size;
> -	CT_DEBUG(ct, "available %d (%u:%u)\n", available, head, tail);
> +	CT_DEBUG(ct, "available %d (%u:%u:%u)\n", available, head, tail, size);
>  	GEM_BUG_ON(available < 0);
>
>  	header = cmds[head];
>
On Mon, May 24, 2021 at 03:43:11PM +0200, Michal Wajdeczko wrote:
>
> On 06.05.2021 21:13, Matthew Brost wrote:
> > With the introduction of non-blocking CTBs more than one CTB can be in
> > flight at a time. Increasing the size of the CTBs should reduce how
> > often software hits the case where no space is available in the CTB
> > buffer.
> >
> > Cc: John Harrison <john.c.harrison@intel.com>
> > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> > ---
> >  drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 11 ++++++++---
> >  1 file changed, 8 insertions(+), 3 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> > index 77dfbc94dcc3..d6895d29ed2d 100644
> > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> > @@ -63,11 +63,16 @@ static inline struct drm_device *ct_to_drm(struct intel_guc_ct *ct)
> >   * +--------+-----------------------------------------------+------+
> >   *
> >   * Size of each `CT Buffer`_ must be multiple of 4K.
> > - * As we don't expect too many messages, for now use minimum sizes.
> > + * We don't expect too many messages in flight at any time, unless we are
> > + * using the GuC submission. In that case each request requires a minimum
> > + * 16 bytes which gives us a maximum 256 queue'd requests. Hopefully this
>
> nit: all our CTB calculations are in dwords now, not bytes
>

I can change the wording to DW sizes.

> > + * enough space to avoid backpressure on the driver. We increase the size
> > + * of the receive buffer (relative to the send) to ensure a G2H response
> > + * CTB has a landing spot.
>
> hmm, but we are not checking G2H CTB yet
> will start doing it around patch 54/97
> so maybe this other patch should be introduced earlier ?
>

Yes, that patch is going to be pulled down to an earlier spot in the
series.

> > + */
> >  #define CTB_DESC_SIZE		ALIGN(sizeof(struct guc_ct_buffer_desc), SZ_2K)
> >  #define CTB_H2G_BUFFER_SIZE	(SZ_4K)
> > -#define CTB_G2H_BUFFER_SIZE	(SZ_4K)
> > +#define CTB_G2H_BUFFER_SIZE	(4 * CTB_H2G_BUFFER_SIZE)
>
> in theory, we (host) should be faster than GuC, so G2H CTB shall be
> almost always empty, if this is not a case, maybe we should start
> monitoring what is happening and report some warnings if G2H is half full ?
>

Certainly some IGTs put more pressure on the G2H channel than the H2G
channel, at least I think. This is something we can tune over time after
this lands upstream. IMO a warning message at this point is overkill.

Matt

> >
> >  #define MAX_US_STALL_CTB	1000000
> >
> > @@ -753,7 +758,7 @@ static int ct_read(struct intel_guc_ct *ct, struct ct_incoming_msg **msg)
> >  	/* beware of buffer wrap case */
> >  	if (unlikely(available < 0))
> >  		available += size;
> > -	CT_DEBUG(ct, "available %d (%u:%u)\n", available, head, tail);
> > +	CT_DEBUG(ct, "available %d (%u:%u:%u)\n", available, head, tail, size);
> >  	GEM_BUG_ON(available < 0);
> >
> >  	header = cmds[head];
> >
On 06/05/2021 20:13, Matthew Brost wrote:
> With the introduction of non-blocking CTBs more than one CTB can be in
> flight at a time. Increasing the size of the CTBs should reduce how
> often software hits the case where no space is available in the CTB
> buffer.

I'd move this before the patch which adds the non-blocking send since
that one claims congestion should be rare with properly sized buffers.
So it makes sense to have them sized properly back before that one.

Regards,

Tvrtko

> Cc: John Harrison <john.c.harrison@intel.com>
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> ---
>  drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 11 ++++++++---
>  1 file changed, 8 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> index 77dfbc94dcc3..d6895d29ed2d 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> @@ -63,11 +63,16 @@ static inline struct drm_device *ct_to_drm(struct intel_guc_ct *ct)
>   * +--------+-----------------------------------------------+------+
>   *
>   * Size of each `CT Buffer`_ must be multiple of 4K.
> - * As we don't expect too many messages, for now use minimum sizes.
> + * We don't expect too many messages in flight at any time, unless we are
> + * using the GuC submission. In that case each request requires a minimum
> + * 16 bytes which gives us a maximum 256 queue'd requests. Hopefully this
> + * enough space to avoid backpressure on the driver. We increase the size
> + * of the receive buffer (relative to the send) to ensure a G2H response
> + * CTB has a landing spot.
> + */
>  #define CTB_DESC_SIZE		ALIGN(sizeof(struct guc_ct_buffer_desc), SZ_2K)
>  #define CTB_H2G_BUFFER_SIZE	(SZ_4K)
> -#define CTB_G2H_BUFFER_SIZE	(SZ_4K)
> +#define CTB_G2H_BUFFER_SIZE	(4 * CTB_H2G_BUFFER_SIZE)
>
>  #define MAX_US_STALL_CTB	1000000
>
> @@ -753,7 +758,7 @@ static int ct_read(struct intel_guc_ct *ct, struct ct_incoming_msg **msg)
>  	/* beware of buffer wrap case */
>  	if (unlikely(available < 0))
>  		available += size;
> -	CT_DEBUG(ct, "available %d (%u:%u)\n", available, head, tail);
> +	CT_DEBUG(ct, "available %d (%u:%u:%u)\n", available, head, tail, size);
>  	GEM_BUG_ON(available < 0);
>
>  	header = cmds[head];
>
On Tue, May 25, 2021 at 10:24:09AM +0100, Tvrtko Ursulin wrote:
>
> On 06/05/2021 20:13, Matthew Brost wrote:
> > With the introduction of non-blocking CTBs more than one CTB can be in
> > flight at a time. Increasing the size of the CTBs should reduce how
> > often software hits the case where no space is available in the CTB
> > buffer.
>
> I'd move this before the patch which adds the non-blocking send since that
> one claims congestion should be rare with properly sized buffers. So it
> makes sense to have them sized properly back before that one.
>

IMO patch ordering is a bit of a bikeshed. All these CTB changes
required for GuC submission (34-40, 54) will get posted as their own
series and merged together. None of the individual patches breaks
anything, nor is any of this code really used until GuC submission is
turned on. I can move this when I post these patches by themselves, but
I just don't really see the point either way.

Matt

> Regards,
>
> Tvrtko
>
> > Cc: John Harrison <john.c.harrison@intel.com>
> > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> > ---
> >  drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 11 ++++++++---
> >  1 file changed, 8 insertions(+), 3 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> > index 77dfbc94dcc3..d6895d29ed2d 100644
> > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> > @@ -63,11 +63,16 @@ static inline struct drm_device *ct_to_drm(struct intel_guc_ct *ct)
> >   * +--------+-----------------------------------------------+------+
> >   *
> >   * Size of each `CT Buffer`_ must be multiple of 4K.
> > - * As we don't expect too many messages, for now use minimum sizes.
> > + * We don't expect too many messages in flight at any time, unless we are
> > + * using the GuC submission. In that case each request requires a minimum
> > + * 16 bytes which gives us a maximum 256 queue'd requests. Hopefully this
> > + * enough space to avoid backpressure on the driver. We increase the size
> > + * of the receive buffer (relative to the send) to ensure a G2H response
> > + * CTB has a landing spot.
> > + */
> >  #define CTB_DESC_SIZE		ALIGN(sizeof(struct guc_ct_buffer_desc), SZ_2K)
> >  #define CTB_H2G_BUFFER_SIZE	(SZ_4K)
> > -#define CTB_G2H_BUFFER_SIZE	(SZ_4K)
> > +#define CTB_G2H_BUFFER_SIZE	(4 * CTB_H2G_BUFFER_SIZE)
> >
> >  #define MAX_US_STALL_CTB	1000000
> >
> > @@ -753,7 +758,7 @@ static int ct_read(struct intel_guc_ct *ct, struct ct_incoming_msg **msg)
> >  	/* beware of buffer wrap case */
> >  	if (unlikely(available < 0))
> >  		available += size;
> > -	CT_DEBUG(ct, "available %d (%u:%u)\n", available, head, tail);
> > +	CT_DEBUG(ct, "available %d (%u:%u:%u)\n", available, head, tail, size);
> >  	GEM_BUG_ON(available < 0);
> >
> >  	header = cmds[head];
> >
On 25/05/2021 18:15, Matthew Brost wrote:
> On Tue, May 25, 2021 at 10:24:09AM +0100, Tvrtko Ursulin wrote:
>>
>> On 06/05/2021 20:13, Matthew Brost wrote:
>>> With the introduction of non-blocking CTBs more than one CTB can be in
>>> flight at a time. Increasing the size of the CTBs should reduce how
>>> often software hits the case where no space is available in the CTB
>>> buffer.
>>
>> I'd move this before the patch which adds the non-blocking send since that
>> one claims congestion should be rare with properly sized buffers. So it
>> makes sense to have them sized properly back before that one.
>>
>
> IMO patch ordering is a bit of bikeshed. All these CTBs changes required
> for GuC submission (34-40, 54) will get posted its own series and get
> merged together. None of the individual patches break anything or is any
> of this code really used until GuC submission is turned on. I can move
> this when I post these patches by themselves but I just don't really see
> the point either way.

As a general principle we do try to have work in the order which makes
sense functionality wise.

That includes trying to avoid adding and then removing, or changing a
lot, the same code within the series. And also adding functionality
which is known to not work completely well until later in the series.

With a master switch at the end of the series you can sometimes get away
with it, but if nothing else it at least makes it much easier to read if
things are flowing in the expected way within the series.

In this particular example, sizing the buffers appropriately before
starting to use the facility a lot more certainly sounds like a
no-brainer to me, especially since the patch is so trivial to move
conflict wise.

Regards,

Tvrtko

> Matt
>
>> Regards,
>>
>> Tvrtko
>>
>>> Cc: John Harrison <john.c.harrison@intel.com>
>>> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
>>> ---
>>>  drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 11 ++++++++---
>>>  1 file changed, 8 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
>>> index 77dfbc94dcc3..d6895d29ed2d 100644
>>> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
>>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
>>> @@ -63,11 +63,16 @@ static inline struct drm_device *ct_to_drm(struct intel_guc_ct *ct)
>>>   * +--------+-----------------------------------------------+------+
>>>   *
>>>   * Size of each `CT Buffer`_ must be multiple of 4K.
>>> - * As we don't expect too many messages, for now use minimum sizes.
>>> + * We don't expect too many messages in flight at any time, unless we are
>>> + * using the GuC submission. In that case each request requires a minimum
>>> + * 16 bytes which gives us a maximum 256 queue'd requests. Hopefully this
>>> + * enough space to avoid backpressure on the driver. We increase the size
>>> + * of the receive buffer (relative to the send) to ensure a G2H response
>>> + * CTB has a landing spot.
>>> + */
>>>  #define CTB_DESC_SIZE		ALIGN(sizeof(struct guc_ct_buffer_desc), SZ_2K)
>>>  #define CTB_H2G_BUFFER_SIZE	(SZ_4K)
>>> -#define CTB_G2H_BUFFER_SIZE	(SZ_4K)
>>> +#define CTB_G2H_BUFFER_SIZE	(4 * CTB_H2G_BUFFER_SIZE)
>>>
>>>  #define MAX_US_STALL_CTB	1000000
>>>
>>> @@ -753,7 +758,7 @@ static int ct_read(struct intel_guc_ct *ct, struct ct_incoming_msg **msg)
>>>  	/* beware of buffer wrap case */
>>>  	if (unlikely(available < 0))
>>>  		available += size;
>>> -	CT_DEBUG(ct, "available %d (%u:%u)\n", available, head, tail);
>>> +	CT_DEBUG(ct, "available %d (%u:%u:%u)\n", available, head, tail, size);
>>>  	GEM_BUG_ON(available < 0);
>>>
>>>  	header = cmds[head];
>>>
On Wed, May 26, 2021 at 10:30:27AM +0100, Tvrtko Ursulin wrote:
>
> On 25/05/2021 18:15, Matthew Brost wrote:
> > On Tue, May 25, 2021 at 10:24:09AM +0100, Tvrtko Ursulin wrote:
> > >
> > > On 06/05/2021 20:13, Matthew Brost wrote:
> > > > With the introduction of non-blocking CTBs more than one CTB can be in
> > > > flight at a time. Increasing the size of the CTBs should reduce how
> > > > often software hits the case where no space is available in the CTB
> > > > buffer.
> > >
> > > I'd move this before the patch which adds the non-blocking send since that
> > > one claims congestion should be rare with properly sized buffers. So it
> > > makes sense to have them sized properly back before that one.
> > >
> >
> > IMO patch ordering is a bit of bikeshed. All these CTBs changes required
> > for GuC submission (34-40, 54) will get posted its own series and get
> > merged together. None of the individual patches break anything or is any
> > of this code really used until GuC submission is turned on. I can move
> > this when I post these patches by themselves but I just don't really see
> > the point either way.
>
> As a general principle we do try to have work in the order which makes sense
> functionality wise.
>
> That includes trying to avoid adding and then removing, or changing a lot,
> the same code within the series. And also adding functionality which is
> known to not work completely well until later in the series.
>
> With a master switch at the end of series you can sometimes get away with
> it, but if nothing else it at least makes it much easier to read if things
> are flowing in the expected way within the series.
>
> In this particular example sizing the buffers appropriately before starting
> to use the facility a lot more certainly sounds like a no brainer to me,
> especially since the patch is so trivial to move conflict wise.
>

Fair enough. I'll reorder these patches when I do a post to merge these
ones.

Matt

> Regards,
>
> Tvrtko
>
> > Matt
> >
> > > Regards,
> > >
> > > Tvrtko
> > >
> > > > Cc: John Harrison <john.c.harrison@intel.com>
> > > > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> > > > ---
> > > >  drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 11 ++++++++---
> > > >  1 file changed, 8 insertions(+), 3 deletions(-)
> > > >
> > > > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> > > > index 77dfbc94dcc3..d6895d29ed2d 100644
> > > > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> > > > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> > > > @@ -63,11 +63,16 @@ static inline struct drm_device *ct_to_drm(struct intel_guc_ct *ct)
> > > >   * +--------+-----------------------------------------------+------+
> > > >   *
> > > >   * Size of each `CT Buffer`_ must be multiple of 4K.
> > > > - * As we don't expect too many messages, for now use minimum sizes.
> > > > + * We don't expect too many messages in flight at any time, unless we are
> > > > + * using the GuC submission. In that case each request requires a minimum
> > > > + * 16 bytes which gives us a maximum 256 queue'd requests. Hopefully this
> > > > + * enough space to avoid backpressure on the driver. We increase the size
> > > > + * of the receive buffer (relative to the send) to ensure a G2H response
> > > > + * CTB has a landing spot.
> > > > + */
> > > >  #define CTB_DESC_SIZE		ALIGN(sizeof(struct guc_ct_buffer_desc), SZ_2K)
> > > >  #define CTB_H2G_BUFFER_SIZE	(SZ_4K)
> > > > -#define CTB_G2H_BUFFER_SIZE	(SZ_4K)
> > > > +#define CTB_G2H_BUFFER_SIZE	(4 * CTB_H2G_BUFFER_SIZE)
> > > >
> > > >  #define MAX_US_STALL_CTB	1000000
> > > >
> > > > @@ -753,7 +758,7 @@ static int ct_read(struct intel_guc_ct *ct, struct ct_incoming_msg **msg)
> > > >  	/* beware of buffer wrap case */
> > > >  	if (unlikely(available < 0))
> > > >  		available += size;
> > > > -	CT_DEBUG(ct, "available %d (%u:%u)\n", available, head, tail);
> > > > +	CT_DEBUG(ct, "available %d (%u:%u:%u)\n", available, head, tail, size);
> > > >  	GEM_BUG_ON(available < 0);
> > > >
> > > >  	header = cmds[head];
> > > >
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
index 77dfbc94dcc3..d6895d29ed2d 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
@@ -63,11 +63,16 @@ static inline struct drm_device *ct_to_drm(struct intel_guc_ct *ct)
  * +--------+-----------------------------------------------+------+
  *
  * Size of each `CT Buffer`_ must be multiple of 4K.
- * As we don't expect too many messages, for now use minimum sizes.
+ * We don't expect too many messages in flight at any time, unless we are
+ * using the GuC submission. In that case each request requires a minimum
+ * 16 bytes which gives us a maximum 256 queue'd requests. Hopefully this
+ * enough space to avoid backpressure on the driver. We increase the size
+ * of the receive buffer (relative to the send) to ensure a G2H response
+ * CTB has a landing spot.
  */
 #define CTB_DESC_SIZE		ALIGN(sizeof(struct guc_ct_buffer_desc), SZ_2K)
 #define CTB_H2G_BUFFER_SIZE	(SZ_4K)
-#define CTB_G2H_BUFFER_SIZE	(SZ_4K)
+#define CTB_G2H_BUFFER_SIZE	(4 * CTB_H2G_BUFFER_SIZE)
 
 #define MAX_US_STALL_CTB	1000000
 
@@ -753,7 +758,7 @@ static int ct_read(struct intel_guc_ct *ct, struct ct_incoming_msg **msg)
 	/* beware of buffer wrap case */
 	if (unlikely(available < 0))
 		available += size;
-	CT_DEBUG(ct, "available %d (%u:%u)\n", available, head, tail);
+	CT_DEBUG(ct, "available %d (%u:%u:%u)\n", available, head, tail, size);
 	GEM_BUG_ON(available < 0);
 
 	header = cmds[head];
With the introduction of non-blocking CTBs more than one CTB can be in
flight at a time. Increasing the size of the CTBs should reduce how
often software hits the case where no space is available in the CTB
buffer.

Cc: John Harrison <john.c.harrison@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)