diff mbox series

[RFC,39/97] drm/i915/guc: Increase size of CTB buffers

Message ID 20210506191451.77768-40-matthew.brost@intel.com (mailing list archive)
State New, archived
Headers show
Series Basic GuC submission support in the i915 | expand

Commit Message

Matthew Brost May 6, 2021, 7:13 p.m. UTC
With the introduction of non-blocking CTBs more than one CTB can be in
flight at a time. Increasing the size of the CTBs should reduce how
often software hits the case where no space is available in the CTB
buffer.

Cc: John Harrison <john.c.harrison@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

Comments

Michal Wajdeczko May 24, 2021, 1:43 p.m. UTC | #1
On 06.05.2021 21:13, Matthew Brost wrote:
> With the introduction of non-blocking CTBs more than one CTB can be in
> flight at a time. Increasing the size of the CTBs should reduce how
> often software hits the case where no space is available in the CTB
> buffer.
> 
> Cc: John Harrison <john.c.harrison@intel.com>
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> ---
>  drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 11 ++++++++---
>  1 file changed, 8 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> index 77dfbc94dcc3..d6895d29ed2d 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> @@ -63,11 +63,16 @@ static inline struct drm_device *ct_to_drm(struct intel_guc_ct *ct)
>   *      +--------+-----------------------------------------------+------+
>   *
>   * Size of each `CT Buffer`_ must be multiple of 4K.
> - * As we don't expect too many messages, for now use minimum sizes.
> + * We don't expect too many messages in flight at any time, unless we are
> + * using the GuC submission. In that case each request requires a minimum
> + * 16 bytes which gives us a maximum 256 queue'd requests. Hopefully this

nit: all our CTB calculations are in dwords now, not bytes

> + * enough space to avoid backpressure on the driver. We increase the size
> + * of the receive buffer (relative to the send) to ensure a G2H response
> + * CTB has a landing spot.

hmm, but we are not checking G2H CTB yet
will start doing it around patch 54/97
so maybe this other patch should be introduced earlier ?

>   */
>  #define CTB_DESC_SIZE		ALIGN(sizeof(struct guc_ct_buffer_desc), SZ_2K)
>  #define CTB_H2G_BUFFER_SIZE	(SZ_4K)
> -#define CTB_G2H_BUFFER_SIZE	(SZ_4K)
> +#define CTB_G2H_BUFFER_SIZE	(4 * CTB_H2G_BUFFER_SIZE)

in theory, we (host) should be faster than GuC, so G2H CTB shall be
almost always empty, if this is not a case, maybe we should start
monitoring what is happening and report some warnings if G2H is half full ?

>  
>  #define MAX_US_STALL_CTB	1000000
>  
> @@ -753,7 +758,7 @@ static int ct_read(struct intel_guc_ct *ct, struct ct_incoming_msg **msg)
>  	/* beware of buffer wrap case */
>  	if (unlikely(available < 0))
>  		available += size;
> -	CT_DEBUG(ct, "available %d (%u:%u)\n", available, head, tail);
> +	CT_DEBUG(ct, "available %d (%u:%u:%u)\n", available, head, tail, size);
>  	GEM_BUG_ON(available < 0);
>  
>  	header = cmds[head];
>
Matthew Brost May 24, 2021, 6:40 p.m. UTC | #2
On Mon, May 24, 2021 at 03:43:11PM +0200, Michal Wajdeczko wrote:
> 
> 
> On 06.05.2021 21:13, Matthew Brost wrote:
> > With the introduction of non-blocking CTBs more than one CTB can be in
> > flight at a time. Increasing the size of the CTBs should reduce how
> > often software hits the case where no space is available in the CTB
> > buffer.
> > 
> > Cc: John Harrison <john.c.harrison@intel.com>
> > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> > ---
> >  drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 11 ++++++++---
> >  1 file changed, 8 insertions(+), 3 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> > index 77dfbc94dcc3..d6895d29ed2d 100644
> > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> > @@ -63,11 +63,16 @@ static inline struct drm_device *ct_to_drm(struct intel_guc_ct *ct)
> >   *      +--------+-----------------------------------------------+------+
> >   *
> >   * Size of each `CT Buffer`_ must be multiple of 4K.
> > - * As we don't expect too many messages, for now use minimum sizes.
> > + * We don't expect too many messages in flight at any time, unless we are
> > + * using the GuC submission. In that case each request requires a minimum
> > + * 16 bytes which gives us a maximum 256 queue'd requests. Hopefully this
> 
> nit: all our CTB calculations are in dwords now, not bytes
> 

I can change the wording to DW sizes.

> > + * enough space to avoid backpressure on the driver. We increase the size
> > + * of the receive buffer (relative to the send) to ensure a G2H response
> > + * CTB has a landing spot.
> 
> hmm, but we are not checking G2H CTB yet
> will start doing it around patch 54/97
> so maybe this other patch should be introduced earlier ?
>

Yes, that patch is going to be pulled down to an earlier spot in the
series.
 
> >   */
> >  #define CTB_DESC_SIZE		ALIGN(sizeof(struct guc_ct_buffer_desc), SZ_2K)
> >  #define CTB_H2G_BUFFER_SIZE	(SZ_4K)
> > -#define CTB_G2H_BUFFER_SIZE	(SZ_4K)
> > +#define CTB_G2H_BUFFER_SIZE	(4 * CTB_H2G_BUFFER_SIZE)
> 
> in theory, we (host) should be faster than GuC, so G2H CTB shall be
> almost always empty, if this is not a case, maybe we should start
> monitoring what is happening and report some warnings if G2H is half full ?
>

Certainly some IGTs put some more pressure on the G2H channel than the
H2G channel at least I think. This is something we can tune over time
after this lands upstream. IMO a message at this point is overkill.

Matt
 
> >  
> >  #define MAX_US_STALL_CTB	1000000
> >  
> > @@ -753,7 +758,7 @@ static int ct_read(struct intel_guc_ct *ct, struct ct_incoming_msg **msg)
> >  	/* beware of buffer wrap case */
> >  	if (unlikely(available < 0))
> >  		available += size;
> > -	CT_DEBUG(ct, "available %d (%u:%u)\n", available, head, tail);
> > +	CT_DEBUG(ct, "available %d (%u:%u:%u)\n", available, head, tail, size);
> >  	GEM_BUG_ON(available < 0);
> >  
> >  	header = cmds[head];
> >
Tvrtko Ursulin May 25, 2021, 9:24 a.m. UTC | #3
On 06/05/2021 20:13, Matthew Brost wrote:
> With the introduction of non-blocking CTBs more than one CTB can be in
> flight at a time. Increasing the size of the CTBs should reduce how
> often software hits the case where no space is available in the CTB
> buffer.

I'd move this before the patch which adds the non-blocking send since 
that one claims congestion should be rare with properly sized buffers. 
So it makes sense to have them sized properly back before that one.

Regards,

Tvrtko

> Cc: John Harrison <john.c.harrison@intel.com>
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> ---
>   drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 11 ++++++++---
>   1 file changed, 8 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> index 77dfbc94dcc3..d6895d29ed2d 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> @@ -63,11 +63,16 @@ static inline struct drm_device *ct_to_drm(struct intel_guc_ct *ct)
>    *      +--------+-----------------------------------------------+------+
>    *
>    * Size of each `CT Buffer`_ must be multiple of 4K.
> - * As we don't expect too many messages, for now use minimum sizes.
> + * We don't expect too many messages in flight at any time, unless we are
> + * using the GuC submission. In that case each request requires a minimum
> + * 16 bytes which gives us a maximum 256 queue'd requests. Hopefully this
> + * enough space to avoid backpressure on the driver. We increase the size
> + * of the receive buffer (relative to the send) to ensure a G2H response
> + * CTB has a landing spot.
>    */
>   #define CTB_DESC_SIZE		ALIGN(sizeof(struct guc_ct_buffer_desc), SZ_2K)
>   #define CTB_H2G_BUFFER_SIZE	(SZ_4K)
> -#define CTB_G2H_BUFFER_SIZE	(SZ_4K)
> +#define CTB_G2H_BUFFER_SIZE	(4 * CTB_H2G_BUFFER_SIZE)
>   
>   #define MAX_US_STALL_CTB	1000000
>   
> @@ -753,7 +758,7 @@ static int ct_read(struct intel_guc_ct *ct, struct ct_incoming_msg **msg)
>   	/* beware of buffer wrap case */
>   	if (unlikely(available < 0))
>   		available += size;
> -	CT_DEBUG(ct, "available %d (%u:%u)\n", available, head, tail);
> +	CT_DEBUG(ct, "available %d (%u:%u:%u)\n", available, head, tail, size);
>   	GEM_BUG_ON(available < 0);
>   
>   	header = cmds[head];
>
Matthew Brost May 25, 2021, 5:15 p.m. UTC | #4
On Tue, May 25, 2021 at 10:24:09AM +0100, Tvrtko Ursulin wrote:
> 
> On 06/05/2021 20:13, Matthew Brost wrote:
> > With the introduction of non-blocking CTBs more than one CTB can be in
> > flight at a time. Increasing the size of the CTBs should reduce how
> > often software hits the case where no space is available in the CTB
> > buffer.
> 
> I'd move this before the patch which adds the non-blocking send since that
> one claims congestion should be rare with properly sized buffers. So it
> makes sense to have them sized properly back before that one.
>

IMO patch ordering is a bit of bikeshed. All these CTBs changes required
for GuC submission (34-40, 54) will get posted its own series and get
merged together. None of the individual patches break anything or is any
of this code really used until GuC submission is turned on. I can move
this when I post these patches by themselves but I just don't really see
the point either way.

Matt
 
> Regards,
> 
> Tvrtko
> 
> > Cc: John Harrison <john.c.harrison@intel.com>
> > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> > ---
> >   drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 11 ++++++++---
> >   1 file changed, 8 insertions(+), 3 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> > index 77dfbc94dcc3..d6895d29ed2d 100644
> > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> > @@ -63,11 +63,16 @@ static inline struct drm_device *ct_to_drm(struct intel_guc_ct *ct)
> >    *      +--------+-----------------------------------------------+------+
> >    *
> >    * Size of each `CT Buffer`_ must be multiple of 4K.
> > - * As we don't expect too many messages, for now use minimum sizes.
> > + * We don't expect too many messages in flight at any time, unless we are
> > + * using the GuC submission. In that case each request requires a minimum
> > + * 16 bytes which gives us a maximum 256 queue'd requests. Hopefully this
> > + * enough space to avoid backpressure on the driver. We increase the size
> > + * of the receive buffer (relative to the send) to ensure a G2H response
> > + * CTB has a landing spot.
> >    */
> >   #define CTB_DESC_SIZE		ALIGN(sizeof(struct guc_ct_buffer_desc), SZ_2K)
> >   #define CTB_H2G_BUFFER_SIZE	(SZ_4K)
> > -#define CTB_G2H_BUFFER_SIZE	(SZ_4K)
> > +#define CTB_G2H_BUFFER_SIZE	(4 * CTB_H2G_BUFFER_SIZE)
> >   #define MAX_US_STALL_CTB	1000000
> > @@ -753,7 +758,7 @@ static int ct_read(struct intel_guc_ct *ct, struct ct_incoming_msg **msg)
> >   	/* beware of buffer wrap case */
> >   	if (unlikely(available < 0))
> >   		available += size;
> > -	CT_DEBUG(ct, "available %d (%u:%u)\n", available, head, tail);
> > +	CT_DEBUG(ct, "available %d (%u:%u:%u)\n", available, head, tail, size);
> >   	GEM_BUG_ON(available < 0);
> >   	header = cmds[head];
> >
Tvrtko Ursulin May 26, 2021, 9:30 a.m. UTC | #5
On 25/05/2021 18:15, Matthew Brost wrote:
> On Tue, May 25, 2021 at 10:24:09AM +0100, Tvrtko Ursulin wrote:
>>
>> On 06/05/2021 20:13, Matthew Brost wrote:
>>> With the introduction of non-blocking CTBs more than one CTB can be in
>>> flight at a time. Increasing the size of the CTBs should reduce how
>>> often software hits the case where no space is available in the CTB
>>> buffer.
>>
>> I'd move this before the patch which adds the non-blocking send since that
>> one claims congestion should be rare with properly sized buffers. So it
>> makes sense to have them sized properly back before that one.
>>
> 
> IMO patch ordering is a bit of bikeshed. All these CTBs changes required
> for GuC submission (34-40, 54) will get posted its own series and get
> merged together. None of the individual patches break anything or is any
> of this code really used until GuC submission is turned on. I can move
> this when I post these patches by themselves but I just don't really see
> the point either way.

As a general principle we do try to have work in the order which makes 
sense functionality wise.

That includes trying to avoid adding and then removing, or changing a 
lot, the same code within the series. And also adding functionality 
which is known to not work completely well until later in the series.

With a master switch at the end of series you can sometimes get away 
with it, but if nothing else it at least makes it much easier to read if 
things are flowing in the expected way within (the series).

In this particular example sizing the buffers appropriately before 
starting to use the facility a lot more certainly sounds like a no 
brainer to me, especially since the patch is so trivial to move conflict 
wise.

Regards,

Tvrtko

> Matt
>   
>> Regards,
>>
>> Tvrtko
>>
>>> Cc: John Harrison <john.c.harrison@intel.com>
>>> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
>>> ---
>>>    drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 11 ++++++++---
>>>    1 file changed, 8 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
>>> index 77dfbc94dcc3..d6895d29ed2d 100644
>>> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
>>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
>>> @@ -63,11 +63,16 @@ static inline struct drm_device *ct_to_drm(struct intel_guc_ct *ct)
>>>     *      +--------+-----------------------------------------------+------+
>>>     *
>>>     * Size of each `CT Buffer`_ must be multiple of 4K.
>>> - * As we don't expect too many messages, for now use minimum sizes.
>>> + * We don't expect too many messages in flight at any time, unless we are
>>> + * using the GuC submission. In that case each request requires a minimum
>>> + * 16 bytes which gives us a maximum 256 queue'd requests. Hopefully this
>>> + * enough space to avoid backpressure on the driver. We increase the size
>>> + * of the receive buffer (relative to the send) to ensure a G2H response
>>> + * CTB has a landing spot.
>>>     */
>>>    #define CTB_DESC_SIZE		ALIGN(sizeof(struct guc_ct_buffer_desc), SZ_2K)
>>>    #define CTB_H2G_BUFFER_SIZE	(SZ_4K)
>>> -#define CTB_G2H_BUFFER_SIZE	(SZ_4K)
>>> +#define CTB_G2H_BUFFER_SIZE	(4 * CTB_H2G_BUFFER_SIZE)
>>>    #define MAX_US_STALL_CTB	1000000
>>> @@ -753,7 +758,7 @@ static int ct_read(struct intel_guc_ct *ct, struct ct_incoming_msg **msg)
>>>    	/* beware of buffer wrap case */
>>>    	if (unlikely(available < 0))
>>>    		available += size;
>>> -	CT_DEBUG(ct, "available %d (%u:%u)\n", available, head, tail);
>>> +	CT_DEBUG(ct, "available %d (%u:%u:%u)\n", available, head, tail, size);
>>>    	GEM_BUG_ON(available < 0);
>>>    	header = cmds[head];
>>>
Matthew Brost May 26, 2021, 6:20 p.m. UTC | #6
On Wed, May 26, 2021 at 10:30:27AM +0100, Tvrtko Ursulin wrote:
> 
> On 25/05/2021 18:15, Matthew Brost wrote:
> > On Tue, May 25, 2021 at 10:24:09AM +0100, Tvrtko Ursulin wrote:
> > > 
> > > On 06/05/2021 20:13, Matthew Brost wrote:
> > > > With the introduction of non-blocking CTBs more than one CTB can be in
> > > > flight at a time. Increasing the size of the CTBs should reduce how
> > > > often software hits the case where no space is available in the CTB
> > > > buffer.
> > > 
> > > I'd move this before the patch which adds the non-blocking send since that
> > > one claims congestion should be rare with properly sized buffers. So it
> > > makes sense to have them sized properly back before that one.
> > > 
> > 
> > IMO patch ordering is a bit of bikeshed. All these CTBs changes required
> > for GuC submission (34-40, 54) will get posted its own series and get
> > merged together. None of the individual patches break anything or is any
> > of this code really used until GuC submission is turned on. I can move
> > this when I post these patches by themselves but I just don't really see
> > the point either way.
> 
> As a general principle we do try to have work in the order which makes sense
> functionality wise.
> 
> That includes trying to avoid adding and then removing, or changing a lot,
> the same code within the series. And also adding functionality which is
> known to not work completely well until later in the series.
> 
> With a master switch at the end of series you can sometimes get away with
> it, but if nothing else it at least makes it much easier to read if things
> are flowing in the expected way within (the series).
> 
> In this particular example sizing the buffers appropriately before starting
> to use the facility a lot more certainly sounds like a no brainer to me,
> especially since the patch is so trivial to move conflict wise.
> 

Fair enough. I'll reorder these patches when I do a post to merge these
ones.

Matt

> Regards,
> 
> Tvrtko
> 
> > Matt
> > > Regards,
> > > 
> > > Tvrtko
> > > 
> > > > Cc: John Harrison <john.c.harrison@intel.com>
> > > > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> > > > ---
> > > >    drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 11 ++++++++---
> > > >    1 file changed, 8 insertions(+), 3 deletions(-)
> > > > 
> > > > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> > > > index 77dfbc94dcc3..d6895d29ed2d 100644
> > > > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> > > > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> > > > @@ -63,11 +63,16 @@ static inline struct drm_device *ct_to_drm(struct intel_guc_ct *ct)
> > > >     *      +--------+-----------------------------------------------+------+
> > > >     *
> > > >     * Size of each `CT Buffer`_ must be multiple of 4K.
> > > > - * As we don't expect too many messages, for now use minimum sizes.
> > > > + * We don't expect too many messages in flight at any time, unless we are
> > > > + * using the GuC submission. In that case each request requires a minimum
> > > > + * 16 bytes which gives us a maximum 256 queue'd requests. Hopefully this
> > > > + * enough space to avoid backpressure on the driver. We increase the size
> > > > + * of the receive buffer (relative to the send) to ensure a G2H response
> > > > + * CTB has a landing spot.
> > > >     */
> > > >    #define CTB_DESC_SIZE		ALIGN(sizeof(struct guc_ct_buffer_desc), SZ_2K)
> > > >    #define CTB_H2G_BUFFER_SIZE	(SZ_4K)
> > > > -#define CTB_G2H_BUFFER_SIZE	(SZ_4K)
> > > > +#define CTB_G2H_BUFFER_SIZE	(4 * CTB_H2G_BUFFER_SIZE)
> > > >    #define MAX_US_STALL_CTB	1000000
> > > > @@ -753,7 +758,7 @@ static int ct_read(struct intel_guc_ct *ct, struct ct_incoming_msg **msg)
> > > >    	/* beware of buffer wrap case */
> > > >    	if (unlikely(available < 0))
> > > >    		available += size;
> > > > -	CT_DEBUG(ct, "available %d (%u:%u)\n", available, head, tail);
> > > > +	CT_DEBUG(ct, "available %d (%u:%u:%u)\n", available, head, tail, size);
> > > >    	GEM_BUG_ON(available < 0);
> > > >    	header = cmds[head];
> > > >
diff mbox series

Patch

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
index 77dfbc94dcc3..d6895d29ed2d 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
@@ -63,11 +63,16 @@  static inline struct drm_device *ct_to_drm(struct intel_guc_ct *ct)
  *      +--------+-----------------------------------------------+------+
  *
  * Size of each `CT Buffer`_ must be multiple of 4K.
- * As we don't expect too many messages, for now use minimum sizes.
+ * We don't expect too many messages in flight at any time, unless we are
+ * using the GuC submission. In that case each request requires a minimum
+ * 16 bytes which gives us a maximum 256 queue'd requests. Hopefully this
+ * enough space to avoid backpressure on the driver. We increase the size
+ * of the receive buffer (relative to the send) to ensure a G2H response
+ * CTB has a landing spot.
  */
 #define CTB_DESC_SIZE		ALIGN(sizeof(struct guc_ct_buffer_desc), SZ_2K)
 #define CTB_H2G_BUFFER_SIZE	(SZ_4K)
-#define CTB_G2H_BUFFER_SIZE	(SZ_4K)
+#define CTB_G2H_BUFFER_SIZE	(4 * CTB_H2G_BUFFER_SIZE)
 
 #define MAX_US_STALL_CTB	1000000
 
@@ -753,7 +758,7 @@  static int ct_read(struct intel_guc_ct *ct, struct ct_incoming_msg **msg)
 	/* beware of buffer wrap case */
 	if (unlikely(available < 0))
 		available += size;
-	CT_DEBUG(ct, "available %d (%u:%u)\n", available, head, tail);
+	CT_DEBUG(ct, "available %d (%u:%u:%u)\n", available, head, tail, size);
 	GEM_BUG_ON(available < 0);
 
 	header = cmds[head];