
[net] gve: Fix an edge case for TSO skb validity check

Message ID 20240718190221.2219835-1-pkaligineedi@google.com (mailing list archive)
State Superseded
Delegated to: Netdev Maintainers

Checks

Context Check Description
netdev/series_format success Single patches do not need cover letters
netdev/tree_selection success Clearly marked for net
netdev/ynl success Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present success Fixes tag present in non-next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 661 this patch: 661
netdev/build_tools success No tools touched, skip
netdev/cc_maintainers warning 1 maintainers not CCed: rushilg@google.com
netdev/build_clang success Errors and warnings before: 662 this patch: 662
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success Fixes tag looks correct
netdev/build_allmodconfig_warn success Errors and warnings before: 662 this patch: 662
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 43 lines checked
netdev/build_clang_rust success No Rust files in patch. Skipping build
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0
netdev/contest success net-next-2024-07-19--12-00 (tests: 699)

Commit Message

Praveen Kaligineedi July 18, 2024, 7:02 p.m. UTC
From: Bailey Forrest <bcf@google.com>

The NIC requires each TSO segment to not span more than 10
descriptors. gve_can_send_tso() performs this check. However,
the current code misses an edge case when a TSO skb has a large
frag that needs to be split into multiple descriptors, causing
the 10-descriptor limit per TSO segment to be exceeded. This
change fixes the edge case.

Fixes: a57e5de476be ("gve: DQO: Add TX path")
Signed-off-by: Praveen Kaligineedi <pkaligineedi@google.com>
Signed-off-by: Bailey Forrest <bcf@google.com>
Reviewed-by: Jeroen de Borst <jeroendb@google.com>
---
 drivers/net/ethernet/google/gve/gve_tx_dqo.c | 22 +++++++++++++++++++-
 1 file changed, 21 insertions(+), 1 deletion(-)
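
To make the edge case concrete, here is a standalone user-space sketch
(all sizes invented for illustration; GVE_TX_MAX_BUF_SIZE_DQO is taken
to be 16KB (16384) per the thread, and the linear area is assumed to
hold only headers; this models the hardware's counting and is not
driver code):

#include <stdio.h>

#define GVE_TX_MAX_BUF_SIZE_DQO 16384	/* assumed value, per the thread */

/* Count the descriptors that the TSO segment [start, start + gso_size)
 * spans, given that each frag is additionally split into 16KB-bounded
 * descriptors.
 */
static int descs_for_segment(const int *frags, int nfrags,
			     int start, int gso_size)
{
	int seg_end = start + gso_size;
	int off = 0, ndesc = 0;
	int i, pos, dlen;

	for (i = 0; i < nfrags; i++) {
		for (pos = 0; pos < frags[i]; pos += dlen) {
			dlen = frags[i] - pos;
			if (dlen > GVE_TX_MAX_BUF_SIZE_DQO)
				dlen = GVE_TX_MAX_BUF_SIZE_DQO;
			/* this descriptor covers payload [off, off + dlen) */
			if (off < seg_end && off + dlen > start)
				ndesc++;
			off += dlen;
		}
	}
	return ndesc;
}

int main(void)
{
	/* frag0 (17000B) is split into 16384B + 616B descriptors */
	int frags[] = { 17000, 111, 111, 111, 111, 111, 111, 111, 111, 200 };

	printf("segment at 9000 spans %d descriptors (limit 10)\n",
	       descs_for_segment(frags, 10, 9000, 9000));
	return 0;
}

With gso_size = 9000, the segment covering bytes [9000, 18000) touches
both halves of frag0's split (7384B + 616B), frags 1-8, and the head of
frag9: 11 descriptors. The pre-fix gve_can_send_tso() counts frag0's
8000B tail as a single buffer and arrives at exactly 10, so this skb
passes validation yet violates the NIC's 10-descriptor limit.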

Comments

Willem de Bruijn July 18, 2024, 11:07 p.m. UTC | #1
Praveen Kaligineedi wrote:
> From: Bailey Forrest <bcf@google.com>
> 
> The NIC requires each TSO segment to not span more than 10
> descriptors. gve_can_send_tso() performs this check. However,
> the current code misses an edge case when a TSO skb has a large
> frag that needs to be split into multiple descriptors

because each descriptor may not exceed 16KB (GVE_TX_MAX_BUF_SIZE_DQO)

>, causing
> the 10-descriptor limit per TSO segment to be exceeded. This
> change fixes the edge case.
> 
> Fixes: a57e5de476be ("gve: DQO: Add TX path")
> Signed-off-by: Praveen Kaligineedi <pkaligineedi@google.com>
> Signed-off-by: Bailey Forrest <bcf@google.com>
> Reviewed-by: Jeroen de Borst <jeroendb@google.com>
> ---
>  drivers/net/ethernet/google/gve/gve_tx_dqo.c | 22 +++++++++++++++++++-
>  1 file changed, 21 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/ethernet/google/gve/gve_tx_dqo.c b/drivers/net/ethernet/google/gve/gve_tx_dqo.c
> index 0b3cca3fc792..dc39dc481f21 100644
> --- a/drivers/net/ethernet/google/gve/gve_tx_dqo.c
> +++ b/drivers/net/ethernet/google/gve/gve_tx_dqo.c
> @@ -866,22 +866,42 @@ static bool gve_can_send_tso(const struct sk_buff *skb)
>  	const int header_len = skb_tcp_all_headers(skb);
>  	const int gso_size = shinfo->gso_size;
>  	int cur_seg_num_bufs;
> +	int last_frag_size;

nit: last_frag can be interpreted as frags[nr_frags - 1], perhaps prev_frag.

>  	int cur_seg_size;
>  	int i;
>  
>  	cur_seg_size = skb_headlen(skb) - header_len;
> +	last_frag_size = skb_headlen(skb);
>  	cur_seg_num_bufs = cur_seg_size > 0;
>  
>  	for (i = 0; i < shinfo->nr_frags; i++) {
>  		if (cur_seg_size >= gso_size) {
>  			cur_seg_size %= gso_size;
>  			cur_seg_num_bufs = cur_seg_size > 0;
> +
> +			/* If the last buffer is split in the middle of a TSO

s/buffer/frag?

> +			 * segment, then it will count as two descriptors.
> +			 */
> +			if (last_frag_size > GVE_TX_MAX_BUF_SIZE_DQO) {
> +				int last_frag_remain = last_frag_size %
> +					GVE_TX_MAX_BUF_SIZE_DQO;
> +
> +				/* If the last frag was evenly divisible by
> +				 * GVE_TX_MAX_BUF_SIZE_DQO, then it will not be
> +				 * split in the current segment.

Is this true even if the segment did not start at the start of the frag?

Overall, it's not trivial to follow. Probably because the goal is to
count max descriptors per segment, but that is not what is being
looped over.

An intuitive version (perhaps buggy, a quick sketch): is this what is
intended?

static bool gve_can_send_tso(const struct sk_buff *skb)
{
        int frag_size = skb_headlen(skb) - header_len;
        int gso_size_left;
        int frag_idx = 0;
        int num_desc;
        int desc_len;
        int off = 0;

        while (off < skb->len) {
                gso_size_left = shinfo->gso_size;
                num_desc = 0;

                while (gso_size_left) {
                        desc_len = min(gso_size_left, frag_size);
                        gso_size_left -= desc_len;
                        frag_size -= desc_len;
                        num_desc++;
                        off += desc_len;        /* advance, or the outer loop never ends */

                        if (num_desc > max_descs_per_seg)
                                return false;

                        if (!frag_size)
                                frag_size = skb_frag_size(&shinfo->frags[frag_idx++]);
                }
        }

        return true;
}

This however loops skb->len / gso_size. While the above modulo
operation skips many segments that span a frag. Not sure if the more
intuitive approach could be as performant.
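
For a rough sense of the difference (illustrative numbers, assuming the
default MAX_SKB_FRAGS):

	64KB skb, gso_size 1448:  65536 / 1448 ~= 45 segment iterations
	frag loop:                at most MAX_SKB_FRAGS (17) iterations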

Else, I'll stare some more at the suggested patch to convince myself
that it is correct and complete..

> +				 */
> +				if (last_frag_remain &&
> +				    cur_seg_size > last_frag_remain) {
> +					cur_seg_num_bufs++;
> +				}
> +			}
>  		}
>  
>  		if (unlikely(++cur_seg_num_bufs > max_bufs_per_seg))
>  			return false;
>  
> -		cur_seg_size += skb_frag_size(&shinfo->frags[i]);
> +		last_frag_size = skb_frag_size(&shinfo->frags[i]);
> +		cur_seg_size += last_frag_size;
>  	}
>  
>  	return true;
> -- 
> 2.45.2.993.g49e7a77208-goog
>
Bailey Forrest July 19, 2024, 12:27 a.m. UTC | #2
On Thu, Jul 18, 2024 at 4:07 PM Willem de Bruijn
<willemdebruijn.kernel@gmail.com> wrote:
>
> This however loops skb->len / gso_size. While the above modulo
> operation skips many segments that span a frag. Not sure if the more
> intuitive approach could be as performant.

Yes, the original intention of the code was to loop over nr_frags,
instead of (skb->len / gso_size).

But perhaps that's premature optimization if it makes the code
significantly harder to follow.
Praveen Kaligineedi July 19, 2024, 1:51 a.m. UTC | #3
On Thu, Jul 18, 2024 at 4:07 PM Willem de Bruijn
<willemdebruijn.kernel@gmail.com> wrote:

> > +                      * segment, then it will count as two descriptors.
> > +                      */
> > +                     if (last_frag_size > GVE_TX_MAX_BUF_SIZE_DQO) {
> > +                             int last_frag_remain = last_frag_size %
> > +                                     GVE_TX_MAX_BUF_SIZE_DQO;
> > +
> > +                             /* If the last frag was evenly divisible by
> > +                              * GVE_TX_MAX_BUF_SIZE_DQO, then it will not be
> > +                              * split in the current segment.
>
> Is this true even if the segment did not start at the start of the frag?
The comment probably is a bit confusing here. The current segment
we are tracking could have a portion in the previous frag. The code
assumed that the portion on the previous frag (if present) mapped to only
one descriptor. However, that portion could have been split across two
descriptors due to the restriction that each descriptor cannot exceed 16KB.
That's the case this fix is trying to address.
I will work on simplifying the logic based on your suggestion below so
that the fix is easier to follow
>
> Overall, it's not trivial to follow. Probably because the goal is to
> count max descriptors per segment, but that is not what is being
> looped over.
>
> An intuitive version (perhaps buggy, a quick sketch): is this what is
> intended?
>
> static bool gve_can_send_tso(const struct sk_buff *skb)
> {
>         int frag_size = skb_headlen(skb) - header_len;
>         int gso_size_left;
>         int frag_idx = 0;
>         int num_desc;
>         int desc_len;
>         int off = 0;
>
>         while (off < skb->len) {
>                 gso_size_left = shinfo->gso_size;
>                 num_desc = 0;
>
>                 while (gso_size_left) {
>                         desc_len = min(gso_size_left, frag_size);
>                         gso_size_left -= desc_len;
>                         frag_size -= desc_len;
>                         num_desc++;
>                         off += desc_len;        /* advance, or the outer loop never ends */
>
>                         if (num_desc > max_descs_per_seg)
>                                 return false;
>
>                         if (!frag_size)
>                                 frag_size = skb_frag_size(&shinfo->frags[frag_idx++]);
>                 }
>         }
>
>         return true;
> }
>
> This however loops skb->len / gso_size. While the above modulo
> operation skips many segments that span a frag. Not sure if the more
> intuitive approach could be as performant.
>
> Else, I'll stare some more at the suggested patch to convince myself
> that it is correct and complete..
Willem de Bruijn July 19, 2024, 3:24 a.m. UTC | #4
On Thu, Jul 18, 2024 at 8:28 PM Bailey Forrest <bcf@google.com> wrote:
>
> On Thu, Jul 18, 2024 at 4:07 PM Willem de Bruijn
> <willemdebruijn.kernel@gmail.com> wrote:
> >
> > This however loops skb->len / gso_size. While the above modulo
> > operation skips many segments that span a frag. Not sure if the more
> > intuitive approach could be as performant.
>
> Yes, the original intention of the code was to loop over nr_frags,
> instead of (skb->len / gso_size).
>
> But perhaps that's premature optimization if it makes the code
> significantly harder to follow.

Thanks. I don't mean to ask for a wholesale rewrite if not needed.

But perhaps the logic can be explained in the commit message in a
way that makes it more immediately obvious.

Praveen suggested that. I'll respond to his reply in more detail.
Willem de Bruijn July 19, 2024, 3:46 a.m. UTC | #5
On Thu, Jul 18, 2024 at 9:52 PM Praveen Kaligineedi
<pkaligineedi@google.com> wrote:
>
> On Thu, Jul 18, 2024 at 4:07 PM Willem de Bruijn
> <willemdebruijn.kernel@gmail.com> wrote:
>
> > > +                      * segment, then it will count as two descriptors.
> > > +                      */
> > > +                     if (last_frag_size > GVE_TX_MAX_BUF_SIZE_DQO) {
> > > +                             int last_frag_remain = last_frag_size %
> > > +                                     GVE_TX_MAX_BUF_SIZE_DQO;
> > > +
> > > +                             /* If the last frag was evenly divisible by
> > > +                              * GVE_TX_MAX_BUF_SIZE_DQO, then it will not be
> > > +                              * split in the current segment.
> >
> > Is this true even if the segment did not start at the start of the frag?
> The comment probably is a bit confusing here. The current segment
> we are tracking could have a portion in the previous frag. The code
> assumed that the portion on the previous frag (if present) mapped to only
> one descriptor. However, that portion could have been split across two
> descriptors due to the restriction that each descriptor cannot exceed 16KB.

>>> /* If the last frag was evenly divisible by
>>> +                                * GVE_TX_MAX_BUF_SIZE_DQO, then it will not be
>>>  +                              * split in the current segment.

This is true because the smallest multiple of 16KB is 32KB, and the
largest gso_size at least for Ethernet will be 9K. But I don't think
that that is what is used here as the basis for this statement?

> That's the case this fix is trying to address.
> I will work on simplifying the logic based on your suggestion below so
> that the fix is easier to follow
Praveen Kaligineedi July 19, 2024, 2:31 p.m. UTC | #6
On Thu, Jul 18, 2024 at 8:47 PM Willem de Bruijn
<willemdebruijn.kernel@gmail.com> wrote:
>
> On Thu, Jul 18, 2024 at 9:52 PM Praveen Kaligineedi
> <pkaligineedi@google.com> wrote:
> >
> > On Thu, Jul 18, 2024 at 4:07 PM Willem de Bruijn
> > <willemdebruijn.kernel@gmail.com> wrote:
> >
> > > > +                      * segment, then it will count as two descriptors.
> > > > +                      */
> > > > +                     if (last_frag_size > GVE_TX_MAX_BUF_SIZE_DQO) {
> > > > +                             int last_frag_remain = last_frag_size %
> > > > +                                     GVE_TX_MAX_BUF_SIZE_DQO;
> > > > +
> > > > +                             /* If the last frag was evenly divisible by
> > > > +                              * GVE_TX_MAX_BUF_SIZE_DQO, then it will not be
> > > > +                              * split in the current segment.
> > >
> > > Is this true even if the segment did not start at the start of the frag?
> > The comment probably is a bit confusing here. The current segment
> > we are tracking could have a portion in the previous frag. The code
> > assumed that the portion on the previous frag (if present) mapped to only
> > one descriptor. However, that portion could have been split across two
> > descriptors due to the restriction that each descriptor cannot exceed 16KB.
>
> >>> /* If the last frag was evenly divisible by
> >>> +                                * GVE_TX_MAX_BUF_SIZE_DQO, then it will not be
> >>>  +                              * split in the current segment.
>
> This is true because the smallest multiple of 16KB is 32KB, and the
> largest gso_size at least for Ethernet will be 9K. But I don't think
> that that is what is used here as the basis for this statement?
>
That the largest Ethernet gso_size (9K) is less than GVE_TX_MAX_BUF_SIZE_DQO
is an implicit assumption made in this patch and in that comment. Bailey,
please correct me if I am wrong.




> > That's the case this fix is trying to address.
> > I will work on simplifying the logic based on your suggestion below so
> > that the fix is easier to follow
Bailey Forrest July 19, 2024, 4:55 p.m. UTC | #7
On Fri, Jul 19, 2024 at 7:31 AM Praveen Kaligineedi
<pkaligineedi@google.com> wrote:
>
> On Thu, Jul 18, 2024 at 8:47 PM Willem de Bruijn
> <willemdebruijn.kernel@gmail.com> wrote:
> >
> > On Thu, Jul 18, 2024 at 9:52 PM Praveen Kaligineedi
> > <pkaligineedi@google.com> wrote:
> > >
> > > On Thu, Jul 18, 2024 at 4:07 PM Willem de Bruijn
> > > <willemdebruijn.kernel@gmail.com> wrote:
> > >
> > > > > +                      * segment, then it will count as two descriptors.
> > > > > +                      */
> > > > > +                     if (last_frag_size > GVE_TX_MAX_BUF_SIZE_DQO) {
> > > > > +                             int last_frag_remain = last_frag_size %
> > > > > +                                     GVE_TX_MAX_BUF_SIZE_DQO;
> > > > > +
> > > > > +                             /* If the last frag was evenly divisible by
> > > > > +                              * GVE_TX_MAX_BUF_SIZE_DQO, then it will not be
> > > > > +                              * split in the current segment.
> > > >
> > > > Is this true even if the segment did not start at the start of the frag?
> > > The comment probably is a bit confusing here. The current segment
> > > we are tracking could have a portion in the previous frag. The code
> > > assumed that the portion on the previous frag (if present) mapped to only
> > > one descriptor. However, that portion could have been split across two
> > > descriptors due to the restriction that each descriptor cannot exceed 16KB.
> >
> > >>> /* If the last frag was evenly divisible by
> > >>> +                                * GVE_TX_MAX_BUF_SIZE_DQO, then it will not be
> > >>>  +                              * split in the current segment.
> >
> > This is true because the smallest multiple of 16KB is 32KB, and the
> > largest gso_size at least for Ethernet will be 9K. But I don't think
> > that that is what is used here as the basis for this statement?
> >
> That the largest Ethernet gso_size (9K) is less than GVE_TX_MAX_BUF_SIZE_DQO
> is an implicit assumption made in this patch and in that comment. Bailey,
> please correct me if I am wrong.

If last_frag_size is evenly divisible by GVE_TX_MAX_BUF_SIZE_DQO, it
doesn't hit the edge case we're looking for.

- If it's evenly divisible, then we know it will use exactly
(last_frag_size / GVE_TX_MAX_BUF_SIZE_DQO) descriptors
- GVE_TX_MAX_BUF_SIZE_DQO > 9k, so we know each descriptor won't
create a segment which exceeds the limit
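
A quick numeric check of that argument (sizes invented for
illustration, taking GVE_TX_MAX_BUF_SIZE_DQO as 16KB = 16384):

/* frag = 32768 (= 2 * 16384): descriptors [0,16384) and [16384,32768).
 * A segment tail of at most ~9K lies wholly inside the final 16384B
 * descriptor: one buffer, no edge case.
 *
 * frag = 17000: descriptors [0,16384) and [16384,17000), remain = 616.
 * A segment tail of, say, 8000 bytes (> 616) straddles the 16KB
 * boundary: two buffers, which is what the patch adds a count for.
 */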
Willem de Bruijn July 21, 2024, 7:10 p.m. UTC | #8
On Fri, Jul 19, 2024 at 9:56 AM Bailey Forrest <bcf@google.com> wrote:
>
> On Fri, Jul 19, 2024 at 7:31 AM Praveen Kaligineedi
> <pkaligineedi@google.com> wrote:
> >
> > On Thu, Jul 18, 2024 at 8:47 PM Willem de Bruijn
> > <willemdebruijn.kernel@gmail.com> wrote:
> > >
> > > On Thu, Jul 18, 2024 at 9:52 PM Praveen Kaligineedi
> > > <pkaligineedi@google.com> wrote:
> > > >
> > > > On Thu, Jul 18, 2024 at 4:07 PM Willem de Bruijn
> > > > <willemdebruijn.kernel@gmail.com> wrote:
> > > >
> > > > > > +                      * segment, then it will count as two descriptors.
> > > > > > +                      */
> > > > > > +                     if (last_frag_size > GVE_TX_MAX_BUF_SIZE_DQO) {
> > > > > > +                             int last_frag_remain = last_frag_size %
> > > > > > +                                     GVE_TX_MAX_BUF_SIZE_DQO;
> > > > > > +
> > > > > > +                             /* If the last frag was evenly divisible by
> > > > > > +                              * GVE_TX_MAX_BUF_SIZE_DQO, then it will not be
> > > > > > +                              * split in the current segment.
> > > > >
> > > > > Is this true even if the segment did not start at the start of the frag?
> > > > The comment probably is a bit confusing here. The current segment
> > > > we are tracking could have a portion in the previous frag. The code
> > > > assumed that the portion on the previous frag (if present) mapped to only
> > > > one descriptor. However, that portion could have been split across two
> > > > descriptors due to the restriction that each descriptor cannot exceed 16KB.
> > >
> > > >>> /* If the last frag was evenly divisible by
> > > >>> +                                * GVE_TX_MAX_BUF_SIZE_DQO, then it will not be
> > > >>>  +                              * split in the current segment.
> > >
> > > This is true because the smallest multiple of 16KB is 32KB, and the
> > > largest gso_size at least for Ethernet will be 9K. But I don't think
> > > that that is what is used here as the basis for this statement?
> > >
> > That the largest Ethernet gso_size (9K) is less than GVE_TX_MAX_BUF_SIZE_DQO
> > is an implicit assumption made in this patch and in that comment. Bailey,
> > please correct me if I am wrong.
>
> If last_frag_size is evenly divisible by GVE_TX_MAX_BUF_SIZE_DQO, it
> doesn't hit the edge case we're looking for.
>
> - If it's evenly divisible, then we know it will use exactly
> (last_frag_size / GVE_TX_MAX_BUF_SIZE_DQO) descriptors

This assumes that gso_segment start is aligned with skb_frag
start. That is not necessarily true, right?

If headlen minus protocol headers is 1B, then the first segment
will have two descriptors { 1B, 9KB - 1 }. And the next segment
can have skb_frag_size - ( 9KB - 1).

I think the statement is correct, but only because every multiple
of 16KB is so much larger than the max gso_size of ~9KB that a
single segment will never include more than two skb_frags.

Quite possibly the code overestimates the number of
descriptors per segment now, but that is safe and only a
performance regression.

> - GVE_TX_MAX_BUF_SIZE_DQO > 9k, so we know each descriptor won't
> create a segment which exceeds the limit

For a net patch, it is generally better to make a small fix rather than rewrite.

That said, my sketch without looping over every segment:

        while (off < skb->len) {
                gso_size_left = shinfo->gso_size;
                num_desc = 0;

                while (gso_size_left) {
                        desc_len = min(gso_size_left, frag_size_left);
                        gso_size_left -= desc_len;
                        frag_size_left -= desc_len;
                        num_desc++;
                        off += desc_len;        /* advance through the payload */

                        if (num_desc > max_descs_per_seg)
                                return false;

                        if (!frag_size_left)
                                frag_size_left = skb_frag_size(&shinfo->frags[frag_idx++]);
+                       else
+                               frag_size_left %= gso_size;     /* skip segments that fit in one desc */
                }
        }
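
For anyone who wants to experiment with the counting, here is a
self-contained user-space model of the same check (my arrangement, not
the sketch above verbatim; frag layout invented, GVE_TX_MAX_BUF_SIZE_DQO
assumed to be 16384, linear area assumed to hold only headers):

#include <stdbool.h>
#include <stdio.h>

#define GVE_TX_MAX_BUF_SIZE_DQO 16384	/* assumed value */

/* Walk the frags as 16KB-bounded chunks (one descriptor each) and
 * count how many chunks feed each gso_size-sized segment.
 */
static bool can_send_tso(const int *frags, int nfrags,
			 int gso_size, int max_descs)
{
	int seg_left = gso_size, num_desc = 0;
	int i, pos, chunk, take, n;

	for (i = 0; i < nfrags; i++) {
		for (pos = 0; pos < frags[i]; pos += chunk) {
			chunk = frags[i] - pos;
			if (chunk > GVE_TX_MAX_BUF_SIZE_DQO)
				chunk = GVE_TX_MAX_BUF_SIZE_DQO;

			take = chunk;	/* one descriptor, possibly shared
					 * by several segments */
			while (take) {
				n = take < seg_left ? take : seg_left;

				if (++num_desc > max_descs)
					return false;
				take -= n;
				seg_left -= n;
				if (!seg_left) {	/* segment complete */
					seg_left = gso_size;
					num_desc = 0;
				}
			}
		}
	}
	return true;
}

int main(void)
{
	/* a layout that defeats the pre-fix check: the segment starting
	 * at byte 9000 overlaps 11 chunks, so this returns false */
	int frags[] = { 17000, 111, 111, 111, 111, 111, 111, 111, 111, 200 };

	printf("can_send_tso: %d\n", can_send_tso(frags, 10, 9000, 10));
	return 0;
}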
Paolo Abeni July 23, 2024, 10:20 a.m. UTC | #9
On 7/21/24 21:10, Willem de Bruijn wrote:
> On Fri, Jul 19, 2024 at 9:56 AM Bailey Forrest <bcf@google.com> wrote:
>> If last_frag_size is evenly divisible by GVE_TX_MAX_BUF_SIZE_DQO, it
>> doesn't hit the edge case we're looking for.
>>
>> - If it's evenly divisible, then we know it will use exactly
>> (last_frag_size / GVE_TX_MAX_BUF_SIZE_DQO) descriptors
> 
> This assumes that gso_segment start is aligned with skb_frag
> start. That is not necessarily true, right?
> 
> If headlen minus protocol headers is 1B, then the first segment
> will have two descriptors { 1B, 9KB - 1 }. And the next segment
> can have skb_frag_size - ( 9KB - 1).
> 
> I think the statement is correct, but because every multiple
> of 16KB is so much larger than the max gso_size of ~9KB,
> that a single segment will never include more than two
> skb_frags.

I'm a bit lost, but what about big TCP? That should allow (considerably)
more than 2 frags of 16K each per segment.

In any case it looks like this needs at least some clarification in the
comments.

Thanks,

Paolo

Patch

diff --git a/drivers/net/ethernet/google/gve/gve_tx_dqo.c b/drivers/net/ethernet/google/gve/gve_tx_dqo.c
index 0b3cca3fc792..dc39dc481f21 100644
--- a/drivers/net/ethernet/google/gve/gve_tx_dqo.c
+++ b/drivers/net/ethernet/google/gve/gve_tx_dqo.c
@@ -866,22 +866,42 @@  static bool gve_can_send_tso(const struct sk_buff *skb)
 	const int header_len = skb_tcp_all_headers(skb);
 	const int gso_size = shinfo->gso_size;
 	int cur_seg_num_bufs;
+	int last_frag_size;
 	int cur_seg_size;
 	int i;
 
 	cur_seg_size = skb_headlen(skb) - header_len;
+	last_frag_size = skb_headlen(skb);
 	cur_seg_num_bufs = cur_seg_size > 0;
 
 	for (i = 0; i < shinfo->nr_frags; i++) {
 		if (cur_seg_size >= gso_size) {
 			cur_seg_size %= gso_size;
 			cur_seg_num_bufs = cur_seg_size > 0;
+
+			/* If the last buffer is split in the middle of a TSO
+			 * segment, then it will count as two descriptors.
+			 */
+			if (last_frag_size > GVE_TX_MAX_BUF_SIZE_DQO) {
+				int last_frag_remain = last_frag_size %
+					GVE_TX_MAX_BUF_SIZE_DQO;
+
+				/* If the last frag was evenly divisible by
+				 * GVE_TX_MAX_BUF_SIZE_DQO, then it will not be
+				 * split in the current segment.
+				 */
+				if (last_frag_remain &&
+				    cur_seg_size > last_frag_remain) {
+					cur_seg_num_bufs++;
+				}
+			}
 		}
 
 		if (unlikely(++cur_seg_num_bufs > max_bufs_per_seg))
 			return false;
 
-		cur_seg_size += skb_frag_size(&shinfo->frags[i]);
+		last_frag_size = skb_frag_size(&shinfo->frags[i]);
+		cur_seg_size += last_frag_size;
 	}
 
 	return true;