Message ID | 20240718190221.2219835-1-pkaligineedi@google.com (mailing list archive)
---|---
State | Superseded
Delegated to: | Netdev Maintainers
Series | [net] gve: Fix an edge case for TSO skb validity check
Praveen Kaligineedi wrote:
> From: Bailey Forrest <bcf@google.com>
>
> The NIC requires each TSO segment to not span more than 10
> descriptors. gve_can_send_tso() performs this check. However,
> the current code misses an edge case when a TSO skb has a large
> frag that needs to be split into multiple descriptors because
> each descriptor may not exceed 16KB (GVE_TX_MAX_BUF_SIZE_DQO),
> causing the 10 descriptor limit per TSO-segment to be exceeded.
> This change fixes the edge case.
>
> Fixes: a57e5de476be ("gve: DQO: Add TX path")
> Signed-off-by: Praveen Kaligineedi <pkaligineedi@google.com>
> Signed-off-by: Bailey Forrest <bcf@google.com>
> Reviewed-by: Jeroen de Borst <jeroendb@google.com>
> ---
>  drivers/net/ethernet/google/gve/gve_tx_dqo.c | 22 +++++++++++++++++++-
>  1 file changed, 21 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/net/ethernet/google/gve/gve_tx_dqo.c b/drivers/net/ethernet/google/gve/gve_tx_dqo.c
> index 0b3cca3fc792..dc39dc481f21 100644
> --- a/drivers/net/ethernet/google/gve/gve_tx_dqo.c
> +++ b/drivers/net/ethernet/google/gve/gve_tx_dqo.c
> @@ -866,22 +866,42 @@ static bool gve_can_send_tso(const struct sk_buff *skb)
>  	const int header_len = skb_tcp_all_headers(skb);
>  	const int gso_size = shinfo->gso_size;
>  	int cur_seg_num_bufs;
> +	int last_frag_size;

nit: last_frag can be interpreted as frags[nr_frags - 1], perhaps prev_frag.

>  	int cur_seg_size;
>  	int i;
>
>  	cur_seg_size = skb_headlen(skb) - header_len;
> +	last_frag_size = skb_headlen(skb);
>  	cur_seg_num_bufs = cur_seg_size > 0;
>
>  	for (i = 0; i < shinfo->nr_frags; i++) {
>  		if (cur_seg_size >= gso_size) {
>  			cur_seg_size %= gso_size;
>  			cur_seg_num_bufs = cur_seg_size > 0;
> +
> +			/* If the last buffer is split in the middle of a TSO

s/buffer/frag?

> +			 * segment, then it will count as two descriptors.
> +			 */
> +			if (last_frag_size > GVE_TX_MAX_BUF_SIZE_DQO) {
> +				int last_frag_remain = last_frag_size %
> +					GVE_TX_MAX_BUF_SIZE_DQO;
> +
> +				/* If the last frag was evenly divisible by
> +				 * GVE_TX_MAX_BUF_SIZE_DQO, then it will not be
> +				 * split in the current segment.

Is this true even if the segment did not start at the start of the frag?

Overall, it's not trivial to follow. Probably because the goal is to
count max descriptors per segment, but that is not what is being
looped over.

Intuitive (perhaps buggy, a quick sketch), this is what is intended,
right?

static bool gve_can_send_tso(const struct sk_buff *skb)
{
	int frag_size = skb_headlen(skb) - header_len;
	int gso_size_left;
	int frag_idx = 0;
	int num_desc;
	int desc_len;
	int off = 0;

	while (off < skb->len) {
		gso_size_left = shinfo->gso_size;
		num_desc = 0;

		while (gso_size_left) {
			desc_len = min(gso_size_left, frag_size);
			gso_size_left -= desc_len;
			frag_size -= desc_len;
			num_desc++;

			if (num_desc > max_descs_per_seg)
				return false;

			if (!frag_size)
				frag_size = skb_frag_size(&shinfo->frags[frag_idx++]);
		}
	}

	return true;
}

This however loops skb->len / gso_size. While the above modulo
operation skips many segments that span a frag. Not sure if the more
intuitive approach could be as performant.

Else, I'll stare some more at the suggested patch to convince myself
that it is correct and complete..

> +				 */
> +				if (last_frag_remain &&
> +				    cur_seg_size > last_frag_remain) {
> +					cur_seg_num_bufs++;
> +				}
> +			}
>  		}
>
>  		if (unlikely(++cur_seg_num_bufs > max_bufs_per_seg))
>  			return false;
>
> -		cur_seg_size += skb_frag_size(&shinfo->frags[i]);
> +		last_frag_size = skb_frag_size(&shinfo->frags[i]);
> +		cur_seg_size += last_frag_size;
>  	}
>
>  	return true;
> --
> 2.45.2.993.g49e7a77208-goog
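The arithmetic behind the edge case can be sketched as a small user-space check. This is a hedged illustration, not driver code: `MAX_BUF` and `tail_descs()` are hypothetical stand-ins, with 16384 modeling GVE_TX_MAX_BUF_SIZE_DQO, and the segment tail is assumed smaller than one buffer (which holds for Ethernet gso sizes up to ~9K):

```c
#include <assert.h>

#define MAX_BUF 16384	/* stand-in for GVE_TX_MAX_BUF_SIZE_DQO */

/* Descriptors consumed by the final `tail` bytes of a frag of
 * `frag_size` bytes. A descriptor never crosses a MAX_BUF boundary
 * within the frag, so a tail that straddles the start of the frag's
 * last (partial) buffer costs two descriptors instead of one.
 * Assumes tail < MAX_BUF, i.e. gso_size below 16KB. */
static int tail_descs(int frag_size, int tail)
{
	int rem = frag_size % MAX_BUF;	/* size of the frag's last buffer */

	if (frag_size > MAX_BUF && rem && tail > rem)
		return 2;
	return 1;
}
```

For example, with a 17000-byte frag an 8000-byte segment tail straddles the boundary at frag offset 16384 (7384 + 616 bytes), so it costs two descriptors, while a 32768-byte frag has no partial last buffer and the tail stays in one descriptor.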
On Thu, Jul 18, 2024 at 4:07 PM Willem de Bruijn
<willemdebruijn.kernel@gmail.com> wrote:
>
> This however loops skb->len / gso_size. While the above modulo
> operation skips many segments that span a frag. Not sure if the more
> intuitive approach could be as performant.

Yes, the original intention of the code was to loop over nr_frags,
instead of (skb->len / gso_size).

But perhaps that's premature optimization if it makes the code
significantly harder to follow.
On Thu, Jul 18, 2024 at 4:07 PM Willem de Bruijn
<willemdebruijn.kernel@gmail.com> wrote:
> > + * segment, then it will count as two descriptors.
> > + */
> > + if (last_frag_size > GVE_TX_MAX_BUF_SIZE_DQO) {
> > + int last_frag_remain = last_frag_size %
> > + GVE_TX_MAX_BUF_SIZE_DQO;
> > +
> > + /* If the last frag was evenly divisible by
> > + * GVE_TX_MAX_BUF_SIZE_DQO, then it will not be
> > + * split in the current segment.
>
> Is this true even if the segment did not start at the start of the frag?

The comment probably is a bit confusing here. The current segment
we are tracking could have a portion in the previous frag. The code
assumed that the portion on the previous frag (if present) mapped to only
one descriptor. However, that portion could have been split across two
descriptors due to the restriction that each descriptor cannot exceed 16KB.
That's the case this fix is trying to address.
I will work on simplifying the logic based on your suggestion below so
that the fix is easier to follow

> Overall, it's not trivial to follow. Probably because the goal is to
> count max descriptors per segment, but that is not what is being
> looped over.
>
> Intuitive (perhaps buggy, a quick sketch), this is what is intended,
> right?
>
> static bool gve_can_send_tso(const struct sk_buff *skb)
> {
> 	int frag_size = skb_headlen(skb) - header_len;
> 	int gso_size_left;
> 	int frag_idx = 0;
> 	int num_desc;
> 	int desc_len;
> 	int off = 0;
>
> 	while (off < skb->len) {
> 		gso_size_left = shinfo->gso_size;
> 		num_desc = 0;
>
> 		while (gso_size_left) {
> 			desc_len = min(gso_size_left, frag_size);
> 			gso_size_left -= desc_len;
> 			frag_size -= desc_len;
> 			num_desc++;
>
> 			if (num_desc > max_descs_per_seg)
> 				return false;
>
> 			if (!frag_size)
> 				frag_size = skb_frag_size(&shinfo->frags[frag_idx++]);
> 		}
> 	}
>
> 	return true;
> }
>
> This however loops skb->len / gso_size. While the above modulo
> operation skips many segments that span a frag. Not sure if the more
> intuitive approach could be as performant.
>
> Else, I'll stare some more at the suggested patch to convince myself
> that it is correct and complete..
On Thu, Jul 18, 2024 at 8:28 PM Bailey Forrest <bcf@google.com> wrote:
>
> On Thu, Jul 18, 2024 at 4:07 PM Willem de Bruijn
> <willemdebruijn.kernel@gmail.com> wrote:
> >
> > This however loops skb->len / gso_size. While the above modulo
> > operation skips many segments that span a frag. Not sure if the more
> > intuitive approach could be as performant.
>
> Yes, the original intention of the code was to loop over nr_frags,
> instead of (skb->len / gso_size).
>
> But perhaps that's premature optimization if it makes the code
> significantly harder to follow.

Thanks. I don't mean to ask for a wholesale rewrite if not needed.
But perhaps the logic can be explained in the commit in a way that it
is more immediately obvious.

Praveen suggested that. I'll respond to his reply in more detail.
On Thu, Jul 18, 2024 at 9:52 PM Praveen Kaligineedi
<pkaligineedi@google.com> wrote:
>
> On Thu, Jul 18, 2024 at 4:07 PM Willem de Bruijn
> <willemdebruijn.kernel@gmail.com> wrote:
> > > + * segment, then it will count as two descriptors.
> > > + */
> > > + if (last_frag_size > GVE_TX_MAX_BUF_SIZE_DQO) {
> > > + int last_frag_remain = last_frag_size %
> > > + GVE_TX_MAX_BUF_SIZE_DQO;
> > > +
> > > + /* If the last frag was evenly divisible by
> > > + * GVE_TX_MAX_BUF_SIZE_DQO, then it will not be
> > > + * split in the current segment.
> >
> > Is this true even if the segment did not start at the start of the frag?
>
> The comment probably is a bit confusing here. The current segment
> we are tracking could have a portion in the previous frag. The code
> assumed that the portion on the previous frag (if present) mapped to only
> one descriptor. However, that portion could have been split across two
> descriptors due to the restriction that each descriptor cannot exceed 16KB.

>>> /* If the last frag was evenly divisible by
>>> + * GVE_TX_MAX_BUF_SIZE_DQO, then it will not be
>>> + * split in the current segment.

This is true because the smallest multiple of 16KB is 32KB, and the
largest gso_size at least for Ethernet will be 9K. But I don't think
that that is what is used here as the basis for this statement?

> That's the case this fix is trying to address.
> I will work on simplifying the logic based on your suggestion below so
> that the fix is easier to follow
On Thu, Jul 18, 2024 at 8:47 PM Willem de Bruijn
<willemdebruijn.kernel@gmail.com> wrote:
>
> On Thu, Jul 18, 2024 at 9:52 PM Praveen Kaligineedi
> <pkaligineedi@google.com> wrote:
> >
> > On Thu, Jul 18, 2024 at 4:07 PM Willem de Bruijn
> > <willemdebruijn.kernel@gmail.com> wrote:
> > > > + * segment, then it will count as two descriptors.
> > > > + */
> > > > + if (last_frag_size > GVE_TX_MAX_BUF_SIZE_DQO) {
> > > > + int last_frag_remain = last_frag_size %
> > > > + GVE_TX_MAX_BUF_SIZE_DQO;
> > > > +
> > > > + /* If the last frag was evenly divisible by
> > > > + * GVE_TX_MAX_BUF_SIZE_DQO, then it will not be
> > > > + * split in the current segment.
> > >
> > > Is this true even if the segment did not start at the start of the frag?
> >
> > The comment probably is a bit confusing here. The current segment
> > we are tracking could have a portion in the previous frag. The code
> > assumed that the portion on the previous frag (if present) mapped to only
> > one descriptor. However, that portion could have been split across two
> > descriptors due to the restriction that each descriptor cannot exceed 16KB.
>
> >>> /* If the last frag was evenly divisible by
> >>> + * GVE_TX_MAX_BUF_SIZE_DQO, then it will not be
> >>> + * split in the current segment.
>
> This is true because the smallest multiple of 16KB is 32KB, and the
> largest gso_size at least for Ethernet will be 9K. But I don't think
> that that is what is used here as the basis for this statement?

The largest Ethernet gso_size (9K) is less than GVE_TX_MAX_BUF_SIZE_DQO
is an implicit assumption made in this patch and in that comment. Bailey,
please correct me if I am wrong..

> > That's the case this fix is trying to address.
> > I will work on simplifying the logic based on your suggestion below so
> > that the fix is easier to follow
On Fri, Jul 19, 2024 at 7:31 AM Praveen Kaligineedi
<pkaligineedi@google.com> wrote:
>
> On Thu, Jul 18, 2024 at 8:47 PM Willem de Bruijn
> <willemdebruijn.kernel@gmail.com> wrote:
> >
> > On Thu, Jul 18, 2024 at 9:52 PM Praveen Kaligineedi
> > <pkaligineedi@google.com> wrote:
> > >
> > > On Thu, Jul 18, 2024 at 4:07 PM Willem de Bruijn
> > > <willemdebruijn.kernel@gmail.com> wrote:
> > > > > + * segment, then it will count as two descriptors.
> > > > > + */
> > > > > + if (last_frag_size > GVE_TX_MAX_BUF_SIZE_DQO) {
> > > > > + int last_frag_remain = last_frag_size %
> > > > > + GVE_TX_MAX_BUF_SIZE_DQO;
> > > > > +
> > > > > + /* If the last frag was evenly divisible by
> > > > > + * GVE_TX_MAX_BUF_SIZE_DQO, then it will not be
> > > > > + * split in the current segment.
> > > >
> > > > Is this true even if the segment did not start at the start of the frag?
> > >
> > > The comment probably is a bit confusing here. The current segment
> > > we are tracking could have a portion in the previous frag. The code
> > > assumed that the portion on the previous frag (if present) mapped to only
> > > one descriptor. However, that portion could have been split across two
> > > descriptors due to the restriction that each descriptor cannot exceed 16KB.
> >
> > >>> /* If the last frag was evenly divisible by
> > >>> + * GVE_TX_MAX_BUF_SIZE_DQO, then it will not be
> > >>> + * split in the current segment.
> >
> > This is true because the smallest multiple of 16KB is 32KB, and the
> > largest gso_size at least for Ethernet will be 9K. But I don't think
> > that that is what is used here as the basis for this statement?
>
> The largest Ethernet gso_size (9K) is less than GVE_TX_MAX_BUF_SIZE_DQO
> is an implicit assumption made in this patch and in that comment. Bailey,
> please correct me if I am wrong..

If last_frag_size is evenly divisible by GVE_TX_MAX_BUF_SIZE_DQO, it
doesn't hit the edge case we're looking for.

- If it's evenly divisible, then we know it will use exactly
  (last_frag_size / GVE_TX_MAX_BUF_SIZE_DQO) descriptors
- GVE_TX_MAX_BUF_SIZE_DQO > 9k, so we know each descriptor won't
  create a segment which exceeds the limit
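The two bullets above can be checked with quick arithmetic. A hedged sketch with hypothetical helpers, where 16384 stands in for GVE_TX_MAX_BUF_SIZE_DQO:

```c
#include <assert.h>

#define MAX_BUF 16384	/* stand-in for GVE_TX_MAX_BUF_SIZE_DQO */

/* Descriptors a frag of `size` bytes needs when each descriptor
 * covers at most MAX_BUF bytes of the frag. */
static int frag_descs(int size)
{
	return (size + MAX_BUF - 1) / MAX_BUF;
}

/* Size of the frag's last, possibly partial, buffer. Zero means the
 * frag splits evenly into MAX_BUF buffers, so no segment tail can
 * straddle a partial last buffer. */
static int frag_remain(int size)
{
	return size % MAX_BUF;
}
```

An evenly divisible 32768-byte frag maps to exactly two 16KB buffers, while a 20000-byte frag also takes two descriptors but leaves a 3616-byte partial last buffer that a segment tail can straddle.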
On Fri, Jul 19, 2024 at 9:56 AM Bailey Forrest <bcf@google.com> wrote:
>
> On Fri, Jul 19, 2024 at 7:31 AM Praveen Kaligineedi
> <pkaligineedi@google.com> wrote:
> >
> > On Thu, Jul 18, 2024 at 8:47 PM Willem de Bruijn
> > <willemdebruijn.kernel@gmail.com> wrote:
> > >
> > > On Thu, Jul 18, 2024 at 9:52 PM Praveen Kaligineedi
> > > <pkaligineedi@google.com> wrote:
> > > >
> > > > On Thu, Jul 18, 2024 at 4:07 PM Willem de Bruijn
> > > > <willemdebruijn.kernel@gmail.com> wrote:
> > > > > > + * segment, then it will count as two descriptors.
> > > > > > + */
> > > > > > + if (last_frag_size > GVE_TX_MAX_BUF_SIZE_DQO) {
> > > > > > + int last_frag_remain = last_frag_size %
> > > > > > + GVE_TX_MAX_BUF_SIZE_DQO;
> > > > > > +
> > > > > > + /* If the last frag was evenly divisible by
> > > > > > + * GVE_TX_MAX_BUF_SIZE_DQO, then it will not be
> > > > > > + * split in the current segment.
> > > > >
> > > > > Is this true even if the segment did not start at the start of the frag?
> > > >
> > > > The comment probably is a bit confusing here. The current segment
> > > > we are tracking could have a portion in the previous frag. The code
> > > > assumed that the portion on the previous frag (if present) mapped to only
> > > > one descriptor. However, that portion could have been split across two
> > > > descriptors due to the restriction that each descriptor cannot exceed 16KB.
> > >
> > > >>> /* If the last frag was evenly divisible by
> > > >>> + * GVE_TX_MAX_BUF_SIZE_DQO, then it will not be
> > > >>> + * split in the current segment.
> > >
> > > This is true because the smallest multiple of 16KB is 32KB, and the
> > > largest gso_size at least for Ethernet will be 9K. But I don't think
> > > that that is what is used here as the basis for this statement?
> >
> > The largest Ethernet gso_size (9K) is less than GVE_TX_MAX_BUF_SIZE_DQO
> > is an implicit assumption made in this patch and in that comment. Bailey,
> > please correct me if I am wrong..
>
> If last_frag_size is evenly divisible by GVE_TX_MAX_BUF_SIZE_DQO, it
> doesn't hit the edge case we're looking for.
>
> - If it's evenly divisible, then we know it will use exactly
>   (last_frag_size / GVE_TX_MAX_BUF_SIZE_DQO) descriptors

This assumes that gso_segment start is aligned with skb_frag
start. That is not necessarily true, right?

If headlen minus protocol headers is 1B, then the first segment
will have two descriptors { 1B, 9KB - 1 }. And the next segment
can have skb_frag_size - ( 9KB - 1 ).

I think the statement is correct, but because every multiple
of 16KB is so much larger than the max gso_size of ~9KB,
that a single segment will never include more than two
skb_frags.

Quite possibly the code overestimates the number of descriptors
per segment now, but that is safe and only a performance regression.

> - GVE_TX_MAX_BUF_SIZE_DQO > 9k, so we know each descriptor won't
>   create a segment which exceeds the limit

For a net patch, it is generally better to make a small fix rather
than rewrite. That said, my sketch without looping over every segment:

	while (off < skb->len) {
		gso_size_left = shinfo->gso_size;
		num_desc = 0;

		while (gso_size_left) {
			desc_len = min(gso_size_left, frag_size_left);
			gso_size_left -= desc_len;
			frag_size_left -= desc_len;
			num_desc++;

			if (num_desc > max_descs_per_seg)
				return false;

			if (!frag_size_left)
				frag_size_left = skb_frag_size(&shinfo->frags[frag_idx++]);
+			else
+				frag_size_left %= gso_size;	/* skip segments that fit in one desc */
		}
	}
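For reference, the segment-by-segment sketch from the thread can be made self-contained and runnable in user space. This is a hedged model, not the driver code: `tso_fits()` and `sizes[]` are hypothetical (the linear payload followed by the frag sizes), frag boundaries and the 16KB descriptor split are tracked explicitly, and the offset bookkeeping the sketch left out is filled in:

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_BUF 16384		/* stand-in for GVE_TX_MAX_BUF_SIZE_DQO */
#define MAX_BUFS_PER_SEG 10	/* stand-in for max_bufs_per_seg */

static int min_int(int a, int b) { return a < b ? a : b; }

/* Walk the payload segment by segment, counting descriptors per TSO
 * segment. sizes[0] models the linear payload (headlen minus
 * headers); sizes[1..n-1] model the frag sizes, all assumed non-zero.
 * A descriptor never crosses a frag boundary, nor a MAX_BUF boundary
 * within a frag. */
static bool tso_fits(const int *sizes, int n, int gso_size)
{
	int frag_idx = 0;
	int frag_size = sizes[0];
	int frag_off = 0;
	int total = 0, off = 0;
	int i;

	for (i = 0; i < n; i++)
		total += sizes[i];

	while (off < total) {
		int seg_left = min_int(gso_size, total - off);
		int num_desc = 0;

		while (seg_left) {
			/* Bytes left in the frag's current MAX_BUF buffer. */
			int buf_left = min_int(frag_size - frag_off,
					       MAX_BUF - frag_off % MAX_BUF);
			int desc_len = min_int(seg_left, buf_left);

			seg_left -= desc_len;
			frag_off += desc_len;
			off += desc_len;

			if (++num_desc > MAX_BUFS_PER_SEG)
				return false;

			if (frag_off == frag_size && ++frag_idx < n) {
				frag_size = sizes[frag_idx];
				frag_off = 0;
			}
		}
	}
	return true;
}
```

Unlike the driver's loop over nr_frags, this walks one iteration per segment, trading the modulo shortcut for readability, which is exactly the performance trade-off debated in the thread.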
On 7/21/24 21:10, Willem de Bruijn wrote:
> On Fri, Jul 19, 2024 at 9:56 AM Bailey Forrest <bcf@google.com> wrote:
>> If last_frag_size is evenly divisible by GVE_TX_MAX_BUF_SIZE_DQO, it
>> doesn't hit the edge case we're looking for.
>>
>> - If it's evenly divisible, then we know it will use exactly
>>   (last_frag_size / GVE_TX_MAX_BUF_SIZE_DQO) descriptors
>
> This assumes that gso_segment start is aligned with skb_frag
> start. That is not necessarily true, right?
>
> If headlen minus protocol headers is 1B, then the first segment
> will have two descriptors { 1B, 9KB - 1 }. And the next segment
> can have skb_frag_size - ( 9KB - 1 ).
>
> I think the statement is correct, but because every multiple
> of 16KB is so much larger than the max gso_size of ~9KB,
> that a single segment will never include more than two
> skb_frags.

I'm a bit lost, but what about big TCP? That should allow
(considerably) more than 2 frags 16K each per segment.

In any case it looks like this needs at least some comments
clarification.

Thanks,

Paolo
diff --git a/drivers/net/ethernet/google/gve/gve_tx_dqo.c b/drivers/net/ethernet/google/gve/gve_tx_dqo.c
index 0b3cca3fc792..dc39dc481f21 100644
--- a/drivers/net/ethernet/google/gve/gve_tx_dqo.c
+++ b/drivers/net/ethernet/google/gve/gve_tx_dqo.c
@@ -866,22 +866,42 @@ static bool gve_can_send_tso(const struct sk_buff *skb)
 	const int header_len = skb_tcp_all_headers(skb);
 	const int gso_size = shinfo->gso_size;
 	int cur_seg_num_bufs;
+	int last_frag_size;
 	int cur_seg_size;
 	int i;
 
 	cur_seg_size = skb_headlen(skb) - header_len;
+	last_frag_size = skb_headlen(skb);
 	cur_seg_num_bufs = cur_seg_size > 0;
 
 	for (i = 0; i < shinfo->nr_frags; i++) {
 		if (cur_seg_size >= gso_size) {
 			cur_seg_size %= gso_size;
 			cur_seg_num_bufs = cur_seg_size > 0;
+
+			/* If the last buffer is split in the middle of a TSO
+			 * segment, then it will count as two descriptors.
+			 */
+			if (last_frag_size > GVE_TX_MAX_BUF_SIZE_DQO) {
+				int last_frag_remain = last_frag_size %
+					GVE_TX_MAX_BUF_SIZE_DQO;
+
+				/* If the last frag was evenly divisible by
+				 * GVE_TX_MAX_BUF_SIZE_DQO, then it will not be
+				 * split in the current segment.
+				 */
+				if (last_frag_remain &&
+				    cur_seg_size > last_frag_remain) {
+					cur_seg_num_bufs++;
+				}
+			}
 		}
 
 		if (unlikely(++cur_seg_num_bufs > max_bufs_per_seg))
 			return false;
 
-		cur_seg_size += skb_frag_size(&shinfo->frags[i]);
+		last_frag_size = skb_frag_size(&shinfo->frags[i]);
+		cur_seg_size += last_frag_size;
 	}
 
 	return true;
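The patched loop can also be exercised on its own with a hedged user-space port. `can_send_tso_fixed()` and the `sizes[]` layout are hypothetical harness names, 16384 and 10 stand in for GVE_TX_MAX_BUF_SIZE_DQO and max_bufs_per_seg, and headers are ignored for simplicity (the driver seeds last_frag_size with the full headlen):

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_BUF 16384		/* stand-in for GVE_TX_MAX_BUF_SIZE_DQO */
#define MAX_BUFS_PER_SEG 10	/* stand-in for max_bufs_per_seg */

/* Port of the patched gve_can_send_tso() loop. sizes[0] models the
 * linear payload (headlen minus headers), sizes[1..n-1] the frags. */
static bool can_send_tso_fixed(const int *sizes, int n, int gso_size)
{
	int cur_seg_num_bufs;
	int last_frag_size = sizes[0];
	int cur_seg_size = sizes[0];
	int i;

	cur_seg_num_bufs = cur_seg_size > 0;

	for (i = 1; i < n; i++) {
		if (cur_seg_size >= gso_size) {
			cur_seg_size %= gso_size;
			cur_seg_num_bufs = cur_seg_size > 0;

			/* Leftover tail of the previous frag: it costs two
			 * descriptors if it straddles that frag's last
			 * MAX_BUF boundary. */
			if (last_frag_size > MAX_BUF) {
				int rem = last_frag_size % MAX_BUF;

				if (rem && cur_seg_size > rem)
					cur_seg_num_bufs++;
			}
		}

		if (++cur_seg_num_bufs > MAX_BUFS_PER_SEG)
			return false;

		last_frag_size = sizes[i];
		cur_seg_size += last_frag_size;
	}
	return true;
}
```

The second test below is the edge case from the commit message: the 8000-byte tail of the 17000-byte frag crosses the 16KB boundary and costs two descriptors, so nine further small frags push one segment past the ten-descriptor limit, which the unpatched counting (one descriptor for the tail) would have missed.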