diff mbox series

[net-next] tcp: ensure to use the most recently sent skb when filling the rate sample

Message ID 1649847244-5738-1-git-send-email-yangpc@wangsu.com (mailing list archive)
State Superseded
Delegated to: Netdev Maintainers
Headers show
Series [net-next] tcp: ensure to use the most recently sent skb when filling the rate sample | expand

Checks

Context Check Description
netdev/tree_selection success Clearly marked for net-next
netdev/fixes_present success Fixes tag not required for -next series
netdev/subject_prefix success Link
netdev/cover_letter success Single patches do not need cover letters
netdev/patch_count success Link
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 2 this patch: 2
netdev/cc_maintainers warning 5 maintainers not CCed: davem@davemloft.net pabeni@redhat.com yoshfuji@linux-ipv6.org dsahern@kernel.org kuba@kernel.org
netdev/build_clang success Errors and warnings before: 9 this patch: 9
netdev/module_param success Was 0 now: 0
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 2 this patch: 2
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 10 lines checked
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0

Commit Message

Pengcheng Yang April 13, 2022, 10:54 a.m. UTC
If an ACK (s)acks multiple skbs, we favor the information
from the most recently sent skb by choosing the skb with
the highest prior_delivered count.
But prior_delivered may be equal, because tp->delivered only
changes when receiving, which requires further comparison of
skb timestamp.

Signed-off-by: Pengcheng Yang <yangpc@wangsu.com>
---
 net/ipv4/tcp_rate.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

Comments

Neal Cardwell April 14, 2022, 4:11 p.m. UTC | #1
On Wed, Apr 13, 2022 at 6:54 AM Pengcheng Yang <yangpc@wangsu.com> wrote:
>
> If an ACK (s)acks multiple skbs, we favor the information
> from the most recently sent skb by choosing the skb with
> the highest prior_delivered count.
> But prior_delivered may be equal, because tp->delivered only
> changes when receiving, which requires further comparison of
> skb timestamp.
>
> Signed-off-by: Pengcheng Yang <yangpc@wangsu.com>
> ---
>  net/ipv4/tcp_rate.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/net/ipv4/tcp_rate.c b/net/ipv4/tcp_rate.c
> index 617b818..ad893ad 100644
> --- a/net/ipv4/tcp_rate.c
> +++ b/net/ipv4/tcp_rate.c
> @@ -86,7 +86,9 @@ void tcp_rate_skb_delivered(struct sock *sk, struct sk_buff *skb,
>                 return;
>
>         if (!rs->prior_delivered ||
> -           after(scb->tx.delivered, rs->prior_delivered)) {
> +           after(scb->tx.delivered, rs->prior_delivered) ||
> +           (scb->tx.delivered == rs->prior_delivered &&
> +            tcp_skb_timestamp_us(skb) > tp->first_tx_mstamp)) {
>                 rs->prior_delivered_ce  = scb->tx.delivered_ce;
>                 rs->prior_delivered  = scb->tx.delivered;
>                 rs->prior_mstamp     = scb->tx.delivered_mstamp;
> --

Thank you for posting this patch! I agree there is a bug there, and
your patch is an improvement. However, I think this patch is not a
complete solution, since it does not handle the case where there are
multiple skbs with the tcp_skb_timestamp_us() (which can happen if a
outgoing buffered TSO/GSO skb is later split into multiple skbs with
the same timestamp).

RACK has to deal with the same question "which skb was sent first?",
and already has a solution in tcp_rack_sent_after(). I suggest we
share code between RACK and tcp_rate_skb_delivered() to make this
check. This might involve making a copy of tcp_rack_sent_after() in
include/net/tcp.h, naming the .h copy to tcp_skb_sent_after(), and
reworking this logic to save and use the sequence number and timestamp
so that it can use the new tcp_skb_sent_after() helper. After this fix
propagates to net-next we could later then change RACK to use the new
tcp_skb_sent_after() function, so we have a single helper used in two
places.

If you want to post a version of this patch that uses some approach
like that, IMHO that would be welcome. If you do not have cycles, I am
happy to post one when I get a moment.

thanks,
neal
Pengcheng Yang April 15, 2022, 4:44 p.m. UTC | #2
On Fri, Apr 15, 2022 at 12:12 AM Neal Cardwell <ncardwell@google.com> wrote:
> 
> On Wed, Apr 13, 2022 at 6:54 AM Pengcheng Yang <yangpc@wangsu.com> wrote:
> >
> > If an ACK (s)acks multiple skbs, we favor the information
> > from the most recently sent skb by choosing the skb with
> > the highest prior_delivered count.
> > But prior_delivered may be equal, because tp->delivered only
> > changes when receiving, which requires further comparison of
> > skb timestamp.
> >
> > Signed-off-by: Pengcheng Yang <yangpc@wangsu.com>
> 
> Thank you for posting this patch! I agree there is a bug there, and
> your patch is an improvement. However, I think this patch is not a
> complete solution, since it does not handle the case where there are
> multiple skbs with the tcp_skb_timestamp_us() (which can happen if a
> outgoing buffered TSO/GSO skb is later split into multiple skbs with
> the same timestamp).
> 
My initial thought was that this case would not affect the correctness of rate_sample,
since the timestamp of these skbs are the same.
So I am confused whether we have to find the *real* most recently (with the highest seq) skb
(at the cost of a little extra cost)?

> RACK has to deal with the same question "which skb was sent first?",
> and already has a solution in tcp_rack_sent_after(). I suggest we
> share code between RACK and tcp_rate_skb_delivered() to make this
> check. This might involve making a copy of tcp_rack_sent_after() in
> include/net/tcp.h, naming the .h copy to tcp_skb_sent_after(), and
> reworking this logic to save and use the sequence number and timestamp
> so that it can use the new tcp_skb_sent_after() helper. After this fix
> propagates to net-next we could later then change RACK to use the new
> tcp_skb_sent_after() function, so we have a single helper used in two
> places.
> 

Ok. I will send the V2 later according to your suggestion. Thanks neal.

> If you want to post a version of this patch that uses some approach
> like that, IMHO that would be welcome. If you do not have cycles, I am
> happy to post one when I get a moment.
> 
> thanks,
> neal
diff mbox series

Patch

diff --git a/net/ipv4/tcp_rate.c b/net/ipv4/tcp_rate.c
index 617b818..ad893ad 100644
--- a/net/ipv4/tcp_rate.c
+++ b/net/ipv4/tcp_rate.c
@@ -86,7 +86,9 @@  void tcp_rate_skb_delivered(struct sock *sk, struct sk_buff *skb,
 		return;
 
 	if (!rs->prior_delivered ||
-	    after(scb->tx.delivered, rs->prior_delivered)) {
+	    after(scb->tx.delivered, rs->prior_delivered) ||
+	    (scb->tx.delivered == rs->prior_delivered &&
+	     tcp_skb_timestamp_us(skb) > tp->first_tx_mstamp)) {
 		rs->prior_delivered_ce  = scb->tx.delivered_ce;
 		rs->prior_delivered  = scb->tx.delivered;
 		rs->prior_mstamp     = scb->tx.delivered_mstamp;