| Message ID | 20240826092707.2661435-1-edumazet@google.com (mailing list archive) |
| --- | --- |
| State | Changes Requested |
| Delegated to | Netdev Maintainers |
| Series | [net] tcp_cubic: switch ca->last_time to usec resolution |
On Mon, Aug 26, 2024 at 5:27 AM Eric Dumazet <edumazet@google.com> wrote:
>
> bictcp_update() uses ca->last_time as a timestamp
> to decide on several heuristics.
>
> Historically this timestamp has been fed with jiffies,
> which has too coarse a resolution; some distros are
> still using CONFIG_HZ_250=y.
>
> It is time to switch to usec resolution, now that the TCP stack
> already caches the high-resolution time in tp->tcp_mstamp.
>
> Also remove the 'inline' qualifier; this helper is used
> once and compilers are smart.
>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Link: https://lore.kernel.org/netdev/20240817163400.2616134-1-mrzhang97@gmail.com/T/#mb6a64c9e2309eb98eaeeeb4b085c4a2270b6789d
> Cc: Mingrui Zhang <mrzhang97@gmail.com>
> Cc: Lisong Xu <xu@unl.edu>
> ---
>  net/ipv4/tcp_cubic.c | 18 ++++++++++--------
>  1 file changed, 10 insertions(+), 8 deletions(-)
>
> diff --git a/net/ipv4/tcp_cubic.c b/net/ipv4/tcp_cubic.c
> index 5dbed91c6178257df8d2ccd1c8690a10bdbaf56a..3b1845103ee1866a316926a130c212e6f5e78ef0 100644
> --- a/net/ipv4/tcp_cubic.c
> +++ b/net/ipv4/tcp_cubic.c
> @@ -87,7 +87,7 @@ struct bictcp {
>         u32     cnt;            /* increase cwnd by 1 after ACKs */
>         u32     last_max_cwnd;  /* last maximum snd_cwnd */
>         u32     last_cwnd;      /* the last snd_cwnd */
> -       u32     last_time;      /* time when updated last_cwnd */
> +       u32     last_time;      /* time when updated last_cwnd (usec) */
>         u32     bic_origin_point;/* origin point of bic function */
>         u32     bic_K;          /* time to origin point
>                                    from the beginning of the current epoch */
> @@ -211,26 +211,28 @@ static u32 cubic_root(u64 a)
>  /*
>   * Compute congestion window to use.
>   */
> -static inline void bictcp_update(struct bictcp *ca, u32 cwnd, u32 acked)
> +static void bictcp_update(struct sock *sk, u32 cwnd, u32 acked)
>  {
> +       const struct tcp_sock *tp = tcp_sk(sk);
> +       struct bictcp *ca = inet_csk_ca(sk);
>         u32 delta, bic_target, max_cnt;
>         u64 offs, t;
>
>         ca->ack_cnt += acked;   /* count the number of ACKed packets */
>
> -       if (ca->last_cwnd == cwnd &&
> -           (s32)(tcp_jiffies32 - ca->last_time) <= HZ / 32)
> +       delta = tp->tcp_mstamp - ca->last_time;
> +       if (ca->last_cwnd == cwnd && delta <= USEC_PER_SEC / 32)
>                 return;
>
> -       /* The CUBIC function can update ca->cnt at most once per jiffy.
> +       /* The CUBIC function can update ca->cnt at most once per ms.
>          * On all cwnd reduction events, ca->epoch_start is set to 0,
>          * which will force a recalculation of ca->cnt.
>          */
> -       if (ca->epoch_start && tcp_jiffies32 == ca->last_time)
> +       if (ca->epoch_start && delta < USEC_PER_MSEC)
>                 goto tcp_friendliness;

AFAICT there is a problem here. It is switching this line of code to
use microsecond resolution without also changing the core CUBIC slope
(ca->cnt) calculation to use microseconds. AFAICT that means we
would be re-introducing the bug that was fixed in 2015 in
d6b1a8a92a1417f8859a6937d2e6ffe2dfab4e6d (see below). Basically, if
the CUBIC slope (ca->cnt) calculation uses jiffies, then we should
only run that code once per jiffy, to avoid getting the wrong answer
for the slope:

commit d6b1a8a92a1417f8859a6937d2e6ffe2dfab4e6d
Author: Neal Cardwell <ncardwell@google.com>
Date:   Wed Jan 28 20:01:39 2015 -0500

    tcp: fix timing issue in CUBIC slope calculation

    This patch fixes a bug in CUBIC that causes cwnd to increase slightly
    too slowly when multiple ACKs arrive in the same jiffy.

    If cwnd is supposed to increase at a rate of more than once per jiffy,
    then CUBIC was sometimes too slow.
    Because the bic_target is calculated for a future point in time,
    calculated with time in jiffies, the cwnd can increase over the course
    of the jiffy while the bic_target calculated as the proper CUBIC cwnd
    at time t=tcp_time_stamp+rtt does not increase, because tcp_time_stamp
    only increases on jiffy tick boundaries.

    So since the cnt is set to:

        ca->cnt = cwnd / (bic_target - cwnd);

    as cwnd increases but bic_target does not increase due to jiffy
    granularity, the cnt becomes too large, causing cwnd to increase
    too slowly.

    For example:
     - suppose at the beginning of a jiffy, cwnd=40, bic_target=44
     - so CUBIC sets:
         ca->cnt = cwnd / (bic_target - cwnd) = 40 / (44 - 40) = 40/4 = 10
     - suppose we get 10 acks, each for 1 segment, so tcp_cong_avoid_ai()
       increases cwnd to 41
     - so CUBIC sets:
         ca->cnt = cwnd / (bic_target - cwnd) = 41 / (44 - 41) = 41/3 = 13

    So now CUBIC will wait for 13 packets to be ACKed before increasing
    cwnd to 42, instead of 10 as it should.

    The fix is to avoid adjusting the slope (determined by ca->cnt)
    multiple times within a jiffy, and instead skip to compute the Reno
    cwnd, the "TCP friendliness" code path.

    Reported-by: Eyal Perry <eyalpe@mellanox.com>
    Signed-off-by: Neal Cardwell <ncardwell@google.com>
    Signed-off-by: Yuchung Cheng <ycheng@google.com>
    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

diff --git a/net/ipv4/tcp_cubic.c b/net/ipv4/tcp_cubic.c
index ffc045da2fd5..4b276d1ed980 100644
--- a/net/ipv4/tcp_cubic.c
+++ b/net/ipv4/tcp_cubic.c
@@ -213,6 +213,13 @@ static inline void bictcp_update(struct bictcp *ca, u32 cwnd, u32 acked)
             (s32)(tcp_time_stamp - ca->last_time) <= HZ / 32)
                 return;

+        /* The CUBIC function can update ca->cnt at most once per jiffy.
+         * On all cwnd reduction events, ca->epoch_start is set to 0,
+         * which will force a recalculation of ca->cnt.
+         */
+        if (ca->epoch_start && tcp_time_stamp == ca->last_time)
+                goto tcp_friendliness;
+
         ca->last_cwnd = cwnd;
         ca->last_time = tcp_time_stamp;

@@ -280,6 +287,7 @@ static inline void bictcp_update(struct bictcp *ca, u32 cwnd, u32 acked)
         if (ca->last_max_cwnd == 0 && ca->cnt > 20)
                 ca->cnt = 20;   /* increase cwnd 5% per RTT */

+tcp_friendliness:
         /* TCP Friendly */
         if (tcp_friendliness) {
                 u32 scale = beta_scale;

---
neal
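The slope inflation in the worked example above is easy to reproduce outside the kernel. Below is a minimal standalone C sketch; cubic_cnt() is a hypothetical stand-in, not kernel code, mirroring only the ca->cnt = cwnd / (bic_target - cwnd) step and the kernel's small-increment fallback for bic_target <= cwnd:

#include <stdio.h>

/* Illustrative stand-in for the CUBIC slope computation: while
 * bic_target is frozen at a jiffy-granularity value, cnt inflates
 * as cwnd grows within the same jiffy.
 */
static unsigned int cubic_cnt(unsigned int cwnd, unsigned int bic_target)
{
        if (bic_target > cwnd)
                return cwnd / (bic_target - cwnd);
        return 100 * cwnd;      /* very small increment, as in bictcp_update() */
}

int main(void)
{
        unsigned int bic_target = 44;   /* frozen for the whole jiffy */

        /* Start of the jiffy: cwnd=40 -> cnt = 40 / 4 = 10 */
        printf("cnt at cwnd=40: %u\n", cubic_cnt(40, bic_target));

        /* After 10 ACKs, cwnd=41 but bic_target has not moved:
         * cnt = 41 / 3 = 13, so cwnd now grows too slowly.
         */
        printf("cnt at cwnd=41: %u\n", cubic_cnt(41, bic_target));
        return 0;
}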
On Mon, Aug 26, 2024 at 3:26 PM Neal Cardwell <ncardwell@google.com> wrote:
>
> On Mon, Aug 26, 2024 at 5:27 AM Eric Dumazet <edumazet@google.com> wrote:
> >
> > bictcp_update() uses ca->last_time as a timestamp
> > to decide on several heuristics.
> >
> > Historically this timestamp has been fed with jiffies,
> > which has too coarse a resolution; some distros are
> > still using CONFIG_HZ_250=y.
> >
> > It is time to switch to usec resolution, now that the TCP stack
> > already caches the high-resolution time in tp->tcp_mstamp.
> >
> > Also remove the 'inline' qualifier; this helper is used
> > once and compilers are smart.
> >
> > Signed-off-by: Eric Dumazet <edumazet@google.com>
> > Link: https://lore.kernel.org/netdev/20240817163400.2616134-1-mrzhang97@gmail.com/T/#mb6a64c9e2309eb98eaeeeb4b085c4a2270b6789d
> > Cc: Mingrui Zhang <mrzhang97@gmail.com>
> > Cc: Lisong Xu <xu@unl.edu>
> > ---
> >  net/ipv4/tcp_cubic.c | 18 ++++++++++--------
> >  1 file changed, 10 insertions(+), 8 deletions(-)
> >
> > diff --git a/net/ipv4/tcp_cubic.c b/net/ipv4/tcp_cubic.c
> > index 5dbed91c6178257df8d2ccd1c8690a10bdbaf56a..3b1845103ee1866a316926a130c212e6f5e78ef0 100644
> > --- a/net/ipv4/tcp_cubic.c
> > +++ b/net/ipv4/tcp_cubic.c
> > @@ -87,7 +87,7 @@ struct bictcp {
> >         u32     cnt;            /* increase cwnd by 1 after ACKs */
> >         u32     last_max_cwnd;  /* last maximum snd_cwnd */
> >         u32     last_cwnd;      /* the last snd_cwnd */
> > -       u32     last_time;      /* time when updated last_cwnd */
> > +       u32     last_time;      /* time when updated last_cwnd (usec) */
> >         u32     bic_origin_point;/* origin point of bic function */
> >         u32     bic_K;          /* time to origin point
> >                                    from the beginning of the current epoch */
> > @@ -211,26 +211,28 @@ static u32 cubic_root(u64 a)
> >  /*
> >   * Compute congestion window to use.
> >   */
> > -static inline void bictcp_update(struct bictcp *ca, u32 cwnd, u32 acked)
> > +static void bictcp_update(struct sock *sk, u32 cwnd, u32 acked)
> >  {
> > +       const struct tcp_sock *tp = tcp_sk(sk);
> > +       struct bictcp *ca = inet_csk_ca(sk);
> >         u32 delta, bic_target, max_cnt;
> >         u64 offs, t;
> >
> >         ca->ack_cnt += acked;   /* count the number of ACKed packets */
> >
> > -       if (ca->last_cwnd == cwnd &&
> > -           (s32)(tcp_jiffies32 - ca->last_time) <= HZ / 32)
> > +       delta = tp->tcp_mstamp - ca->last_time;
> > +       if (ca->last_cwnd == cwnd && delta <= USEC_PER_SEC / 32)
> >                 return;
> >
> > -       /* The CUBIC function can update ca->cnt at most once per jiffy.
> > +       /* The CUBIC function can update ca->cnt at most once per ms.
> >          * On all cwnd reduction events, ca->epoch_start is set to 0,
> >          * which will force a recalculation of ca->cnt.
> >          */
> > -       if (ca->epoch_start && tcp_jiffies32 == ca->last_time)
> > +       if (ca->epoch_start && delta < USEC_PER_MSEC)
> >                 goto tcp_friendliness;
>
> AFAICT there is a problem here. It is switching this line of code to
> use microsecond resolution without also changing the core CUBIC slope
> (ca->cnt) calculation to use microseconds. AFAICT that means we
> would be re-introducing the bug that was fixed in 2015 in
> d6b1a8a92a1417f8859a6937d2e6ffe2dfab4e6d (see below). Basically, if
> the CUBIC slope (ca->cnt) calculation uses jiffies, then we should
> only run that code once per jiffy, to avoid getting the wrong answer
> for the slope:

Interesting.... would adding the following part deal with this
problem, or is it something else ?
diff --git a/net/ipv4/tcp_cubic.c b/net/ipv4/tcp_cubic.c
index 3b1845103ee1866a316926a130c212e6f5e78ef0..bff5688ba5109fa5a0bbff7dc529525b2752dc46 100644
--- a/net/ipv4/tcp_cubic.c
+++ b/net/ipv4/tcp_cubic.c
@@ -268,9 +268,10 @@ static void bictcp_update(struct sock *sk, u32 cwnd, u32 acked)

         t = (s32)(tcp_jiffies32 - ca->epoch_start);
         t += usecs_to_jiffies(ca->delay_min);
-        /* change the unit from HZ to bictcp_HZ */
+        t = jiffies_to_msecs(t);
+        /* change the unit from ms to bictcp_HZ */
         t <<= BICTCP_HZ;
-        do_div(t, HZ);
+        do_div(t, MSEC_PER_SEC);

         if (t < ca->bic_K)      /* t - K */
                 offs = ca->bic_K - t;
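What this hunk changes numerically can be checked in isolation. A standalone sketch (assumptions: HZ=250, and jiffies_to_msecs() inlined as a plain multiply, valid for HZ values that divide 1000): with a jiffy-granular t as input, the old and the proposed scalings land on the same bictcp_HZ value, which is consistent with Neal's follow-up below.

#include <stdio.h>
#include <stdint.h>

#define BICTCP_HZ 10    /* BIC HZ 2^10 = 1024, as in tcp_cubic.c */

/* Old path: scale a jiffies value straight to bictcp_HZ units. */
static uint64_t t_old(uint64_t jiffies, unsigned int hz)
{
        return (jiffies << BICTCP_HZ) / hz;
}

/* Proposed path: jiffies -> msecs -> bictcp_HZ units. */
static uint64_t t_new(uint64_t jiffies, unsigned int hz)
{
        uint64_t msecs = jiffies * (1000 / hz); /* jiffies_to_msecs() for HZ | 1000 */

        return (msecs << BICTCP_HZ) / 1000;
}

int main(void)
{
        unsigned int hz = 250;  /* CONFIG_HZ_250 */

        /* 1024/250 == 4096/1000, so both divisions floor to the same
         * value: the conversion alone does not add resolution to t.
         */
        for (uint64_t j = 0; j < 6; j++)
                printf("t=%llu jiffies: old=%llu new=%llu\n",
                       (unsigned long long)j,
                       (unsigned long long)t_old(j, hz),
                       (unsigned long long)t_new(j, hz));
        return 0;
}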
On Mon, Aug 26, 2024 at 1:27 PM Eric Dumazet <edumazet@google.com> wrote:
>
> On Mon, Aug 26, 2024 at 3:26 PM Neal Cardwell <ncardwell@google.com> wrote:
> >
> > On Mon, Aug 26, 2024 at 5:27 AM Eric Dumazet <edumazet@google.com> wrote:
> > >
> > > bictcp_update() uses ca->last_time as a timestamp
> > > to decide on several heuristics.
> > >
> > > Historically this timestamp has been fed with jiffies,
> > > which has too coarse a resolution; some distros are
> > > still using CONFIG_HZ_250=y.
> > >
> > > It is time to switch to usec resolution, now that the TCP stack
> > > already caches the high-resolution time in tp->tcp_mstamp.
> > >
> > > Also remove the 'inline' qualifier; this helper is used
> > > once and compilers are smart.
> > >
> > > Signed-off-by: Eric Dumazet <edumazet@google.com>
> > > Link: https://lore.kernel.org/netdev/20240817163400.2616134-1-mrzhang97@gmail.com/T/#mb6a64c9e2309eb98eaeeeb4b085c4a2270b6789d
> > > Cc: Mingrui Zhang <mrzhang97@gmail.com>
> > > Cc: Lisong Xu <xu@unl.edu>
> > > ---
> > >  net/ipv4/tcp_cubic.c | 18 ++++++++++--------
> > >  1 file changed, 10 insertions(+), 8 deletions(-)
> > >
> > > diff --git a/net/ipv4/tcp_cubic.c b/net/ipv4/tcp_cubic.c
> > > index 5dbed91c6178257df8d2ccd1c8690a10bdbaf56a..3b1845103ee1866a316926a130c212e6f5e78ef0 100644
> > > --- a/net/ipv4/tcp_cubic.c
> > > +++ b/net/ipv4/tcp_cubic.c
> > > @@ -87,7 +87,7 @@ struct bictcp {
> > >         u32     cnt;            /* increase cwnd by 1 after ACKs */
> > >         u32     last_max_cwnd;  /* last maximum snd_cwnd */
> > >         u32     last_cwnd;      /* the last snd_cwnd */
> > > -       u32     last_time;      /* time when updated last_cwnd */
> > > +       u32     last_time;      /* time when updated last_cwnd (usec) */
> > >         u32     bic_origin_point;/* origin point of bic function */
> > >         u32     bic_K;          /* time to origin point
> > >                                    from the beginning of the current epoch */
> > > @@ -211,26 +211,28 @@ static u32 cubic_root(u64 a)
> > >  /*
> > >   * Compute congestion window to use.
> > >   */
> > > -static inline void bictcp_update(struct bictcp *ca, u32 cwnd, u32 acked)
> > > +static void bictcp_update(struct sock *sk, u32 cwnd, u32 acked)
> > >  {
> > > +       const struct tcp_sock *tp = tcp_sk(sk);
> > > +       struct bictcp *ca = inet_csk_ca(sk);
> > >         u32 delta, bic_target, max_cnt;
> > >         u64 offs, t;
> > >
> > >         ca->ack_cnt += acked;   /* count the number of ACKed packets */
> > >
> > > -       if (ca->last_cwnd == cwnd &&
> > > -           (s32)(tcp_jiffies32 - ca->last_time) <= HZ / 32)
> > > +       delta = tp->tcp_mstamp - ca->last_time;
> > > +       if (ca->last_cwnd == cwnd && delta <= USEC_PER_SEC / 32)
> > >                 return;
> > >
> > > -       /* The CUBIC function can update ca->cnt at most once per jiffy.
> > > +       /* The CUBIC function can update ca->cnt at most once per ms.
> > >          * On all cwnd reduction events, ca->epoch_start is set to 0,
> > >          * which will force a recalculation of ca->cnt.
> > >          */
> > > -       if (ca->epoch_start && tcp_jiffies32 == ca->last_time)
> > > +       if (ca->epoch_start && delta < USEC_PER_MSEC)
> > >                 goto tcp_friendliness;
> >
> > AFAICT there is a problem here. It is switching this line of code to
> > use microsecond resolution without also changing the core CUBIC slope
> > (ca->cnt) calculation to use microseconds. AFAICT that means we
> > would be re-introducing the bug that was fixed in 2015 in
> > d6b1a8a92a1417f8859a6937d2e6ffe2dfab4e6d (see below). Basically, if
> > the CUBIC slope (ca->cnt) calculation uses jiffies, then we should
> > only run that code once per jiffy, to avoid getting the wrong answer
> > for the slope:
>
> Interesting....
> would adding the following part deal with this
> problem, or is it something else ?
>
> diff --git a/net/ipv4/tcp_cubic.c b/net/ipv4/tcp_cubic.c
> index 3b1845103ee1866a316926a130c212e6f5e78ef0..bff5688ba5109fa5a0bbff7dc529525b2752dc46 100644
> --- a/net/ipv4/tcp_cubic.c
> +++ b/net/ipv4/tcp_cubic.c
> @@ -268,9 +268,10 @@ static void bictcp_update(struct sock *sk, u32 cwnd, u32 acked)
>
>         t = (s32)(tcp_jiffies32 - ca->epoch_start);
>         t += usecs_to_jiffies(ca->delay_min);
> -       /* change the unit from HZ to bictcp_HZ */
> +       t = jiffies_to_msecs(t);
> +       /* change the unit from ms to bictcp_HZ */
>         t <<= BICTCP_HZ;
> -       do_div(t, HZ);
> +       do_div(t, MSEC_PER_SEC);
>
>         if (t < ca->bic_K)      /* t - K */
>                 offs = ca->bic_K - t;

I don't think that would be sufficient to take care of the issue.

The issue (addressed in d6b1a8a92a1417f8859a6937d2e6ffe2dfab4e6d) is
that in the CUBIC bictcp_update() computation of bic_target the input
is tcp_jiffies32. That means that the output bic_target will only
change when tcp_jiffies32 increments to a new jiffies value. That
means that if we were to go back to executing the bic_target and
ca->cnt computations more than once per jiffy, the ca->cnt "slope"
value would become increasingly incorrect over the course of each
jiffy, due to the ca->cnt computation looking like:

  ca->cnt = cwnd / (bic_target - cwnd);

...and the fact that cwnd can update for each ACK event, while
bic_target is "stuck" during the course of the jiffy due to the jiffy
granularity.

I guess one approach to trying to avoid this issue would be to change
the initial computation of the "t" variable to be in microseconds and
to increase BICTCP_HZ from 10 to 20, so that the final value of t also
increases roughly once per microsecond. But then I suspect a lot of
code would have to be tweaked to avoid overflows... e.g., AFAICT with
microsecond units the core logic to cube the offs value would
overflow quite often:

  delta = (cube_rtt_scale * offs * offs * offs) >> (10+3*BICTCP_HZ)

IMHO it's safest to just leave last_time in jiffies. :-)

neal
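Neal's overflow estimate can be checked with simple arithmetic. A minimal standalone sketch (assumptions: the kernel default bic_scale of 41, so cube_rtt_scale = bic_scale * 10 = 410; BICTCP_HZ=20 is the hypothetical usec-based constant discussed above; GCC/Clang's unsigned __int128 is used to detect the wrap):

#include <stdio.h>
#include <stdint.h>

/* Does cube_rtt_scale * offs^3 wrap a u64 before the final shift? */
static int cube_overflows(unsigned int bictcp_hz, uint64_t seconds)
{
        uint64_t cube_rtt_scale = 410;          /* bic_scale * 10, default */
        uint64_t offs = seconds << bictcp_hz;   /* offs in 2^bictcp_hz units/sec */
        unsigned __int128 full = (unsigned __int128)cube_rtt_scale
                                        * offs * offs * offs;

        return (uint64_t)(full >> 64) != 0;     /* high bits lost in a u64? */
}

int main(void)
{
        /* Current jiffy-era scaling survives even a long excursion... */
        printf("BICTCP_HZ=10, offs=2s:  overflow=%d\n", cube_overflows(10, 2));
        printf("BICTCP_HZ=10, offs=60s: overflow=%d\n", cube_overflows(10, 60));

        /* ...but with usec-based units, a couple of seconds already wraps:
         * 410 * (2^21)^3 ~= 2^72 > 2^64.
         */
        printf("BICTCP_HZ=20, offs=2s:  overflow=%d\n", cube_overflows(20, 2));
        return 0;
}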
diff --git a/net/ipv4/tcp_cubic.c b/net/ipv4/tcp_cubic.c
index 5dbed91c6178257df8d2ccd1c8690a10bdbaf56a..3b1845103ee1866a316926a130c212e6f5e78ef0 100644
--- a/net/ipv4/tcp_cubic.c
+++ b/net/ipv4/tcp_cubic.c
@@ -87,7 +87,7 @@ struct bictcp {
         u32     cnt;            /* increase cwnd by 1 after ACKs */
         u32     last_max_cwnd;  /* last maximum snd_cwnd */
         u32     last_cwnd;      /* the last snd_cwnd */
-        u32     last_time;      /* time when updated last_cwnd */
+        u32     last_time;      /* time when updated last_cwnd (usec) */
         u32     bic_origin_point;/* origin point of bic function */
         u32     bic_K;          /* time to origin point
                                    from the beginning of the current epoch */
@@ -211,26 +211,28 @@ static u32 cubic_root(u64 a)
 /*
  * Compute congestion window to use.
  */
-static inline void bictcp_update(struct bictcp *ca, u32 cwnd, u32 acked)
+static void bictcp_update(struct sock *sk, u32 cwnd, u32 acked)
 {
+        const struct tcp_sock *tp = tcp_sk(sk);
+        struct bictcp *ca = inet_csk_ca(sk);
         u32 delta, bic_target, max_cnt;
         u64 offs, t;

         ca->ack_cnt += acked;   /* count the number of ACKed packets */

-        if (ca->last_cwnd == cwnd &&
-            (s32)(tcp_jiffies32 - ca->last_time) <= HZ / 32)
+        delta = tp->tcp_mstamp - ca->last_time;
+        if (ca->last_cwnd == cwnd && delta <= USEC_PER_SEC / 32)
                 return;

-        /* The CUBIC function can update ca->cnt at most once per jiffy.
+        /* The CUBIC function can update ca->cnt at most once per ms.
          * On all cwnd reduction events, ca->epoch_start is set to 0,
          * which will force a recalculation of ca->cnt.
          */
-        if (ca->epoch_start && tcp_jiffies32 == ca->last_time)
+        if (ca->epoch_start && delta < USEC_PER_MSEC)
                 goto tcp_friendliness;

         ca->last_cwnd = cwnd;
-        ca->last_time = tcp_jiffies32;
+        ca->last_time = tp->tcp_mstamp;

         if (ca->epoch_start == 0) {
                 ca->epoch_start = tcp_jiffies32;        /* record beginning */
@@ -334,7 +336,7 @@ __bpf_kfunc static void cubictcp_cong_avoid(struct sock *sk, u32 ack, u32 acked)
                 if (!acked)
                         return;
         }
-        bictcp_update(ca, tcp_snd_cwnd(tp), acked);
+        bictcp_update(sk, tcp_snd_cwnd(tp), acked);
         tcp_cong_avoid_ai(tp, ca->cnt, acked);
 }
bictcp_update() uses ca->last_time as a timestamp
to decide on several heuristics.

Historically this timestamp has been fed with jiffies,
which has too coarse a resolution; some distros are
still using CONFIG_HZ_250=y.

It is time to switch to usec resolution, now that the TCP stack
already caches the high-resolution time in tp->tcp_mstamp.

Also remove the 'inline' qualifier; this helper is used
once and compilers are smart.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/netdev/20240817163400.2616134-1-mrzhang97@gmail.com/T/#mb6a64c9e2309eb98eaeeeb4b085c4a2270b6789d
Cc: Mingrui Zhang <mrzhang97@gmail.com>
Cc: Lisong Xu <xu@unl.edu>
---
 net/ipv4/tcp_cubic.c | 18 ++++++++++--------
 1 file changed, 10 insertions(+), 8 deletions(-)
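The resolution gap this changelog describes can be made concrete with a little arithmetic; a standalone sketch assuming CONFIG_HZ_250=y, the configuration called out above:

#include <stdio.h>

int main(void)
{
        /* With CONFIG_HZ_250 a jiffy is 4000 us, so every jiffies-based
         * threshold in bictcp_update() is quantized to 4 ms steps.
         */
        const unsigned int hz = 250;
        const unsigned int usec_per_sec = 1000000;
        const unsigned int jiffy_us = usec_per_sec / hz;

        printf("jiffy length:                      %u us\n", jiffy_us);
        printf("old ratelimit (HZ / 32):           %u us\n",
               (hz / 32) * jiffy_us);                   /* 7 jiffies = 28 ms */
        printf("new ratelimit (USEC_PER_SEC / 32): %u us\n",
               usec_per_sec / 32);                      /* 31.25 ms */
        printf("old min ca->cnt update period:     %u us\n", jiffy_us);
        printf("new min ca->cnt update period:     %u us\n", 1000u);
        return 0;
}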