diff mbox series

[bpf-next,v6,2/8] net: Update an existing TCP congestion control algorithm.

Message ID 20230310043812.3087672-3-kuifeng@meta.com (mailing list archive)
State Superseded
Delegated to: BPF
Headers show
Series Transit between BPF TCP congestion controls. | expand

Checks

Context Check Description
bpf/vmtest-bpf-next-VM_Test-37 success Logs for test_verifier on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-38 success Logs for test_verifier on x86_64 with llvm-17
netdev/series_format success Posting correctly formatted
netdev/tree_selection success Clearly marked for bpf-next, async
netdev/fixes_present success Fixes tag not required for -next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit fail Errors and warnings before: 1468 this patch: 1470
netdev/cc_maintainers warning 11 maintainers not CCed: pabeni@redhat.com dsahern@kernel.org haoluo@google.com yhs@fb.com kuba@kernel.org edumazet@google.com daniel@iogearbox.net john.fastabend@gmail.com kpsingh@kernel.org jolsa@kernel.org davem@davemloft.net
netdev/build_clang fail Errors and warnings before: 168 this patch: 170
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn fail Errors and warnings before: 1462 this patch: 1464
netdev/checkpatch warning WARNING: line length of 88 exceeds 80 columns WARNING: line length of 99 exceeds 80 columns
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0
bpf/vmtest-bpf-next-PR success PR summary
bpf/vmtest-bpf-next-VM_Test-11 success Logs for test_maps on s390x with gcc
bpf/vmtest-bpf-next-VM_Test-1 success Logs for ShellCheck
bpf/vmtest-bpf-next-VM_Test-2 success Logs for build for aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-3 success Logs for build for aarch64 with llvm-17
bpf/vmtest-bpf-next-VM_Test-5 success Logs for build for x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-6 success Logs for build for x86_64 with llvm-17
bpf/vmtest-bpf-next-VM_Test-7 success Logs for llvm-toolchain
bpf/vmtest-bpf-next-VM_Test-8 success Logs for set-matrix
bpf/vmtest-bpf-next-VM_Test-4 success Logs for build for s390x with gcc
bpf/vmtest-bpf-next-VM_Test-9 success Logs for test_maps on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-10 success Logs for test_maps on aarch64 with llvm-17
bpf/vmtest-bpf-next-VM_Test-12 success Logs for test_maps on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-13 success Logs for test_maps on x86_64 with llvm-17
bpf/vmtest-bpf-next-VM_Test-14 fail Logs for test_progs on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-17 fail Logs for test_progs on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-18 fail Logs for test_progs on x86_64 with llvm-17
bpf/vmtest-bpf-next-VM_Test-19 fail Logs for test_progs_no_alu32 on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-20 fail Logs for test_progs_no_alu32 on aarch64 with llvm-17
bpf/vmtest-bpf-next-VM_Test-22 fail Logs for test_progs_no_alu32 on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-23 fail Logs for test_progs_no_alu32 on x86_64 with llvm-17
bpf/vmtest-bpf-next-VM_Test-24 success Logs for test_progs_no_alu32_parallel on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-25 success Logs for test_progs_no_alu32_parallel on aarch64 with llvm-17
bpf/vmtest-bpf-next-VM_Test-26 success Logs for test_progs_no_alu32_parallel on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-27 success Logs for test_progs_no_alu32_parallel on x86_64 with llvm-17
bpf/vmtest-bpf-next-VM_Test-28 success Logs for test_progs_parallel on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-29 success Logs for test_progs_parallel on aarch64 with llvm-17
bpf/vmtest-bpf-next-VM_Test-30 success Logs for test_progs_parallel on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-31 success Logs for test_progs_parallel on x86_64 with llvm-17
bpf/vmtest-bpf-next-VM_Test-32 success Logs for test_verifier on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-33 success Logs for test_verifier on aarch64 with llvm-17
bpf/vmtest-bpf-next-VM_Test-34 success Logs for test_verifier on s390x with gcc
bpf/vmtest-bpf-next-VM_Test-35 success Logs for test_verifier on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-36 success Logs for test_verifier on x86_64 with llvm-17
bpf/vmtest-bpf-next-VM_Test-15 fail Logs for test_progs on aarch64 with llvm-17
bpf/vmtest-bpf-next-VM_Test-21 fail Logs for test_progs_no_alu32 on s390x with gcc
bpf/vmtest-bpf-next-VM_Test-16 fail Logs for test_progs on s390x with gcc

Commit Message

Kui-Feng Lee March 10, 2023, 4:38 a.m. UTC
This feature lets you immediately transition to another congestion
control algorithm or implementation with the same name.  Once a name
is updated, new connections will apply this new algorithm.

Signed-off-by: Kui-Feng Lee <kuifeng@meta.com>
---
 include/linux/bpf.h            |  1 +
 include/net/tcp.h              |  2 ++
 net/bpf/bpf_dummy_struct_ops.c |  6 ++++
 net/ipv4/bpf_tcp_ca.c          |  6 ++++
 net/ipv4/tcp_cong.c            | 60 ++++++++++++++++++++++++++++++----
 5 files changed, 68 insertions(+), 7 deletions(-)

Comments

Stephen Hemminger March 10, 2023, 4:47 p.m. UTC | #1
On Thu, 9 Mar 2023 20:38:07 -0800
Kui-Feng Lee <kuifeng@meta.com> wrote:

> This feature lets you immediately transition to another congestion
> control algorithm or implementation with the same name.  Once a name
> is updated, new connections will apply this new algorithm.
> 
> Signed-off-by: Kui-Feng Lee <kuifeng@meta.com>

What is the use case and userspace API for this?
The congestion control algorithm normally doesn't allow this because
algorithm specific variables (current state of connection) may not
work with another algorithm.

Seems like you are opening Pandora's box here.
Kuifeng Lee March 13, 2023, 3:46 p.m. UTC | #2
On 3/10/23 08:47, Stephen Hemminger wrote:
> On Thu, 9 Mar 2023 20:38:07 -0800
> Kui-Feng Lee <kuifeng@meta.com> wrote:
> 
>> This feature lets you immediately transition to another congestion
>> control algorithm or implementation with the same name.  Once a name
>> is updated, new connections will apply this new algorithm.
>>
>> Signed-off-by: Kui-Feng Lee <kuifeng@meta.com>
> 
> What is the use case and userspace API for this?
> The congestion control algorithm normally doesn't allow this because
> algorithm specific variables (current state of connection) may not
> work with another algorithm.

Only new connections will apply the new algorithm, while
existing connections keep using the algorithm applied. It shouldn't
have the per-connection state/variable issue you mentioned.

It will be used to upgrade an existing algorithm to a new version.
The userspace API is used in the 8th patch of this patchset.
One of examples in the testcase is

   link = bpf_map__attach_struct_ops(skel->maps.ca_update_1);
   .......
   err = bpf_link__update_map(link, skel->maps.ca_update_2);

Calling bpf_link__update_map(...) will register ca_pupdate_2 and
unregister ca_update_1 with the same name
in one call.  However, the existing connections that has applied
ca_update_1 keep using the algorithm except someone call
setsockopt(TCP_CONGESTION, ...) on them.



> 
> Seems like you are opening Pandora's box here.
Kuifeng Lee March 13, 2023, 4:43 p.m. UTC | #3
On 3/13/23 08:46, Kui-Feng Lee wrote:
> 
> 
> On 3/10/23 08:47, Stephen Hemminger wrote:
>> On Thu, 9 Mar 2023 20:38:07 -0800
>> Kui-Feng Lee <kuifeng@meta.com> wrote:
>>
>>> This feature lets you immediately transition to another congestion
>>> control algorithm or implementation with the same name.  Once a name
>>> is updated, new connections will apply this new algorithm.
>>>
>>> Signed-off-by: Kui-Feng Lee <kuifeng@meta.com>
>>
>> What is the use case and userspace API for this?
>> The congestion control algorithm normally doesn't allow this because
>> algorithm specific variables (current state of connection) may not
>> work with another algorithm.
> 
> Only new connections will apply the new algorithm, while
> existing connections keep using the algorithm applied. It shouldn't
> have the per-connection state/variable issue you mentioned.
> 
> It will be used to upgrade an existing algorithm to a new version.
> The userspace API is used in the 8th patch of this patchset.
> One of examples in the testcase is
> 
>    link = bpf_map__attach_struct_ops(skel->maps.ca_update_1);
>    .......
>    err = bpf_link__update_map(link, skel->maps.ca_update_2);
> 
> Calling bpf_link__update_map(...) will register ca_pupdate_2 and
> unregister ca_update_1 with the same name
> in one call.  However, the existing connections that has applied
> ca_update_1 keep using the algorithm except someone call
> setsockopt(TCP_CONGESTION, ...) on them.

FYI!
The thread head of the patchset is
  https://lore.kernel.org/all/20230310043812.3087672-1-kuifeng@meta.com/


> 
> 
> 
>>
>> Seems like you are opening Pandora's box here.
>
Martin KaFai Lau March 14, 2023, 12:28 a.m. UTC | #4
On 3/9/23 8:38 PM, Kui-Feng Lee wrote:
> This feature lets you immediately transition to another congestion
> control algorithm or implementation with the same name.  Once a name
> is updated, new connections will apply this new algorithm.

The commit message needs to explain why the change is needed and some major 
details on how the patch is doing it. In this case, why a later bpf patch needs 
it and what major changes are made to tcp_cong.c.

For example,

A later bpf patch will allow attaching a bpf_struct_ops (TCP Congestion Control 
implemented in bpf) to bpf_link. The later bpf patch will also use the existing 
bpf_link API to replace a bpf_struct_ops (ie. to replace an old tcp-cc with a 
new tcp-cc under the same name). This requires a helper function to replace a 
tcp-cc under a tcp_cong_list_lock. Thus, this patch adds a 
tcp_update_congestion_control() to replace the "old_ca" with a new "ca".

This patch also takes this chance to refactor the ca validation into the new 
tcp_validate_congestion_control() function.

> 
> Signed-off-by: Kui-Feng Lee <kuifeng@meta.com>
> ---
>   include/linux/bpf.h            |  1 +
>   include/net/tcp.h              |  2 ++
>   net/bpf/bpf_dummy_struct_ops.c |  6 ++++
>   net/ipv4/bpf_tcp_ca.c          |  6 ++++
>   net/ipv4/tcp_cong.c            | 60 ++++++++++++++++++++++++++++++----
>   5 files changed, 68 insertions(+), 7 deletions(-)
> 
> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> index 00ca92ea6f2e..0f84925d66db 100644
> --- a/include/linux/bpf.h
> +++ b/include/linux/bpf.h
> @@ -1511,6 +1511,7 @@ struct bpf_struct_ops {
>   			   void *kdata, const void *udata);
>   	int (*reg)(void *kdata);
>   	void (*unreg)(void *kdata);
> +	int (*update)(void *kdata, void *old_kdata);
>   	const struct btf_type *type;
>   	const struct btf_type *value_type;
>   	const char *name;
> diff --git a/include/net/tcp.h b/include/net/tcp.h
> index db9f828e9d1e..239cc0e2639c 100644
> --- a/include/net/tcp.h
> +++ b/include/net/tcp.h
> @@ -1117,6 +1117,8 @@ struct tcp_congestion_ops {
>   
>   int tcp_register_congestion_control(struct tcp_congestion_ops *type);
>   void tcp_unregister_congestion_control(struct tcp_congestion_ops *type);
> +int tcp_update_congestion_control(struct tcp_congestion_ops *type,
> +				  struct tcp_congestion_ops *old_type);
>   
>   void tcp_assign_congestion_control(struct sock *sk);
>   void tcp_init_congestion_control(struct sock *sk);
> diff --git a/net/bpf/bpf_dummy_struct_ops.c b/net/bpf/bpf_dummy_struct_ops.c
> index ff4f89a2b02a..158f14e240d0 100644
> --- a/net/bpf/bpf_dummy_struct_ops.c
> +++ b/net/bpf/bpf_dummy_struct_ops.c
> @@ -222,12 +222,18 @@ static void bpf_dummy_unreg(void *kdata)
>   {
>   }
>   
> +static int bpf_dummy_update(void *kdata, void *old_kdata)
> +{
> +	return -EOPNOTSUPP;
> +}
> +
>   struct bpf_struct_ops bpf_bpf_dummy_ops = {
>   	.verifier_ops = &bpf_dummy_verifier_ops,
>   	.init = bpf_dummy_init,
>   	.check_member = bpf_dummy_ops_check_member,
>   	.init_member = bpf_dummy_init_member,
>   	.reg = bpf_dummy_reg,
> +	.update = bpf_dummy_update,
>   	.unreg = bpf_dummy_unreg,
>   	.name = "bpf_dummy_ops",
>   };
> diff --git a/net/ipv4/bpf_tcp_ca.c b/net/ipv4/bpf_tcp_ca.c
> index 13fc0c185cd9..66ce5fadfe42 100644
> --- a/net/ipv4/bpf_tcp_ca.c
> +++ b/net/ipv4/bpf_tcp_ca.c
> @@ -266,10 +266,16 @@ static void bpf_tcp_ca_unreg(void *kdata)
>   	tcp_unregister_congestion_control(kdata);
>   }
>   
> +static int bpf_tcp_ca_update(void *kdata, void *old_kdata)
> +{
> +	return tcp_update_congestion_control(kdata, old_kdata);
> +}
> +
>   struct bpf_struct_ops bpf_tcp_congestion_ops = {
>   	.verifier_ops = &bpf_tcp_ca_verifier_ops,
>   	.reg = bpf_tcp_ca_reg,
>   	.unreg = bpf_tcp_ca_unreg,
> +	.update = bpf_tcp_ca_update,

In v5, a comment was given to move the ".update" related changes to patch 5 such 
that patch 2 will only have netdev change in tcp_cong.c for review purpose.

Please ensure the earlier review comment is addressed in the next revision or 
reply if the earlier review comment does not make sense. This will save time for 
the reviewer not to have to repeat the same comment again.

>   	.check_member = bpf_tcp_ca_check_member,
>   	.init_member = bpf_tcp_ca_init_member,
>   	.init = bpf_tcp_ca_init,
Kuifeng Lee March 14, 2023, 4:31 a.m. UTC | #5
On 3/13/23 17:28, Martin KaFai Lau wrote:
> On 3/9/23 8:38 PM, Kui-Feng Lee wrote:
>> This feature lets you immediately transition to another congestion
>> control algorithm or implementation with the same name.  Once a name
>> is updated, new connections will apply this new algorithm.
> 
> The commit message needs to explain why the change is needed and some 
> major details on how the patch is doing it. In this case, why a later 
> bpf patch needs it and what major changes are made to tcp_cong.c.
> 
> For example,
> 
> A later bpf patch will allow attaching a bpf_struct_ops (TCP Congestion 
> Control implemented in bpf) to bpf_link. The later bpf patch will also 
> use the existing bpf_link API to replace a bpf_struct_ops (ie. to 
> replace an old tcp-cc with a new tcp-cc under the same name). This 
> requires a helper function to replace a tcp-cc under a 
> tcp_cong_list_lock. Thus, this patch adds a 
> tcp_update_congestion_control() to replace the "old_ca" with a new "ca".
> 
> This patch also takes this chance to refactor the ca validation into the 
> new tcp_validate_congestion_control() function.


Sure!

> 
>>
>> Signed-off-by: Kui-Feng Lee <kuifeng@meta.com>
>> ---
>>   include/linux/bpf.h            |  1 +
>>   include/net/tcp.h              |  2 ++
>>   net/bpf/bpf_dummy_struct_ops.c |  6 ++++
>>   net/ipv4/bpf_tcp_ca.c          |  6 ++++
>>   net/ipv4/tcp_cong.c            | 60 ++++++++++++++++++++++++++++++----
>>   5 files changed, 68 insertions(+), 7 deletions(-)
>>
>> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
>> index 00ca92ea6f2e..0f84925d66db 100644
>> --- a/include/linux/bpf.h
>> +++ b/include/linux/bpf.h
>> @@ -1511,6 +1511,7 @@ struct bpf_struct_ops {
>>                  void *kdata, const void *udata);
>>       int (*reg)(void *kdata);
>>       void (*unreg)(void *kdata);
>> +    int (*update)(void *kdata, void *old_kdata);
>>       const struct btf_type *type;
>>       const struct btf_type *value_type;
>>       const char *name;
>> diff --git a/include/net/tcp.h b/include/net/tcp.h
>> index db9f828e9d1e..239cc0e2639c 100644
>> --- a/include/net/tcp.h
>> +++ b/include/net/tcp.h
>> @@ -1117,6 +1117,8 @@ struct tcp_congestion_ops {
>>   int tcp_register_congestion_control(struct tcp_congestion_ops *type);
>>   void tcp_unregister_congestion_control(struct tcp_congestion_ops 
>> *type);
>> +int tcp_update_congestion_control(struct tcp_congestion_ops *type,
>> +                  struct tcp_congestion_ops *old_type);
>>   void tcp_assign_congestion_control(struct sock *sk);
>>   void tcp_init_congestion_control(struct sock *sk);
>> diff --git a/net/bpf/bpf_dummy_struct_ops.c 
>> b/net/bpf/bpf_dummy_struct_ops.c
>> index ff4f89a2b02a..158f14e240d0 100644
>> --- a/net/bpf/bpf_dummy_struct_ops.c
>> +++ b/net/bpf/bpf_dummy_struct_ops.c
>> @@ -222,12 +222,18 @@ static void bpf_dummy_unreg(void *kdata)
>>   {
>>   }
>> +static int bpf_dummy_update(void *kdata, void *old_kdata)
>> +{
>> +    return -EOPNOTSUPP;
>> +}
>> +
>>   struct bpf_struct_ops bpf_bpf_dummy_ops = {
>>       .verifier_ops = &bpf_dummy_verifier_ops,
>>       .init = bpf_dummy_init,
>>       .check_member = bpf_dummy_ops_check_member,
>>       .init_member = bpf_dummy_init_member,
>>       .reg = bpf_dummy_reg,
>> +    .update = bpf_dummy_update,
>>       .unreg = bpf_dummy_unreg,
>>       .name = "bpf_dummy_ops",
>>   };
>> diff --git a/net/ipv4/bpf_tcp_ca.c b/net/ipv4/bpf_tcp_ca.c
>> index 13fc0c185cd9..66ce5fadfe42 100644
>> --- a/net/ipv4/bpf_tcp_ca.c
>> +++ b/net/ipv4/bpf_tcp_ca.c
>> @@ -266,10 +266,16 @@ static void bpf_tcp_ca_unreg(void *kdata)
>>       tcp_unregister_congestion_control(kdata);
>>   }
>> +static int bpf_tcp_ca_update(void *kdata, void *old_kdata)
>> +{
>> +    return tcp_update_congestion_control(kdata, old_kdata);
>> +}
>> +
>>   struct bpf_struct_ops bpf_tcp_congestion_ops = {
>>       .verifier_ops = &bpf_tcp_ca_verifier_ops,
>>       .reg = bpf_tcp_ca_reg,
>>       .unreg = bpf_tcp_ca_unreg,
>> +    .update = bpf_tcp_ca_update,
> 
> In v5, a comment was given to move the ".update" related changes to 
> patch 5 such that patch 2 will only have netdev change in tcp_cong.c for 
> review purpose.
> 
> Please ensure the earlier review comment is addressed in the next 
> revision or reply if the earlier review comment does not make sense. 
> This will save time for the reviewer not to have to repeat the same 
> comment again.

Sorry about this.  I only addressed .validate and missed .update.
Will fix this.

> 
>>       .check_member = bpf_tcp_ca_check_member,
>>       .init_member = bpf_tcp_ca_init_member,
>>       .init = bpf_tcp_ca_init,
> 
>
diff mbox series

Patch

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 00ca92ea6f2e..0f84925d66db 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -1511,6 +1511,7 @@  struct bpf_struct_ops {
 			   void *kdata, const void *udata);
 	int (*reg)(void *kdata);
 	void (*unreg)(void *kdata);
+	int (*update)(void *kdata, void *old_kdata);
 	const struct btf_type *type;
 	const struct btf_type *value_type;
 	const char *name;
diff --git a/include/net/tcp.h b/include/net/tcp.h
index db9f828e9d1e..239cc0e2639c 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -1117,6 +1117,8 @@  struct tcp_congestion_ops {
 
 int tcp_register_congestion_control(struct tcp_congestion_ops *type);
 void tcp_unregister_congestion_control(struct tcp_congestion_ops *type);
+int tcp_update_congestion_control(struct tcp_congestion_ops *type,
+				  struct tcp_congestion_ops *old_type);
 
 void tcp_assign_congestion_control(struct sock *sk);
 void tcp_init_congestion_control(struct sock *sk);
diff --git a/net/bpf/bpf_dummy_struct_ops.c b/net/bpf/bpf_dummy_struct_ops.c
index ff4f89a2b02a..158f14e240d0 100644
--- a/net/bpf/bpf_dummy_struct_ops.c
+++ b/net/bpf/bpf_dummy_struct_ops.c
@@ -222,12 +222,18 @@  static void bpf_dummy_unreg(void *kdata)
 {
 }
 
+static int bpf_dummy_update(void *kdata, void *old_kdata)
+{
+	return -EOPNOTSUPP;
+}
+
 struct bpf_struct_ops bpf_bpf_dummy_ops = {
 	.verifier_ops = &bpf_dummy_verifier_ops,
 	.init = bpf_dummy_init,
 	.check_member = bpf_dummy_ops_check_member,
 	.init_member = bpf_dummy_init_member,
 	.reg = bpf_dummy_reg,
+	.update = bpf_dummy_update,
 	.unreg = bpf_dummy_unreg,
 	.name = "bpf_dummy_ops",
 };
diff --git a/net/ipv4/bpf_tcp_ca.c b/net/ipv4/bpf_tcp_ca.c
index 13fc0c185cd9..66ce5fadfe42 100644
--- a/net/ipv4/bpf_tcp_ca.c
+++ b/net/ipv4/bpf_tcp_ca.c
@@ -266,10 +266,16 @@  static void bpf_tcp_ca_unreg(void *kdata)
 	tcp_unregister_congestion_control(kdata);
 }
 
+static int bpf_tcp_ca_update(void *kdata, void *old_kdata)
+{
+	return tcp_update_congestion_control(kdata, old_kdata);
+}
+
 struct bpf_struct_ops bpf_tcp_congestion_ops = {
 	.verifier_ops = &bpf_tcp_ca_verifier_ops,
 	.reg = bpf_tcp_ca_reg,
 	.unreg = bpf_tcp_ca_unreg,
+	.update = bpf_tcp_ca_update,
 	.check_member = bpf_tcp_ca_check_member,
 	.init_member = bpf_tcp_ca_init_member,
 	.init = bpf_tcp_ca_init,
diff --git a/net/ipv4/tcp_cong.c b/net/ipv4/tcp_cong.c
index db8b4b488c31..c90791ae8389 100644
--- a/net/ipv4/tcp_cong.c
+++ b/net/ipv4/tcp_cong.c
@@ -75,14 +75,8 @@  struct tcp_congestion_ops *tcp_ca_find_key(u32 key)
 	return NULL;
 }
 
-/*
- * Attach new congestion control algorithm to the list
- * of available options.
- */
-int tcp_register_congestion_control(struct tcp_congestion_ops *ca)
+int tcp_validate_congestion_control(struct tcp_congestion_ops *ca)
 {
-	int ret = 0;
-
 	/* all algorithms must implement these */
 	if (!ca->ssthresh || !ca->undo_cwnd ||
 	    !(ca->cong_avoid || ca->cong_control)) {
@@ -90,6 +84,20 @@  int tcp_register_congestion_control(struct tcp_congestion_ops *ca)
 		return -EINVAL;
 	}
 
+	return 0;
+}
+
+/* Attach new congestion control algorithm to the list
+ * of available options.
+ */
+int tcp_register_congestion_control(struct tcp_congestion_ops *ca)
+{
+	int ret;
+
+	ret = tcp_validate_congestion_control(ca);
+	if (ret)
+		return ret;
+
 	ca->key = jhash(ca->name, sizeof(ca->name), strlen(ca->name));
 
 	spin_lock(&tcp_cong_list_lock);
@@ -130,6 +138,44 @@  void tcp_unregister_congestion_control(struct tcp_congestion_ops *ca)
 }
 EXPORT_SYMBOL_GPL(tcp_unregister_congestion_control);
 
+/* Replace a registered old ca with a new one.
+ *
+ * The new ca must have the same name as the old one, that has been
+ * registered.
+ */
+int tcp_update_congestion_control(struct tcp_congestion_ops *ca, struct tcp_congestion_ops *old_ca)
+{
+	struct tcp_congestion_ops *existing;
+	int ret;
+
+	ret = tcp_validate_congestion_control(ca);
+	if (ret)
+		return ret;
+
+	ca->key = jhash(ca->name, sizeof(ca->name), strlen(ca->name));
+
+	spin_lock(&tcp_cong_list_lock);
+	existing = tcp_ca_find_key(old_ca->key);
+	if (ca->key == TCP_CA_UNSPEC || !existing || strcmp(existing->name, ca->name)) {
+		pr_notice("%s not registered or non-unique key\n",
+			  ca->name);
+		ret = -EINVAL;
+	} else if (existing != old_ca) {
+		pr_notice("invalid old congestion control algorithm to replace\n");
+		ret = -EINVAL;
+	} else {
+		/* Add the new one before removing the old one to keep
+		 * one implementation available all the time.
+		 */
+		list_add_tail_rcu(&ca->list, &tcp_cong_list);
+		list_del_rcu(&existing->list);
+		pr_debug("%s updated\n", ca->name);
+	}
+	spin_unlock(&tcp_cong_list_lock);
+
+	return ret;
+}
+
 u32 tcp_ca_get_key_by_name(struct net *net, const char *name, bool *ecn_ca)
 {
 	const struct tcp_congestion_ops *ca;