diff mbox series

[bpf-next,v7,2/8] net: Update an existing TCP congestion control algorithm.

Message ID 20230316023641.2092778-3-kuifeng@meta.com (mailing list archive)
State Superseded
Delegated to: BPF
Headers show
Series Transit between BPF TCP congestion controls. | expand

Checks

Context Check Description
netdev/series_format success Posting correctly formatted
netdev/tree_selection success Clearly marked for bpf-next, async
netdev/fixes_present success Fixes tag not required for -next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 1015 this patch: 1015
netdev/cc_maintainers warning 4 maintainers not CCed: kuba@kernel.org pabeni@redhat.com dsahern@kernel.org davem@davemloft.net
netdev/build_clang success Errors and warnings before: 121 this patch: 121
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 1021 this patch: 1021
netdev/checkpatch warning WARNING: line length of 88 exceeds 80 columns WARNING: line length of 99 exceeds 80 columns
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0
bpf/vmtest-bpf-next-PR success PR summary
bpf/vmtest-bpf-next-VM_Test-1 success Logs for ShellCheck
bpf/vmtest-bpf-next-VM_Test-7 success Logs for llvm-toolchain
bpf/vmtest-bpf-next-VM_Test-8 success Logs for set-matrix
bpf/vmtest-bpf-next-VM_Test-2 success Logs for build for aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-3 success Logs for build for aarch64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-5 success Logs for build for x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-6 success Logs for build for x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-4 success Logs for build for s390x with gcc
bpf/vmtest-bpf-next-VM_Test-9 success Logs for test_maps on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-10 success Logs for test_maps on aarch64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-12 success Logs for test_maps on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-13 success Logs for test_maps on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-14 success Logs for test_progs on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-15 success Logs for test_progs on aarch64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-17 success Logs for test_progs on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-18 success Logs for test_progs on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-19 success Logs for test_progs_no_alu32 on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-20 success Logs for test_progs_no_alu32 on aarch64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-22 success Logs for test_progs_no_alu32 on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-23 success Logs for test_progs_no_alu32 on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-24 success Logs for test_progs_no_alu32_parallel on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-25 success Logs for test_progs_no_alu32_parallel on aarch64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-26 success Logs for test_progs_no_alu32_parallel on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-27 success Logs for test_progs_no_alu32_parallel on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-28 success Logs for test_progs_parallel on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-29 success Logs for test_progs_parallel on aarch64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-30 success Logs for test_progs_parallel on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-31 success Logs for test_progs_parallel on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-32 success Logs for test_verifier on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-33 success Logs for test_verifier on aarch64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-35 success Logs for test_verifier on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-36 success Logs for test_verifier on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-34 success Logs for test_verifier on s390x with gcc
bpf/vmtest-bpf-next-VM_Test-21 success Logs for test_progs_no_alu32 on s390x with gcc
bpf/vmtest-bpf-next-VM_Test-16 success Logs for test_progs on s390x with gcc
bpf/vmtest-bpf-next-VM_Test-11 success Logs for test_maps on s390x with gcc

Commit Message

Kui-Feng Lee March 16, 2023, 2:36 a.m. UTC
This feature lets you immediately transition to another congestion
control algorithm or implementation with the same name.  Once a name
is updated, new connections will apply this new algorithm.

The purpose is to update a customized algorithm implemented in BPF
struct_ops with a new version on the flight.  The following is an
example of using the userspace API implemented in later BPF patches.

   link = bpf_map__attach_struct_ops(skel->maps.ca_update_1);
   .......
   err = bpf_link__update_map(link, skel->maps.ca_update_2);

We first load and register an algorithm implemented in BPF struct_ops,
then swap it out with a new one using the same name. After that, newly
created connections will apply the updated algorithm, while older ones
retain the previous version already applied.

This patch also takes this chance to refactor the ca validation into
the new tcp_validate_congestion_control() function.

Cc: netdev@vger.kernel.org, Eric Dumazet <edumazet@google.com>
Signed-off-by: Kui-Feng Lee <kuifeng@meta.com>
---
 include/net/tcp.h   |  3 +++
 net/ipv4/tcp_cong.c | 60 +++++++++++++++++++++++++++++++++++++++------
 2 files changed, 56 insertions(+), 7 deletions(-)

Comments

Daniel Borkmann March 17, 2023, 3:23 p.m. UTC | #1
On 3/16/23 3:36 AM, Kui-Feng Lee wrote:
> This feature lets you immediately transition to another congestion
> control algorithm or implementation with the same name.  Once a name
> is updated, new connections will apply this new algorithm.
> 
> The purpose is to update a customized algorithm implemented in BPF
> struct_ops with a new version on the flight.  The following is an
> example of using the userspace API implemented in later BPF patches.
> 
>     link = bpf_map__attach_struct_ops(skel->maps.ca_update_1);
>     .......
>     err = bpf_link__update_map(link, skel->maps.ca_update_2);
> 
> We first load and register an algorithm implemented in BPF struct_ops,
> then swap it out with a new one using the same name. After that, newly
> created connections will apply the updated algorithm, while older ones
> retain the previous version already applied.
> 
> This patch also takes this chance to refactor the ca validation into
> the new tcp_validate_congestion_control() function.
> 
> Cc: netdev@vger.kernel.org, Eric Dumazet <edumazet@google.com>
> Signed-off-by: Kui-Feng Lee <kuifeng@meta.com>
> ---
>   include/net/tcp.h   |  3 +++
>   net/ipv4/tcp_cong.c | 60 +++++++++++++++++++++++++++++++++++++++------
>   2 files changed, 56 insertions(+), 7 deletions(-)
> 
> diff --git a/include/net/tcp.h b/include/net/tcp.h
> index db9f828e9d1e..2abb755e6a3a 100644
> --- a/include/net/tcp.h
> +++ b/include/net/tcp.h
> @@ -1117,6 +1117,9 @@ struct tcp_congestion_ops {
>   
>   int tcp_register_congestion_control(struct tcp_congestion_ops *type);
>   void tcp_unregister_congestion_control(struct tcp_congestion_ops *type);
> +int tcp_update_congestion_control(struct tcp_congestion_ops *type,
> +				  struct tcp_congestion_ops *old_type);
> +int tcp_validate_congestion_control(struct tcp_congestion_ops *ca);
>   
>   void tcp_assign_congestion_control(struct sock *sk);
>   void tcp_init_congestion_control(struct sock *sk);
> diff --git a/net/ipv4/tcp_cong.c b/net/ipv4/tcp_cong.c
> index db8b4b488c31..c90791ae8389 100644
> --- a/net/ipv4/tcp_cong.c
> +++ b/net/ipv4/tcp_cong.c
> @@ -75,14 +75,8 @@ struct tcp_congestion_ops *tcp_ca_find_key(u32 key)
>   	return NULL;
>   }
>   
> -/*
> - * Attach new congestion control algorithm to the list
> - * of available options.
> - */
> -int tcp_register_congestion_control(struct tcp_congestion_ops *ca)
> +int tcp_validate_congestion_control(struct tcp_congestion_ops *ca)
>   {
> -	int ret = 0;
> -
>   	/* all algorithms must implement these */
>   	if (!ca->ssthresh || !ca->undo_cwnd ||
>   	    !(ca->cong_avoid || ca->cong_control)) {
> @@ -90,6 +84,20 @@ int tcp_register_congestion_control(struct tcp_congestion_ops *ca)
>   		return -EINVAL;
>   	}
>   
> +	return 0;
> +}
> +
> +/* Attach new congestion control algorithm to the list
> + * of available options.
> + */
> +int tcp_register_congestion_control(struct tcp_congestion_ops *ca)
> +{
> +	int ret;
> +
> +	ret = tcp_validate_congestion_control(ca);
> +	if (ret)
> +		return ret;
> +
>   	ca->key = jhash(ca->name, sizeof(ca->name), strlen(ca->name));
>   
>   	spin_lock(&tcp_cong_list_lock);
> @@ -130,6 +138,44 @@ void tcp_unregister_congestion_control(struct tcp_congestion_ops *ca)
>   }
>   EXPORT_SYMBOL_GPL(tcp_unregister_congestion_control);
>   
> +/* Replace a registered old ca with a new one.
> + *
> + * The new ca must have the same name as the old one, that has been
> + * registered.
> + */
> +int tcp_update_congestion_control(struct tcp_congestion_ops *ca, struct tcp_congestion_ops *old_ca)
> +{
> +	struct tcp_congestion_ops *existing;
> +	int ret;
> +
> +	ret = tcp_validate_congestion_control(ca);
> +	if (ret)
> +		return ret;
> +
> +	ca->key = jhash(ca->name, sizeof(ca->name), strlen(ca->name));
> +
> +	spin_lock(&tcp_cong_list_lock);
> +	existing = tcp_ca_find_key(old_ca->key);
> +	if (ca->key == TCP_CA_UNSPEC || !existing || strcmp(existing->name, ca->name)) {
> +		pr_notice("%s not registered or non-unique key\n",
> +			  ca->name);
> +		ret = -EINVAL;
> +	} else if (existing != old_ca) {
> +		pr_notice("invalid old congestion control algorithm to replace\n");
> +		ret = -EINVAL;
> +	} else {
> +		/* Add the new one before removing the old one to keep
> +		 * one implementation available all the time.
> +		 */
> +		list_add_tail_rcu(&ca->list, &tcp_cong_list);
> +		list_del_rcu(&existing->list);
> +		pr_debug("%s updated\n", ca->name);
> +	}
> +	spin_unlock(&tcp_cong_list_lock);
> +
> +	return ret;
> +}

Was wondering if we could have tcp_register_congestion_control and tcp_update_congestion_control
could be refactored for reuse. Maybe like below. From the function itself what is not clear whether
callers that replace an existing one should do the synchronize_rcu() themselves or if this should
be part of tcp_update_congestion_control?

int tcp_check_congestion_control(struct tcp_congestion_ops *ca)
{
	/* All algorithms must implement these. */
	if (!ca->ssthresh || !ca->undo_cwnd ||
	    !(ca->cong_avoid || ca->cong_control)) {
		pr_err("%s does not implement required ops\n", ca->name);
		return -EINVAL;
	}
	if (ca->key == TCP_CA_UNSPEC)
		ca->key = jhash(ca->name, sizeof(ca->name), strlen(ca->name));
	if (ca->key == TCP_CA_UNSPEC) {
		pr_notice("%s results in zero key\n", ca->name);
		return -EEXIST;
	}
	return 0;
}

/* Attach new congestion control algorithm to the list of available
  * options or replace an existing one if old is non-NULL.
  */
int tcp_update_congestion_control(struct tcp_congestion_ops *new,
				  struct tcp_congestion_ops *old)
{
	struct tcp_congestion_ops *found;
	int ret;

	ret = tcp_check_congestion_control(new);
	if (ret)
		return ret;
	if (old &&
	    (old->key != new->key ||
	     strcmp(old->name, new->name))) {
		pr_notice("%s & %s have non-matching congestion control names\n",
			  old->name, new->name);
		return -EINVAL;
	}
	spin_lock(&tcp_cong_list_lock);
	found = tcp_ca_find_key(new->key);
	if (old) {
		if (found == old) {
			list_add_tail_rcu(&new->list, &tcp_cong_list);
			list_del_rcu(&old->list);
		} else {
			pr_notice("%s not registered\n", old->name);
			ret = -EINVAL;
		}
	} else {
		if (found) {
			pr_notice("%s already registered\n", new->name);
			ret = -EEXIST;
		} else {
			list_add_tail_rcu(&new->list, &tcp_cong_list);
		}
	}
	spin_unlock(&tcp_cong_list_lock);
	return ret;
}

int tcp_register_congestion_control(struct tcp_congestion_ops *ca)
{
	return tcp_update_congestion_control(ca, NULL);
}
EXPORT_SYMBOL_GPL(tcp_register_congestion_control);
Martin KaFai Lau March 17, 2023, 5:18 p.m. UTC | #2
On 3/17/23 8:23 AM, Daniel Borkmann wrote:
>  From the function itself what is not clear whether
> callers that replace an existing one should do the synchronize_rcu() themselves 
> or if this should
> be part of tcp_update_congestion_control?

bpf_struct_ops_map_free (in patch 1) also does synchronize_rcu() for another 
reason (bpf_setsockopt), so the caller (bpf_struct_ops) is doing it. From 
looking at tcp_unregister_congestion_control(), make sense that it is more 
correct to have another synchronize_rcu() also in tcp_update_congestion_control 
in case there will be other non bpf_struct_ops caller doing update in the future.
Daniel Borkmann March 17, 2023, 5:23 p.m. UTC | #3
On 3/17/23 6:18 PM, Martin KaFai Lau wrote:
> On 3/17/23 8:23 AM, Daniel Borkmann wrote:
>>  From the function itself what is not clear whether
>> callers that replace an existing one should do the synchronize_rcu() themselves or if this should
>> be part of tcp_update_congestion_control?
> 
> bpf_struct_ops_map_free (in patch 1) also does synchronize_rcu() for another reason (bpf_setsockopt), so the caller (bpf_struct_ops) is doing it. From looking at tcp_unregister_congestion_control(), make sense that it is more correct to have another synchronize_rcu() also in tcp_update_congestion_control in case there will be other non bpf_struct_ops caller doing update in the future.

Agree, I was looking at 'bpf: Update the struct_ops of a bpf_link', and essentially as-is
it was implicit via map free. +1, tcp_update_congestion_control() would be more obvious and
better for other/non-BPF users.

Thanks,
Daniel
Kuifeng Lee March 17, 2023, 9:46 p.m. UTC | #4
On 3/17/23 10:23, Daniel Borkmann wrote:
> On 3/17/23 6:18 PM, Martin KaFai Lau wrote:
>> On 3/17/23 8:23 AM, Daniel Borkmann wrote:
>>>  From the function itself what is not clear whether
>>> callers that replace an existing one should do the synchronize_rcu() 
>>> themselves or if this should
>>> be part of tcp_update_congestion_control?
>>
>> bpf_struct_ops_map_free (in patch 1) also does synchronize_rcu() for 
>> another reason (bpf_setsockopt), so the caller (bpf_struct_ops) is 
>> doing it. From looking at tcp_unregister_congestion_control(), make 
>> sense that it is more correct to have another synchronize_rcu() also 
>> in tcp_update_congestion_control in case there will be other non 
>> bpf_struct_ops caller doing update in the future.
> 
> Agree, I was looking at 'bpf: Update the struct_ops of a bpf_link', and 
> essentially as-is
> it was implicit via map free. +1, tcp_update_congestion_control() would 
> be more obvious and
> better for other/non-BPF users.

It makes sense to me.
I will refactor functions as well.

> 
> Thanks,
> Daniel
Kuifeng Lee March 17, 2023, 11:07 p.m. UTC | #5
On 3/17/23 08:23, Daniel Borkmann wrote:
> On 3/16/23 3:36 AM, Kui-Feng Lee wrote:
>> This feature lets you immediately transition to another congestion
>> control algorithm or implementation with the same name.  Once a name
>> is updated, new connections will apply this new algorithm.
>>
>> The purpose is to update a customized algorithm implemented in BPF
>> struct_ops with a new version on the flight.  The following is an
>> example of using the userspace API implemented in later BPF patches.
>>
>>     link = bpf_map__attach_struct_ops(skel->maps.ca_update_1);
>>     .......
>>     err = bpf_link__update_map(link, skel->maps.ca_update_2);
>>
>> We first load and register an algorithm implemented in BPF struct_ops,
>> then swap it out with a new one using the same name. After that, newly
>> created connections will apply the updated algorithm, while older ones
>> retain the previous version already applied.
>>
>> This patch also takes this chance to refactor the ca validation into
>> the new tcp_validate_congestion_control() function.
>>
>> Cc: netdev@vger.kernel.org, Eric Dumazet <edumazet@google.com>
>> Signed-off-by: Kui-Feng Lee <kuifeng@meta.com>
>> ---
>>   include/net/tcp.h   |  3 +++
>>   net/ipv4/tcp_cong.c | 60 +++++++++++++++++++++++++++++++++++++++------
>>   2 files changed, 56 insertions(+), 7 deletions(-)
>>
>> diff --git a/include/net/tcp.h b/include/net/tcp.h
>> index db9f828e9d1e..2abb755e6a3a 100644
>> --- a/include/net/tcp.h
>> +++ b/include/net/tcp.h
>> @@ -1117,6 +1117,9 @@ struct tcp_congestion_ops {
>>   int tcp_register_congestion_control(struct tcp_congestion_ops *type);
>>   void tcp_unregister_congestion_control(struct tcp_congestion_ops 
>> *type);
>> +int tcp_update_congestion_control(struct tcp_congestion_ops *type,
>> +                  struct tcp_congestion_ops *old_type);
>> +int tcp_validate_congestion_control(struct tcp_congestion_ops *ca);
>>   void tcp_assign_congestion_control(struct sock *sk);
>>   void tcp_init_congestion_control(struct sock *sk);
>> diff --git a/net/ipv4/tcp_cong.c b/net/ipv4/tcp_cong.c
>> index db8b4b488c31..c90791ae8389 100644
>> --- a/net/ipv4/tcp_cong.c
>> +++ b/net/ipv4/tcp_cong.c
>> @@ -75,14 +75,8 @@ struct tcp_congestion_ops *tcp_ca_find_key(u32 key)
>>       return NULL;
>>   }
>> -/*
>> - * Attach new congestion control algorithm to the list
>> - * of available options.
>> - */
>> -int tcp_register_congestion_control(struct tcp_congestion_ops *ca)
>> +int tcp_validate_congestion_control(struct tcp_congestion_ops *ca)
>>   {
>> -    int ret = 0;
>> -
>>       /* all algorithms must implement these */
>>       if (!ca->ssthresh || !ca->undo_cwnd ||
>>           !(ca->cong_avoid || ca->cong_control)) {
>> @@ -90,6 +84,20 @@ int tcp_register_congestion_control(struct 
>> tcp_congestion_ops *ca)
>>           return -EINVAL;
>>       }
>> +    return 0;
>> +}
>> +
>> +/* Attach new congestion control algorithm to the list
>> + * of available options.
>> + */
>> +int tcp_register_congestion_control(struct tcp_congestion_ops *ca)
>> +{
>> +    int ret;
>> +
>> +    ret = tcp_validate_congestion_control(ca);
>> +    if (ret)
>> +        return ret;
>> +
>>       ca->key = jhash(ca->name, sizeof(ca->name), strlen(ca->name));
>>       spin_lock(&tcp_cong_list_lock);
>> @@ -130,6 +138,44 @@ void tcp_unregister_congestion_control(struct 
>> tcp_congestion_ops *ca)
>>   }
>>   EXPORT_SYMBOL_GPL(tcp_unregister_congestion_control);
>> +/* Replace a registered old ca with a new one.
>> + *
>> + * The new ca must have the same name as the old one, that has been
>> + * registered.
>> + */
>> +int tcp_update_congestion_control(struct tcp_congestion_ops *ca, 
>> struct tcp_congestion_ops *old_ca)
>> +{
>> +    struct tcp_congestion_ops *existing;
>> +    int ret;
>> +
>> +    ret = tcp_validate_congestion_control(ca);
>> +    if (ret)
>> +        return ret;
>> +
>> +    ca->key = jhash(ca->name, sizeof(ca->name), strlen(ca->name));
>> +
>> +    spin_lock(&tcp_cong_list_lock);
>> +    existing = tcp_ca_find_key(old_ca->key);
>> +    if (ca->key == TCP_CA_UNSPEC || !existing || 
>> strcmp(existing->name, ca->name)) {
>> +        pr_notice("%s not registered or non-unique key\n",
>> +              ca->name);
>> +        ret = -EINVAL;
>> +    } else if (existing != old_ca) {
>> +        pr_notice("invalid old congestion control algorithm to 
>> replace\n");
>> +        ret = -EINVAL;
>> +    } else {
>> +        /* Add the new one before removing the old one to keep
>> +         * one implementation available all the time.
>> +         */
>> +        list_add_tail_rcu(&ca->list, &tcp_cong_list);
>> +        list_del_rcu(&existing->list);
>> +        pr_debug("%s updated\n", ca->name);
>> +    }
>> +    spin_unlock(&tcp_cong_list_lock);
>> +
>> +    return ret;
>> +}
> 
> Was wondering if we could have tcp_register_congestion_control and 
> tcp_update_congestion_control
> could be refactored for reuse. Maybe like below. From the function 
> itself what is not clear whether
> callers that replace an existing one should do the synchronize_rcu() 
> themselves or if this should
> be part of tcp_update_congestion_control?
> 
> int tcp_check_congestion_control(struct tcp_congestion_ops *ca)
> {
>      /* All algorithms must implement these. */
>      if (!ca->ssthresh || !ca->undo_cwnd ||
>          !(ca->cong_avoid || ca->cong_control)) {
>          pr_err("%s does not implement required ops\n", ca->name);
>          return -EINVAL;
>      }
>      if (ca->key == TCP_CA_UNSPEC)
>          ca->key = jhash(ca->name, sizeof(ca->name), strlen(ca->name));
>      if (ca->key == TCP_CA_UNSPEC) {
>          pr_notice("%s results in zero key\n", ca->name);
>          return -EEXIST;
>      }
>      return 0;
> }
> 
> /* Attach new congestion control algorithm to the list of available
>   * options or replace an existing one if old is non-NULL.
>   */
> int tcp_update_congestion_control(struct tcp_congestion_ops *new,
>                    struct tcp_congestion_ops *old)
> {
>      struct tcp_congestion_ops *found;
>      int ret;
> 
>      ret = tcp_check_congestion_control(new);
>      if (ret)
>          return ret;
>      if (old &&
>          (old->key != new->key ||
>           strcmp(old->name, new->name))) {
>          pr_notice("%s & %s have non-matching congestion control names\n",
>                old->name, new->name);
>          return -EINVAL;
>      }
>      spin_lock(&tcp_cong_list_lock);
>      found = tcp_ca_find_key(new->key); >      if (old) {
>          if (found == old) {
>              list_add_tail_rcu(&new->list, &tcp_cong_list);
>              list_del_rcu(&old->list);
>          } else {
>              pr_notice("%s not registered\n", old->name);
>              ret = -EINVAL;
>          }
>      } else {
>          if (found) {
>              pr_notice("%s already registered\n", new->name);
>              ret = -EEXIST;
>          } else {
>              list_add_tail_rcu(&new->list, &tcp_cong_list);
>          }
>      }
>      spin_unlock(&tcp_cong_list_lock);
>      return ret;
> }

After trying to do this refactoring, I found it just shares a few lines
of code, but make thing complicated by adding more checks.  So, I prefer
to keep it as it is.  How do you think?


> 
> int tcp_register_congestion_control(struct tcp_congestion_ops *ca)
> {
>      return tcp_update_congestion_control(ca, NULL);
> }
> EXPORT_SYMBOL_GPL(tcp_register_congestion_control);
diff mbox series

Patch

diff --git a/include/net/tcp.h b/include/net/tcp.h
index db9f828e9d1e..2abb755e6a3a 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -1117,6 +1117,9 @@  struct tcp_congestion_ops {
 
 int tcp_register_congestion_control(struct tcp_congestion_ops *type);
 void tcp_unregister_congestion_control(struct tcp_congestion_ops *type);
+int tcp_update_congestion_control(struct tcp_congestion_ops *type,
+				  struct tcp_congestion_ops *old_type);
+int tcp_validate_congestion_control(struct tcp_congestion_ops *ca);
 
 void tcp_assign_congestion_control(struct sock *sk);
 void tcp_init_congestion_control(struct sock *sk);
diff --git a/net/ipv4/tcp_cong.c b/net/ipv4/tcp_cong.c
index db8b4b488c31..c90791ae8389 100644
--- a/net/ipv4/tcp_cong.c
+++ b/net/ipv4/tcp_cong.c
@@ -75,14 +75,8 @@  struct tcp_congestion_ops *tcp_ca_find_key(u32 key)
 	return NULL;
 }
 
-/*
- * Attach new congestion control algorithm to the list
- * of available options.
- */
-int tcp_register_congestion_control(struct tcp_congestion_ops *ca)
+int tcp_validate_congestion_control(struct tcp_congestion_ops *ca)
 {
-	int ret = 0;
-
 	/* all algorithms must implement these */
 	if (!ca->ssthresh || !ca->undo_cwnd ||
 	    !(ca->cong_avoid || ca->cong_control)) {
@@ -90,6 +84,20 @@  int tcp_register_congestion_control(struct tcp_congestion_ops *ca)
 		return -EINVAL;
 	}
 
+	return 0;
+}
+
+/* Attach new congestion control algorithm to the list
+ * of available options.
+ */
+int tcp_register_congestion_control(struct tcp_congestion_ops *ca)
+{
+	int ret;
+
+	ret = tcp_validate_congestion_control(ca);
+	if (ret)
+		return ret;
+
 	ca->key = jhash(ca->name, sizeof(ca->name), strlen(ca->name));
 
 	spin_lock(&tcp_cong_list_lock);
@@ -130,6 +138,44 @@  void tcp_unregister_congestion_control(struct tcp_congestion_ops *ca)
 }
 EXPORT_SYMBOL_GPL(tcp_unregister_congestion_control);
 
+/* Replace a registered old ca with a new one.
+ *
+ * The new ca must have the same name as the old one, that has been
+ * registered.
+ */
+int tcp_update_congestion_control(struct tcp_congestion_ops *ca, struct tcp_congestion_ops *old_ca)
+{
+	struct tcp_congestion_ops *existing;
+	int ret;
+
+	ret = tcp_validate_congestion_control(ca);
+	if (ret)
+		return ret;
+
+	ca->key = jhash(ca->name, sizeof(ca->name), strlen(ca->name));
+
+	spin_lock(&tcp_cong_list_lock);
+	existing = tcp_ca_find_key(old_ca->key);
+	if (ca->key == TCP_CA_UNSPEC || !existing || strcmp(existing->name, ca->name)) {
+		pr_notice("%s not registered or non-unique key\n",
+			  ca->name);
+		ret = -EINVAL;
+	} else if (existing != old_ca) {
+		pr_notice("invalid old congestion control algorithm to replace\n");
+		ret = -EINVAL;
+	} else {
+		/* Add the new one before removing the old one to keep
+		 * one implementation available all the time.
+		 */
+		list_add_tail_rcu(&ca->list, &tcp_cong_list);
+		list_del_rcu(&existing->list);
+		pr_debug("%s updated\n", ca->name);
+	}
+	spin_unlock(&tcp_cong_list_lock);
+
+	return ret;
+}
+
 u32 tcp_ca_get_key_by_name(struct net *net, const char *name, bool *ecn_ca)
 {
 	const struct tcp_congestion_ops *ca;