Message ID | 167395854720.539380.12918805302179692095.stgit@firesoul (mailing list archive) |
---|---|
State | Superseded |
Delegated to: | Netdev Maintainers |
Series | [net-next] net: avoid irqsave in skb_defer_free_flush |
On 1/17/2023 4:29 AM, Jesper Dangaard Brouer wrote:
> The spin_lock irqsave/restore API variant in skb_defer_free_flush can
> be replaced with the faster spin_lock irq variant, which doesn't need
> to read and restore the CPU flags.
>
> Using the unconditional irq "disable/enable" API variant is safe,
> because the skb_defer_free_flush() function is only called during
> NAPI-RX processing in net_rx_action(), where it is known the IRQs
> are enabled.
>

Did you mean disabled here? If IRQs are enabled, that would mean the
interrupt could be triggered and we would need to irqsave, no?

> Expected gain is 14 cycles from avoiding reading and restoring CPU
> flags in a spin_lock_irqsave/restore operation, measured via a
> microbenchmark kernel module[1] on CPU E5-1650 v4 @ 3.60GHz.
>
> Microbenchmark overhead of spin_lock+unlock:
>  - spin_lock_unlock_irq cost: 34 cycles(tsc) 9.486 ns
>  - spin_lock_unlock_irqsave cost: 48 cycles(tsc) 13.567 ns
>

Fairly minor change in perf, and..

> We don't expect to see a measurable packet performance gain, as
> skb_defer_free_flush() is called infrequently once per NIC device NAPI
> bulk cycle and conditionally only if SKBs have been deferred by other
> CPUs via skb_attempt_defer_free().
>

Not really measurable as it's not called enough, but..

> [1] https://github.com/netoptimizer/prototype-kernel/blob/master/kernel/lib/time_bench_sample.c
>
> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
> ---
>  net/core/dev.c | 5 ++---
>  1 file changed, 2 insertions(+), 3 deletions(-)
>
> diff --git a/net/core/dev.c b/net/core/dev.c
> index cf78f35bc0b9..9c60190fe352 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -6616,17 +6616,16 @@ static int napi_threaded_poll(void *data)
>  static void skb_defer_free_flush(struct softnet_data *sd)
>  {
>  	struct sk_buff *skb, *next;
> -	unsigned long flags;
>
>  	/* Paired with WRITE_ONCE() in skb_attempt_defer_free() */
>  	if (!READ_ONCE(sd->defer_list))
>  		return;
>
> -	spin_lock_irqsave(&sd->defer_lock, flags);
> +	spin_lock_irq(&sd->defer_lock);
>  	skb = sd->defer_list;
>  	sd->defer_list = NULL;
>  	sd->defer_count = 0;
> -	spin_unlock_irqrestore(&sd->defer_lock, flags);
> +	spin_unlock_irq(&sd->defer_lock);
>

It's also less code and makes it more clear what dependency this section
has.

Seems ok to me, with the minor nit I think in the commit message:

Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>

> 	while (skb != NULL) {
> 		next = skb->next;
>
On 17/01/2023 20.29, Jacob Keller wrote:
>
> On 1/17/2023 4:29 AM, Jesper Dangaard Brouer wrote:
>> The spin_lock irqsave/restore API variant in skb_defer_free_flush can
>> be replaced with the faster spin_lock irq variant, which doesn't need
>> to read and restore the CPU flags.
>>
>> Using the unconditional irq "disable/enable" API variant is safe,
>> because the skb_defer_free_flush() function is only called during
>> NAPI-RX processing in net_rx_action(), where it is known the IRQs
>> are enabled.
>>
>
> Did you mean disabled here? If IRQs are enabled that would mean the
> interrupt could be triggered and we would need to irqsave, no?

I do mean 'enabled' in the text here.

As you can see in net_rx_action() we are allowed to perform code like:

	local_irq_disable();
	list_splice_init(&sd->poll_list, &list);
	local_irq_enable();

Disabling local IRQs without saving 'flags' and unconditionally enabling
local IRQs again. Thus, in skb_defer_free_flush() we can do the same,
without saving 'flags'. Hope that makes it more clear.

>> Expected gain is 14 cycles from avoiding reading and restoring CPU
>> flags in a spin_lock_irqsave/restore operation, measured via a
>> microbenchmark kernel module[1] on CPU E5-1650 v4 @ 3.60GHz.
>>
>> Microbenchmark overhead of spin_lock+unlock:
>>  - spin_lock_unlock_irq cost: 34 cycles(tsc) 9.486 ns
>>  - spin_lock_unlock_irqsave cost: 48 cycles(tsc) 13.567 ns
>>
>
> Fairly minor change in perf, and..
>
>> We don't expect to see a measurable packet performance gain, as
>> skb_defer_free_flush() is called infrequently once per NIC device NAPI
>> bulk cycle and conditionally only if SKBs have been deferred by other
>> CPUs via skb_attempt_defer_free().
>>
>
> Not really measurable as it's not called enough, but..
>
>> [1] https://github.com/netoptimizer/prototype-kernel/blob/master/kernel/lib/time_bench_sample.c
>>
>> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
>> ---
>>  net/core/dev.c | 5 ++---
>>  1 file changed, 2 insertions(+), 3 deletions(-)
>>
>> diff --git a/net/core/dev.c b/net/core/dev.c
>> index cf78f35bc0b9..9c60190fe352 100644
>> --- a/net/core/dev.c
>> +++ b/net/core/dev.c
>> @@ -6616,17 +6616,16 @@ static int napi_threaded_poll(void *data)
>>  static void skb_defer_free_flush(struct softnet_data *sd)
>>  {
>>  	struct sk_buff *skb, *next;
>> -	unsigned long flags;
>>
>>  	/* Paired with WRITE_ONCE() in skb_attempt_defer_free() */
>>  	if (!READ_ONCE(sd->defer_list))
>>  		return;
>>
>> -	spin_lock_irqsave(&sd->defer_lock, flags);
>> +	spin_lock_irq(&sd->defer_lock);
>>  	skb = sd->defer_list;
>>  	sd->defer_list = NULL;
>>  	sd->defer_count = 0;
>> -	spin_unlock_irqrestore(&sd->defer_lock, flags);
>> +	spin_unlock_irq(&sd->defer_lock);
>>
>
> It's also less code and makes it more clear what dependency this section
> has.
>
> Seems ok to me, with the minor nit I think in the commit message:
>
> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>

Thanks for the review.
--Jesper
On 1/18/2023 11:19 AM, Jesper Dangaard Brouer wrote:
>
> On 17/01/2023 20.29, Jacob Keller wrote:
>>
>> On 1/17/2023 4:29 AM, Jesper Dangaard Brouer wrote:
>>> The spin_lock irqsave/restore API variant in skb_defer_free_flush can
>>> be replaced with the faster spin_lock irq variant, which doesn't need
>>> to read and restore the CPU flags.
>>>
>>> Using the unconditional irq "disable/enable" API variant is safe,
>>> because the skb_defer_free_flush() function is only called during
>>> NAPI-RX processing in net_rx_action(), where it is known the IRQs
>>> are enabled.
>>>
>>
>> Did you mean disabled here? If IRQs are enabled that would mean the
>> interrupt could be triggered and we would need to irqsave, no?
>
> I do mean 'enabled' in the text here.
>
> As you can see in net_rx_action() we are allowed to perform code like:
>
> 	local_irq_disable();
> 	list_splice_init(&sd->poll_list, &list);
> 	local_irq_enable();
>
> Disabling local IRQs without saving 'flags' and unconditionally enabling
> local IRQs again. Thus, in skb_defer_free_flush() we can do the same,
> without saving 'flags'. Hope that makes it more clear.
>

Ahh, that makes sense. In that case, no further nits and:

Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
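For readers not following along in net/core/dev.c, the exchange above refers to the structure of net_rx_action(). The condensed sketch below is an illustration of that structure around the time of this patch, not a verbatim copy of the upstream function; budget/time-limit handling and re-queueing of unfinished NAPI instances are elided:

	/* Sketch of net_rx_action(): IRQs are disabled only around the
	 * poll_list splice, so the NAPI poll loop, including the call to
	 * skb_defer_free_flush(), runs with local IRQs enabled again.
	 */
	static __latent_entropy void net_rx_action(struct softirq_action *h)
	{
		struct softnet_data *sd = this_cpu_ptr(&softnet_data);
		LIST_HEAD(list);
		LIST_HEAD(repoll);

		local_irq_disable();
		list_splice_init(&sd->poll_list, &list);
		local_irq_enable();		/* IRQs enabled from here on */

		for (;;) {
			struct napi_struct *n;

			skb_defer_free_flush(sd);	/* called with IRQs enabled */

			if (list_empty(&list))
				break;

			n = list_first_entry(&list, struct napi_struct, poll_list);
			napi_poll(n, &repoll);
			/* budget and time_limit handling elided */
		}
		/* re-queueing of unfinished NAPI instances elided */
	}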
diff --git a/net/core/dev.c b/net/core/dev.c
index cf78f35bc0b9..9c60190fe352 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -6616,17 +6616,16 @@ static int napi_threaded_poll(void *data)
 static void skb_defer_free_flush(struct softnet_data *sd)
 {
 	struct sk_buff *skb, *next;
-	unsigned long flags;
 
 	/* Paired with WRITE_ONCE() in skb_attempt_defer_free() */
 	if (!READ_ONCE(sd->defer_list))
 		return;
 
-	spin_lock_irqsave(&sd->defer_lock, flags);
+	spin_lock_irq(&sd->defer_lock);
 	skb = sd->defer_list;
 	sd->defer_list = NULL;
 	sd->defer_count = 0;
-	spin_unlock_irqrestore(&sd->defer_lock, flags);
+	spin_unlock_irq(&sd->defer_lock);
 
 	while (skb != NULL) {
 		next = skb->next;
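The diff hinges on a well-known distinction between the two lock APIs: spin_lock_irqsave() records the caller's IRQ state in 'flags' and spin_unlock_irqrestore() puts back exactly that state, so the pair is safe regardless of the caller's context, whereas spin_lock_irq()/spin_unlock_irq() unconditionally disables and then re-enables local IRQs and is only correct when the caller is known to run with IRQs enabled. A minimal illustrative snippet (hypothetical lock and functions, not taken from this patch):

	static DEFINE_SPINLOCK(example_lock);	/* hypothetical lock */

	/* Caller IRQ state unknown (could even be hard-IRQ context):
	 * flags must be saved and restored.
	 */
	static void locked_from_any_context(void)
	{
		unsigned long flags;

		spin_lock_irqsave(&example_lock, flags);
		/* critical section */
		spin_unlock_irqrestore(&example_lock, flags);
	}

	/* Caller known to run with local IRQs enabled, as
	 * skb_defer_free_flush() is when called from net_rx_action():
	 * the cheaper unconditional variant is sufficient.
	 */
	static void locked_with_irqs_known_enabled(void)
	{
		spin_lock_irq(&example_lock);
		/* critical section */
		spin_unlock_irq(&example_lock);
	}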
The spin_lock irqsave/restore API variant in skb_defer_free_flush can
be replaced with the faster spin_lock irq variant, which doesn't need
to read and restore the CPU flags.

Using the unconditional irq "disable/enable" API variant is safe,
because the skb_defer_free_flush() function is only called during
NAPI-RX processing in net_rx_action(), where it is known the IRQs
are enabled.

Expected gain is 14 cycles from avoiding reading and restoring CPU
flags in a spin_lock_irqsave/restore operation, measured via a
microbenchmark kernel module[1] on CPU E5-1650 v4 @ 3.60GHz.

Microbenchmark overhead of spin_lock+unlock:
 - spin_lock_unlock_irq cost: 34 cycles(tsc) 9.486 ns
 - spin_lock_unlock_irqsave cost: 48 cycles(tsc) 13.567 ns

We don't expect to see a measurable packet performance gain, as
skb_defer_free_flush() is called infrequently once per NIC device NAPI
bulk cycle and conditionally only if SKBs have been deferred by other
CPUs via skb_attempt_defer_free().

[1] https://github.com/netoptimizer/prototype-kernel/blob/master/kernel/lib/time_bench_sample.c

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
 net/core/dev.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)
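For reference, the cycle counts above come from the time_bench_sample module in [1]. The sketch below is a minimal, hypothetical kernel module showing the same kind of measurement; it is not the code from [1], and the loop count, use of get_cycles() and reporting format are illustrative assumptions only:

	/* Hypothetical micro-benchmark module, NOT the actual
	 * time_bench_sample.c from [1]; it only illustrates how the
	 * per-operation cost of the two lock variants can be compared.
	 */
	#include <linux/module.h>
	#include <linux/spinlock.h>
	#include <linux/timex.h>	/* get_cycles() */

	static DEFINE_SPINLOCK(bench_lock);

	static void bench_lock_variants(void)
	{
		const int loops = 100000;
		unsigned long flags;
		cycles_t t0, t1, t2;
		int i;

		preempt_disable();

		t0 = get_cycles();
		for (i = 0; i < loops; i++) {
			spin_lock_irq(&bench_lock);
			spin_unlock_irq(&bench_lock);
		}
		t1 = get_cycles();
		for (i = 0; i < loops; i++) {
			spin_lock_irqsave(&bench_lock, flags);
			spin_unlock_irqrestore(&bench_lock, flags);
		}
		t2 = get_cycles();

		preempt_enable();

		pr_info("spin_lock_unlock_irq:     %llu cycles/op\n",
			(unsigned long long)(t1 - t0) / loops);
		pr_info("spin_lock_unlock_irqsave: %llu cycles/op\n",
			(unsigned long long)(t2 - t1) / loops);
	}

	static int __init bench_init(void)
	{
		bench_lock_variants();
		return 0;
	}

	static void __exit bench_exit(void)
	{
	}

	module_init(bench_init);
	module_exit(bench_exit);
	MODULE_LICENSE("GPL");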