
[net-next,1/4] net: fix possible race in skb_attempt_defer_free()

Message ID 20220516042456.3014395-2-eric.dumazet@gmail.com (mailing list archive)
State Accepted
Commit 97e719a82b43c6c2bb5eebdb3c5d479a332ac2ac
Delegated to: Netdev Maintainers
Series net: polish skb defer freeing

Checks

Context Check Description
netdev/tree_selection success Clearly marked for net-next, async
netdev/fixes_present success Fixes tag not required for -next series
netdev/subject_prefix success Link
netdev/cover_letter success Series has a cover letter
netdev/patch_count success Link
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 4693 this patch: 4693
netdev/cc_maintainers warning 1 maintainers not CCed: petrm@nvidia.com
netdev/build_clang success Errors and warnings before: 1059 this patch: 1059
netdev/module_param success Was 0 now: 0
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/verify_fixes success Fixes tag looks correct
netdev/build_allmodconfig_warn success Errors and warnings before: 4846 this patch: 4846
netdev/checkpatch warning WARNING: memory barrier without comment
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0

Commit Message

Eric Dumazet May 16, 2022, 4:24 a.m. UTC
From: Eric Dumazet <edumazet@google.com>

A CPU can observe sd->defer_count reaching 128
and call smp_call_function_single_async().

The problem is that the remote CPU can clear sd->defer_count
before the IPI is run/acknowledged.

Other CPUs can queue more packets and also decide
to call smp_call_function_single_async() while the pending
IPI has not yet been delivered.

This is a common issue with smp_call_function_single_async().
Callers must ensure correct synchronization and serialization.

I triggered this issue while experimenting with a smaller threshold.
Performing the call to smp_call_function_single_async()
under sd->defer_lock protection did not solve the problem.

Commit 5a18ceca6350 ("smp: Allow smp_call_function_single_async()
to insert locked csd") replaced an informative WARN_ON_ONCE()
with a return of -EBUSY, which is often ignored.
Testing for the presence of CSD_FLAG_LOCK is racy anyway.

Fixes: 68822bdf76f1 ("net: generalize skb freeing deferral to per-cpu lists")
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 include/linux/netdevice.h | 1 +
 net/core/dev.c            | 7 +++++--
 net/core/skbuff.c         | 5 ++---
 3 files changed, 8 insertions(+), 5 deletions(-)
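
To make the race concrete, here is a simplified sketch of the pre-patch
kick path (condensed from skb_attempt_defer_free() as it stood before
this patch; the defer_lock handling around the list is elided):

	/* Sketch only, not the exact kernel code. */
	sd->defer_count++;
	kick = sd->defer_count == 128;
	...
	if (unlikely(kick))
		/* Nothing stops several CPUs from reaching this point
		 * while a previous IPI on the same csd is still pending;
		 * smp_call_function_single_async() then returns -EBUSY,
		 * which is ignored here, and its internal csd state is
		 * not protected against such racy reuse.
		 */
		smp_call_function_single_async(cpu, &sd->defer_csd);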

Comments

Jakub Kicinski May 16, 2022, 6:15 p.m. UTC | #1
On Sun, 15 May 2022 21:24:53 -0700 Eric Dumazet wrote:
> A CPU can observe sd->defer_count reaching 128
> and call smp_call_function_single_async().
> 
> The problem is that the remote CPU can clear sd->defer_count
> before the IPI is run/acknowledged.
> 
> Other CPUs can queue more packets and also decide
> to call smp_call_function_single_async() while the pending
> IPI has not yet been delivered.
> 
> This is a common issue with smp_call_function_single_async().
> Callers must ensure correct synchronization and serialization.
> 
> I triggered this issue while experimenting with a smaller threshold.
> Performing the call to smp_call_function_single_async()
> under sd->defer_lock protection did not solve the problem.
> 
> Commit 5a18ceca6350 ("smp: Allow smp_call_function_single_async()
> to insert locked csd") replaced an informative WARN_ON_ONCE()
> with a return of -EBUSY, which is often ignored.
> Testing for the presence of CSD_FLAG_LOCK is racy anyway.

If I'm reading this right, this is useful for backports, but in net-next
it really is a noop? The -EBUSY would be perfectly safe to ignore?
Just checking.
Eric Dumazet May 16, 2022, 6:24 p.m. UTC | #2
On Mon, May 16, 2022 at 11:16 AM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Sun, 15 May 2022 21:24:53 -0700 Eric Dumazet wrote:
> > A CPU can observe sd->defer_count reaching 128
> > and call smp_call_function_single_async().
> >
> > The problem is that the remote CPU can clear sd->defer_count
> > before the IPI is run/acknowledged.
> >
> > Other CPUs can queue more packets and also decide
> > to call smp_call_function_single_async() while the pending
> > IPI has not yet been delivered.
> >
> > This is a common issue with smp_call_function_single_async().
> > Callers must ensure correct synchronization and serialization.
> >
> > I triggered this issue while experimenting with a smaller threshold.
> > Performing the call to smp_call_function_single_async()
> > under sd->defer_lock protection did not solve the problem.
> >
> > Commit 5a18ceca6350 ("smp: Allow smp_call_function_single_async()
> > to insert locked csd") replaced an informative WARN_ON_ONCE()
> > with a return of -EBUSY, which is often ignored.
> > Testing for the presence of CSD_FLAG_LOCK is racy anyway.
>
> If I'm reading this right, this is useful for backports, but in net-next
> it really is a noop? The -EBUSY would be perfectly safe to ignore?
> Just checking.

Not sure I understand the question.

trigger_rx_softirq() and friends were only in net-next, so there is no
backport needed.

Are you talking of calls from net_rps_send_ipi() ?
These are fine, because we own an atomic bit (NAPI_STATE_SCHED).
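
A rough sketch of the ownership rule referred to here: only the CPU that
wins the NAPI_STATE_SCHED bit goes on to queue the csd, so the async call
cannot race with itself. This simplifies the real logic in
napi_schedule_prep(); treat it as an illustration, not the exact code:

	/* Simplified: the atomic bit acts as a lock around the csd. */
	if (!test_and_set_bit(NAPI_STATE_SCHED, &napi->state))
		____napi_schedule(sd, napi); /* eventually sends the IPI */
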
Jakub Kicinski May 16, 2022, 6:54 p.m. UTC | #3
On Mon, 16 May 2022 11:24:40 -0700 Eric Dumazet wrote:
> On Mon, May 16, 2022 at 11:16 AM Jakub Kicinski <kuba@kernel.org> wrote:
> > If I'm reading this right, this is useful for backports, but in net-next
> > it really is a noop? The -EBUSY would be perfectly safe to ignore?
> > Just checking.
> 
> Not sure I understand the question.
> 
> trigger_rx_softirq() and friends were only in net-next, so there is no
> backport needed.
> 
> Are you talking of calls from net_rps_send_ipi() ?
> These are fine, because we own an atomic bit (NAPI_STATE_SCHED).

Ah, I think I get it now. It was unclear what problem this patch
is solving; this part of the commit message really is key:

> This is a common issue with smp_call_function_single_async().
> Callers must ensure correct synchronization and serialization.

smp_call_function_single_async() does not protect its own internal state,
so we need to wrap it in our own locking (of some form).

Sorry for the noise.
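
Condensed, the serialization the patch below adds pairs a cmpxchg() on a
new per-softnet_data flag with a release store in the IPI handler; a
minimal sketch of the pattern (full context in the diff that follows):

	/* Sender: only the CPU that wins the 0 -> 1 transition may
	 * queue the csd, so at most one IPI is in flight per sd.
	 */
	if (unlikely(kick) && !cmpxchg(&sd->defer_ipi_scheduled, 0, 1))
		smp_call_function_single_async(cpu, &sd->defer_csd);

	/* IPI handler on the remote CPU: raise the softirq, then
	 * re-arm the flag so the next kick can be sent.
	 */
	static void trigger_rx_softirq(void *data)
	{
		struct softnet_data *sd = data;

		__raise_softirq_irqoff(NET_RX_SOFTIRQ);
		smp_store_release(&sd->defer_ipi_scheduled, 0);
	}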

Patch

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 536321691c725ebd311088f4654dd04b9abbaaef..89699a299ba1b0b544b7abb782708d827abe55ac 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -3127,6 +3127,7 @@  struct softnet_data {
 	/* Another possibly contended cache line */
 	spinlock_t		defer_lock ____cacheline_aligned_in_smp;
 	int			defer_count;
+	int			defer_ipi_scheduled;
 	struct sk_buff		*defer_list;
 	call_single_data_t	defer_csd;
 };
diff --git a/net/core/dev.c b/net/core/dev.c
index a601da3b4a7c800801f763f097f00f3a3b591107..d708f95356e0b03a61e8211adcf6d272dfa322b5 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -4581,9 +4581,12 @@  static void rps_trigger_softirq(void *data)
 #endif /* CONFIG_RPS */
 
 /* Called from hardirq (IPI) context */
-static void trigger_rx_softirq(void *data __always_unused)
+static void trigger_rx_softirq(void *data)
 {
+	struct softnet_data *sd = data;
+
 	__raise_softirq_irqoff(NET_RX_SOFTIRQ);
+	smp_store_release(&sd->defer_ipi_scheduled, 0);
 }
 
 /*
@@ -11381,7 +11384,7 @@  static int __init net_dev_init(void)
 		INIT_CSD(&sd->csd, rps_trigger_softirq, sd);
 		sd->cpu = i;
 #endif
-		INIT_CSD(&sd->defer_csd, trigger_rx_softirq, NULL);
+		INIT_CSD(&sd->defer_csd, trigger_rx_softirq, sd);
 		spin_lock_init(&sd->defer_lock);
 
 		init_gro_hash(&sd->backlog);
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index bd16e158b3668f496a9d2c8e8b6f3433a326314c..1e2180682f2e94c45e3f26059af6d18be2d9f9d3 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -6514,8 +6514,7 @@  void skb_attempt_defer_free(struct sk_buff *skb)
 	sd->defer_count++;
 
 	/* kick every time queue length reaches 128.
-	 * This should avoid blocking in smp_call_function_single_async().
-	 * This condition should hardly be bit under normal conditions,
+	 * This condition should hardly be hit under normal conditions,
 	 * unless cpu suddenly stopped to receive NIC interrupts.
 	 */
 	kick = sd->defer_count == 128;
@@ -6525,6 +6524,6 @@  void skb_attempt_defer_free(struct sk_buff *skb)
 	/* Make sure to trigger NET_RX_SOFTIRQ on the remote CPU
 	 * if we are unlucky enough (this seems very unlikely).
 	 */
-	if (unlikely(kick))
+	if (unlikely(kick) && !cmpxchg(&sd->defer_ipi_scheduled, 0, 1))
 		smp_call_function_single_async(cpu, &sd->defer_csd);
 }
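
One reading of the barrier pairing (the smp_store_release() is what the
checkpatch warning above flags as a "memory barrier without comment");
this is editorial interpretation, not the author's stated rationale:

	/* - A successful cmpxchg() is fully ordered, so exactly one CPU
	 *   owns defer_ipi_scheduled between the kick and the re-arm.
	 * - smp_store_release() keeps the re-arm from being reordered
	 *   before __raise_softirq_irqoff(), so the flag cannot be seen
	 *   clear until the softirq raise has actually happened.
	 */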