From patchwork Fri Dec 15 17:07:20 2023
From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Subject: [PATCH net-next 01/24] locking/local_lock: Introduce guard definition for local_lock.
Date: Fri, 15 Dec 2023 18:07:20 +0100
Message-ID: <20231215171020.687342-2-bigeasy@linutronix.de>

Introduce lock guard definition for local_lock_t. There are no users
yet.
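A minimal usage sketch of the new guards (hypothetical consumer code,
not part of this patch; the guard releases the lock automatically at
scope exit):

	struct foo_pcpu {
		local_lock_t lock;
		unsigned int count;
	};
	static DEFINE_PER_CPU(struct foo_pcpu, foo_pcpu) = {
		.lock = INIT_LOCAL_LOCK(lock),
	};

	static void foo_count(void)
	{
		/* behaves like local_lock(&foo_pcpu.lock) ...
		 * local_unlock(&foo_pcpu.lock) wrapped around the scope
		 */
		guard(local_lock)(&foo_pcpu.lock);
		this_cpu_inc(foo_pcpu.count);
	}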
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 include/linux/local_lock.h | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/include/linux/local_lock.h b/include/linux/local_lock.h
index e55010fa73296..706c4b65d9449 100644
--- a/include/linux/local_lock.h
+++ b/include/linux/local_lock.h
@@ -51,4 +51,15 @@
 #define local_unlock_irqrestore(lock, flags)			\
 	__local_unlock_irqrestore(lock, flags)
 
+DEFINE_LOCK_GUARD_1(local_lock, local_lock_t,
+		    local_lock(_T->lock),
+		    local_unlock(_T->lock))
+DEFINE_LOCK_GUARD_1(local_lock_irq, local_lock_t,
+		    local_lock_irq(_T->lock),
+		    local_unlock_irq(_T->lock))
+DEFINE_LOCK_GUARD_1(local_lock_irqsave, local_lock_t,
+		    local_lock_irqsave(_T->lock, _T->flags),
+		    local_unlock_irqrestore(_T->lock, _T->flags),
+		    unsigned long flags)
+
 #endif

From patchwork Fri Dec 15 17:07:21 2023
From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Subject: [PATCH net-next 02/24] locking/local_lock: Add local nested BH locking infrastructure.
Date: Fri, 15 Dec 2023 18:07:21 +0100
Message-ID: <20231215171020.687342-3-bigeasy@linutronix.de>

Add local_lock_nested_bh() locking. It is based on local_lock_t and the
naming follows the preempt_disable_nested() example.

For !PREEMPT_RT + !LOCKDEP it is a per-CPU annotation for locking
assumptions based on local_bh_disable(). The macro is optimized away
during compilation.

For !PREEMPT_RT + LOCKDEP the local_lock_nested_bh() is reduced to
the usual lock-acquire plus lockdep_assert_in_softirq() - ensuring that
BH is disabled.

For PREEMPT_RT local_lock_nested_bh() acquires the specified per-CPU
lock. It does not disable CPU migration because it relies on
local_bh_disable() disabling CPU migration. With LOCKDEP it performs
the usual lockdep checks as with !PREEMPT_RT. Due to include hell the
softirq check has been moved to spinlock.c.

The intention is to use this locking in places where locking of a per-CPU
variable relies on BH being disabled. Instead of treating disabled
bottom halves as a big per-CPU lock, PREEMPT_RT can use this to reduce
the locking scope to what actually needs protecting.
A side effect is that it also documents the protection scope of the
per-CPU variables.
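A hedged sketch of the intended pattern (hypothetical per-CPU user, not
part of this patch): the per-CPU data gets a local_lock_t member and
code paths that already run with BH disabled take the nested lock
around the actual access:

	struct pcpu_cache {
		local_lock_t bh_lock;
		unsigned int users;
	};
	static DEFINE_PER_CPU(struct pcpu_cache, pcpu_cache) = {
		.bh_lock = INIT_LOCAL_LOCK(bh_lock),
	};

	static void pcpu_cache_add(void)
	{
		local_bh_disable();
		/* BH is disabled; lock only the per-CPU access itself */
		local_lock_nested_bh(&pcpu_cache.bh_lock);
		this_cpu_inc(pcpu_cache.users);
		local_unlock_nested_bh(&pcpu_cache.bh_lock);
		local_bh_enable();
	}

On !PREEMPT_RT this compiles down to the plain per-CPU access (plus the
lockdep assertion with LOCKDEP); on PREEMPT_RT it takes the per-CPU
lock embedded in local_lock_t.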
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 include/linux/local_lock.h          | 10 ++++++++++
 include/linux/local_lock_internal.h | 31 +++++++++++++++++++++++++++++
 include/linux/lockdep.h             |  3 +++
 kernel/locking/spinlock.c           |  8 ++++++++
 4 files changed, 52 insertions(+)

diff --git a/include/linux/local_lock.h b/include/linux/local_lock.h
index 706c4b65d9449..303caac7fbbbc 100644
--- a/include/linux/local_lock.h
+++ b/include/linux/local_lock.h
@@ -62,4 +62,14 @@ DEFINE_LOCK_GUARD_1(local_lock_irqsave, local_lock_t,
 		    local_unlock_irqrestore(_T->lock, _T->flags),
 		    unsigned long flags)
 
+#define local_lock_nested_bh(_lock)				\
+	__local_lock_nested_bh(_lock)
+
+#define local_unlock_nested_bh(_lock)				\
+	__local_unlock_nested_bh(_lock)
+
+DEFINE_LOCK_GUARD_1(local_lock_nested_bh, local_lock_t,
+		    local_lock_nested_bh(_T->lock),
+		    local_unlock_nested_bh(_T->lock))
+
 #endif
diff --git a/include/linux/local_lock_internal.h b/include/linux/local_lock_internal.h
index 975e33b793a77..8dd71fbbb6d2b 100644
--- a/include/linux/local_lock_internal.h
+++ b/include/linux/local_lock_internal.h
@@ -62,6 +62,17 @@ do {								\
 	local_lock_debug_init(lock);				\
 } while (0)
 
+#define __spinlock_nested_bh_init(lock)				\
+do {								\
+	static struct lock_class_key __key;			\
+								\
+	debug_check_no_locks_freed((void *)lock, sizeof(*lock));\
+	lockdep_init_map_type(&(lock)->dep_map, #lock, &__key,	\
+			      0, LD_WAIT_CONFIG, LD_WAIT_INV,	\
+			      LD_LOCK_NORMAL);			\
+	local_lock_debug_init(lock);				\
+} while (0)
+
 #define __local_lock(lock)					\
 	do {							\
 		preempt_disable();				\
@@ -98,6 +109,15 @@ do {								\
 		local_irq_restore(flags);			\
 	} while (0)
 
+#define __local_lock_nested_bh(lock)				\
+	do {							\
+		lockdep_assert_in_softirq();			\
+		local_lock_acquire(this_cpu_ptr(lock));		\
+	} while (0)
+
+#define __local_unlock_nested_bh(lock)				\
+	local_lock_release(this_cpu_ptr(lock))
+
 #else /* !CONFIG_PREEMPT_RT */
 
 /*
@@ -138,4 +158,15 @@ typedef spinlock_t local_lock_t;
 
 #define __local_unlock_irqrestore(lock, flags)	__local_unlock(lock)
 
+#define __local_lock_nested_bh(lock)				\
+do {								\
+	lockdep_assert_in_softirq_func();			\
+	spin_lock(this_cpu_ptr(lock));				\
+} while (0)
+
+#define __local_unlock_nested_bh(lock)				\
+do {								\
+	spin_unlock(this_cpu_ptr((lock)));			\
+} while (0)
+
 #endif /* CONFIG_PREEMPT_RT */
diff --git a/include/linux/lockdep.h b/include/linux/lockdep.h
index dc2844b071c2c..881732fa0df8c 100644
--- a/include/linux/lockdep.h
+++ b/include/linux/lockdep.h
@@ -657,6 +657,8 @@ do {									\
 		      (!in_softirq() || in_irq() || in_nmi()));		\
 } while (0)
 
+extern void lockdep_assert_in_softirq_func(void);
+
 #else
 # define might_lock(lock) do { } while (0)
 # define might_lock_read(lock) do { } while (0)
@@ -670,6 +672,7 @@ do {									\
 # define lockdep_assert_preemption_enabled() do { } while (0)
 # define lockdep_assert_preemption_disabled() do { } while (0)
 # define lockdep_assert_in_softirq() do { } while (0)
+# define lockdep_assert_in_softirq_func() do { } while (0)
 #endif
 
 #ifdef CONFIG_PROVE_RAW_LOCK_NESTING
diff --git a/kernel/locking/spinlock.c b/kernel/locking/spinlock.c
index 8475a0794f8c5..438c6086d540e 100644
--- a/kernel/locking/spinlock.c
+++ b/kernel/locking/spinlock.c
@@ -413,3 +413,11 @@ notrace int in_lock_functions(unsigned long addr)
 	&& addr < (unsigned long)__lock_text_end;
 }
 EXPORT_SYMBOL(in_lock_functions);
+
+#if defined(CONFIG_PROVE_LOCKING) && defined(CONFIG_PREEMPT_RT)
+void notrace lockdep_assert_in_softirq_func(void)
+{
+	lockdep_assert_in_softirq();
+}
+EXPORT_SYMBOL(lockdep_assert_in_softirq_func);
+#endif

From patchwork Fri Dec 15 17:07:22 2023
From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Subject: [PATCH net-next 03/24] net: Use __napi_alloc_frag_align() instead of open coding it.
Date: Fri, 15 Dec 2023 18:07:22 +0100
Message-ID: <20231215171020.687342-4-bigeasy@linutronix.de>

The else condition within __netdev_alloc_frag_align() is an open coded
__napi_alloc_frag_align().

Use __napi_alloc_frag_align() instead of open coding it.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 net/core/skbuff.c | 5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index b157efea5dea8..de9397e45718a 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -311,11 +311,8 @@ void *__netdev_alloc_frag_align(unsigned int fragsz, unsigned int align_mask)
 		data = page_frag_alloc_align(nc, fragsz, GFP_ATOMIC,
 					     align_mask);
 	} else {
-		struct napi_alloc_cache *nc;
-
 		local_bh_disable();
-		nc = this_cpu_ptr(&napi_alloc_cache);
-		data = page_frag_alloc_align(&nc->page, fragsz, GFP_ATOMIC, align_mask);
+		data = __napi_alloc_frag_align(fragsz, align_mask);
 		local_bh_enable();
 	}
 	return data;

From patchwork Fri Dec 15 17:07:23 2023
From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Subject: [PATCH net-next 04/24] net: Use nested-BH locking for napi_alloc_cache.
Date: Fri, 15 Dec 2023 18:07:23 +0100
Message-ID: <20231215171020.687342-5-bigeasy@linutronix.de>

napi_alloc_cache is a per-CPU variable and relies on disabled BH for its
locking. Without per-CPU locking in local_bh_disable() on PREEMPT_RT
this data structure requires explicit locking.

Add a local_lock_t to the data structure and use
local_lock_nested_bh() for locking. This change adds only lockdep
coverage and does not alter the functional behaviour for !PREEMPT_RT.
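The conversion below uses both guard flavours; as a reminder (a sketch,
not part of the patch), guard() holds the lock until the end of the
enclosing scope, while scoped_guard() confines it to its own block:

	static void cache_op(void)
	{
		guard(local_lock_nested_bh)(&napi_alloc_cache.bh_lock);
		/* cache access; lock held until function exit */
	}

	scoped_guard(local_lock_nested_bh, &napi_alloc_cache.bh_lock) {
		/* cache access; lock held only within this block */
	}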
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 net/core/skbuff.c | 57 +++++++++++++++++++++++++++--------------------
 1 file changed, 33 insertions(+), 24 deletions(-)

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index de9397e45718a..9c3073dcc80f1 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -265,6 +265,7 @@ static void *page_frag_alloc_1k(struct page_frag_1k *nc, gfp_t gfp_mask)
 #endif
 
 struct napi_alloc_cache {
+	local_lock_t bh_lock;
 	struct page_frag_cache page;
 	struct page_frag_1k page_small;
 	unsigned int skb_count;
@@ -272,7 +273,9 @@ struct napi_alloc_cache {
 };
 
 static DEFINE_PER_CPU(struct page_frag_cache, netdev_alloc_cache);
-static DEFINE_PER_CPU(struct napi_alloc_cache, napi_alloc_cache);
+static DEFINE_PER_CPU(struct napi_alloc_cache, napi_alloc_cache) = {
+	.bh_lock = INIT_LOCAL_LOCK(bh_lock),
+};
 
 /* Double check that napi_get_frags() allocates skbs with
  * skb->head being backed by slab, not a page fragment.
@@ -296,6 +299,7 @@ void *__napi_alloc_frag_align(unsigned int fragsz, unsigned int align_mask)
 	struct napi_alloc_cache *nc = this_cpu_ptr(&napi_alloc_cache);
 
 	fragsz = SKB_DATA_ALIGN(fragsz);
+	guard(local_lock_nested_bh)(&napi_alloc_cache.bh_lock);
 	return page_frag_alloc_align(&nc->page, fragsz, GFP_ATOMIC,
 				     align_mask);
 }
@@ -324,6 +328,7 @@ static struct sk_buff *napi_skb_cache_get(void)
 	struct napi_alloc_cache *nc = this_cpu_ptr(&napi_alloc_cache);
 	struct sk_buff *skb;
 
+	guard(local_lock_nested_bh)(&napi_alloc_cache.bh_lock);
 	if (unlikely(!nc->skb_count)) {
 		nc->skb_count = kmem_cache_alloc_bulk(skbuff_cache,
 						      GFP_ATOMIC,
@@ -726,9 +731,11 @@ struct sk_buff *__netdev_alloc_skb(struct net_device *dev, unsigned int len,
 		pfmemalloc = nc->pfmemalloc;
 	} else {
 		local_bh_disable();
-		nc = this_cpu_ptr(&napi_alloc_cache.page);
-		data = page_frag_alloc(nc, len, gfp_mask);
-		pfmemalloc = nc->pfmemalloc;
+		scoped_guard(local_lock_nested_bh, &napi_alloc_cache.bh_lock) {
+			nc = this_cpu_ptr(&napi_alloc_cache.page);
+			data = page_frag_alloc(nc, len, gfp_mask);
+			pfmemalloc = nc->pfmemalloc;
+		}
 		local_bh_enable();
 	}
 
@@ -793,31 +800,32 @@ struct sk_buff *__napi_alloc_skb(struct napi_struct *napi, unsigned int len,
 		goto skb_success;
 	}
 
-	nc = this_cpu_ptr(&napi_alloc_cache);
-
 	if (sk_memalloc_socks())
 		gfp_mask |= __GFP_MEMALLOC;
 
-	if (NAPI_HAS_SMALL_PAGE_FRAG && len <= SKB_WITH_OVERHEAD(1024)) {
-		/* we are artificially inflating the allocation size, but
-		 * that is not as bad as it may look like, as:
-		 * - 'len' less than GRO_MAX_HEAD makes little sense
-		 * - On most systems, larger 'len' values lead to fragment
-		 *   size above 512 bytes
-		 * - kmalloc would use the kmalloc-1k slab for such values
-		 * - Builds with smaller GRO_MAX_HEAD will very likely do
-		 *   little networking, as that implies no WiFi and no
-		 *   tunnels support, and 32 bits arches.
-		 */
-		len = SZ_1K;
+	scoped_guard(local_lock_nested_bh, &napi_alloc_cache.bh_lock) {
+		nc = this_cpu_ptr(&napi_alloc_cache);
+		if (NAPI_HAS_SMALL_PAGE_FRAG && len <= SKB_WITH_OVERHEAD(1024)) {
+			/* we are artificially inflating the allocation size, but
+			 * that is not as bad as it may look like, as:
+			 * - 'len' less than GRO_MAX_HEAD makes little sense
+			 * - On most systems, larger 'len' values lead to fragment
+			 *   size above 512 bytes
+			 * - kmalloc would use the kmalloc-1k slab for such values
+			 * - Builds with smaller GRO_MAX_HEAD will very likely do
+			 *   little networking, as that implies no WiFi and no
+			 *   tunnels support, and 32 bits arches.
+			 */
+			len = SZ_1K;
 
-		data = page_frag_alloc_1k(&nc->page_small, gfp_mask);
-		pfmemalloc = NAPI_SMALL_PAGE_PFMEMALLOC(nc->page_small);
-	} else {
-		len = SKB_HEAD_ALIGN(len);
+			data = page_frag_alloc_1k(&nc->page_small, gfp_mask);
+			pfmemalloc = NAPI_SMALL_PAGE_PFMEMALLOC(nc->page_small);
+		} else {
+			len = SKB_HEAD_ALIGN(len);
 
-		data = page_frag_alloc(&nc->page, len, gfp_mask);
-		pfmemalloc = nc->page.pfmemalloc;
+			data = page_frag_alloc(&nc->page, len, gfp_mask);
+			pfmemalloc = nc->page.pfmemalloc;
+		}
 	}
 
 	if (unlikely(!data))
@@ -1306,6 +1314,7 @@ static void napi_skb_cache_put(struct sk_buff *skb)
 	struct napi_alloc_cache *nc = this_cpu_ptr(&napi_alloc_cache);
 	u32 i;
 
+	guard(local_lock_nested_bh)(&napi_alloc_cache.bh_lock);
 	kasan_poison_object_data(skbuff_cache, skb);
 	nc->skb_cache[nc->skb_count++] = skb;

From patchwork Fri Dec 15 17:07:24 2023
From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Subject: [PATCH net-next 05/24] net/tcp_sigpool: Use nested-BH locking for sigpool_scratch.
Date: Fri, 15 Dec 2023 18:07:24 +0100
Message-ID: <20231215171020.687342-6-bigeasy@linutronix.de>

sigpool_scratch is a per-CPU variable and relies on disabled BH for its
locking. Without per-CPU locking in local_bh_disable() on PREEMPT_RT
this data structure requires explicit locking.

Make a struct with a pad member (original sigpool_scratch) and a
local_lock_t and use local_lock_nested_bh() for locking. This change
adds only lockdep coverage and does not alter the functional behaviour
for !PREEMPT_RT.
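Note that the nested-BH lock is acquired in tcp_sigpool_start() and only
dropped in tcp_sigpool_end(), so the caller holds it across the whole
use of the scratch area. A hedged caller-side sketch (hypothetical
caller, error handling trimmed):

	struct tcp_sigpool hp;

	if (tcp_sigpool_start(id, &hp))	/* acquires the nested-BH lock */
		return -1;
	/* hp.scratch is this CPU's scratch area, stable until the end call */
	tcp_sigpool_end(&hp);		/* releases the nested-BH lock */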
Cc: David Ahern
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 net/ipv4/tcp_sigpool.c | 17 +++++++++++++----
 1 file changed, 13 insertions(+), 4 deletions(-)

diff --git a/net/ipv4/tcp_sigpool.c b/net/ipv4/tcp_sigpool.c
index 55b310a722c7d..0046d1a0dd796 100644
--- a/net/ipv4/tcp_sigpool.c
+++ b/net/ipv4/tcp_sigpool.c
@@ -10,7 +10,14 @@
 #include <net/tcp.h>
 
 static size_t __scratch_size;
-static DEFINE_PER_CPU(void __rcu *, sigpool_scratch);
+struct sigpool_scratch {
+	local_lock_t bh_lock;
+	void __rcu *pad;
+};
+
+static DEFINE_PER_CPU(struct sigpool_scratch, sigpool_scratch) = {
+	.bh_lock = INIT_LOCAL_LOCK(bh_lock),
+};
 
 struct sigpool_entry {
 	struct crypto_ahash *hash;
@@ -72,7 +79,7 @@ static int sigpool_reserve_scratch(size_t size)
 			break;
 		}
 
-		old_scratch = rcu_replace_pointer(per_cpu(sigpool_scratch, cpu),
+		old_scratch = rcu_replace_pointer(per_cpu(sigpool_scratch.pad, cpu),
 						  scratch, lockdep_is_held(&cpool_mutex));
 		if (!cpu_online(cpu) || !old_scratch) {
 			kfree(old_scratch);
@@ -93,7 +100,7 @@ static void sigpool_scratch_free(void)
 	int cpu;
 
 	for_each_possible_cpu(cpu)
-		kfree(rcu_replace_pointer(per_cpu(sigpool_scratch, cpu),
+		kfree(rcu_replace_pointer(per_cpu(sigpool_scratch.pad, cpu),
 					  NULL, lockdep_is_held(&cpool_mutex)));
 	__scratch_size = 0;
 }
@@ -278,7 +285,8 @@ int tcp_sigpool_start(unsigned int id, struct tcp_sigpool *c) __cond_acquires(RCU_BH)
 	/* Pairs with tcp_sigpool_reserve_scratch(), scratch area is
 	 * valid (allocated) until tcp_sigpool_end().
 	 */
-	c->scratch = rcu_dereference_bh(*this_cpu_ptr(&sigpool_scratch));
+	local_lock_nested_bh(&sigpool_scratch.bh_lock);
+	c->scratch = rcu_dereference_bh(*this_cpu_ptr(&sigpool_scratch.pad));
 	return 0;
 }
 EXPORT_SYMBOL_GPL(tcp_sigpool_start);
@@ -287,6 +295,7 @@ void tcp_sigpool_end(struct tcp_sigpool *c) __releases(RCU_BH)
 {
 	struct crypto_ahash *hash = crypto_ahash_reqtfm(c->req);
 
+	local_unlock_nested_bh(&sigpool_scratch.bh_lock);
 	rcu_read_unlock_bh();
 	ahash_request_free(c->req);
 	crypto_free_ahash(hash);

From patchwork Fri Dec 15 17:07:25 2023
From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Subject: [PATCH net-next 06/24] net/ipv4: Use nested-BH locking for ipv4_tcp_sk.
Date: Fri, 15 Dec 2023 18:07:25 +0100
Message-ID: <20231215171020.687342-7-bigeasy@linutronix.de>

ipv4_tcp_sk is a per-CPU variable and relies on disabled BH for its
locking. Without per-CPU locking in local_bh_disable() on PREEMPT_RT
this data structure requires explicit locking.

Make a struct with a sock member (original ipv4_tcp_sk) and a
local_lock_t and use local_lock_nested_bh() for locking. This change
adds only lockdep coverage and does not alter the functional behaviour
for !PREEMPT_RT.

Cc: David Ahern
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 include/net/sock.h  |  5 +++++
 net/ipv4/tcp_ipv4.c | 15 +++++++++++----
 2 files changed, 16 insertions(+), 4 deletions(-)

diff --git a/include/net/sock.h b/include/net/sock.h
index 1d6931caf0c3c..cb407b7ae39f8 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -547,6 +547,11 @@ struct sock {
 	struct hlist_node	sk_bind2_node;
 };
 
+struct sock_bh_locked {
+	struct sock *sock;
+	local_lock_t bh_lock;
+};
+
 enum sk_pacing {
 	SK_PACING_NONE		= 0,
 	SK_PACING_NEEDED	= 1,
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index 0c50c5a32b84a..e69dd19f39e02 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -92,7 +92,9 @@ static int tcp_v4_md5_hash_hdr(char *md5_hash, const struct tcp_md5sig_key *key,
 struct inet_hashinfo tcp_hashinfo;
 EXPORT_SYMBOL(tcp_hashinfo);
 
-static DEFINE_PER_CPU(struct sock *, ipv4_tcp_sk);
+static DEFINE_PER_CPU(struct sock_bh_locked, ipv4_tcp_sk) = {
+	.bh_lock = INIT_LOCAL_LOCK(bh_lock),
+};
 
 static u32 tcp_v4_init_seq(const struct sk_buff *skb)
 {
@@ -878,7 +880,9 @@ static void tcp_v4_send_reset(const struct sock *sk, struct sk_buff *skb)
 	arg.tos = ip_hdr(skb)->tos;
 	arg.uid = sock_net_uid(net, sk && sk_fullsock(sk) ? sk : NULL);
 	local_bh_disable();
-	ctl_sk = this_cpu_read(ipv4_tcp_sk);
+	local_lock_nested_bh(&ipv4_tcp_sk.bh_lock);
+	ctl_sk = this_cpu_read(ipv4_tcp_sk.sock);
+
 	sock_net_set(ctl_sk, net);
 	if (sk) {
 		ctl_sk->sk_mark = (sk->sk_state == TCP_TIME_WAIT) ?
@@ -903,6 +907,7 @@ static void tcp_v4_send_reset(const struct sock *sk, struct sk_buff *skb)
 	sock_net_set(ctl_sk, &init_net);
 	__TCP_INC_STATS(net, TCP_MIB_OUTSEGS);
 	__TCP_INC_STATS(net, TCP_MIB_OUTRSTS);
+	local_unlock_nested_bh(&ipv4_tcp_sk.bh_lock);
 	local_bh_enable();
 
 #ifdef CONFIG_TCP_MD5SIG
@@ -998,7 +1003,8 @@ static void tcp_v4_send_ack(const struct sock *sk,
 	arg.tos = tos;
 	arg.uid = sock_net_uid(net, sk_fullsock(sk) ? sk : NULL);
 	local_bh_disable();
-	ctl_sk = this_cpu_read(ipv4_tcp_sk);
+	local_lock_nested_bh(&ipv4_tcp_sk.bh_lock);
+	ctl_sk = this_cpu_read(ipv4_tcp_sk.sock);
 	sock_net_set(ctl_sk, net);
 	ctl_sk->sk_mark = (sk->sk_state == TCP_TIME_WAIT) ?
 			   inet_twsk(sk)->tw_mark : READ_ONCE(sk->sk_mark);
@@ -1013,6 +1019,7 @@ static void tcp_v4_send_ack(const struct sock *sk,
 	sock_net_set(ctl_sk, &init_net);
 
 	__TCP_INC_STATS(net, TCP_MIB_OUTSEGS);
+	local_unlock_nested_bh(&ipv4_tcp_sk.bh_lock);
 	local_bh_enable();
 }
 
@@ -3604,7 +3611,7 @@ void __init tcp_v4_init(void)
 		 */
 		inet_sk(sk)->pmtudisc = IP_PMTUDISC_DO;
 
-		per_cpu(ipv4_tcp_sk, cpu) = sk;
+		per_cpu(ipv4_tcp_sk.sock, cpu) = sk;
 	}
 	if (register_pernet_subsys(&tcp_sk_ops))
 		panic("Failed to create the TCP control socket.\n");

From patchwork Fri Dec 15 17:07:26 2023
From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Subject: [PATCH net-next 07/24] netfilter: br_netfilter: Use nested-BH locking for brnf_frag_data_storage.
Date: Fri, 15 Dec 2023 18:07:26 +0100
Message-ID: <20231215171020.687342-8-bigeasy@linutronix.de>

brnf_frag_data_storage is a per-CPU variable and relies on disabled BH
for its locking. Without per-CPU locking in local_bh_disable() on
PREEMPT_RT this data structure requires explicit locking.

Add a local_lock_t to the data structure and use
local_lock_nested_bh() for locking. This change adds only lockdep
coverage and does not alter the functional behaviour for !PREEMPT_RT.

Cc: Florian Westphal
Cc: Jozsef Kadlecsik
Cc: Nikolay Aleksandrov
Cc: Pablo Neira Ayuso
Cc: Roopa Prabhu
Cc: bridge@lists.linux.dev
Cc: coreteam@netfilter.org
Cc: netfilter-devel@vger.kernel.org
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 net/bridge/br_netfilter_hooks.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/net/bridge/br_netfilter_hooks.c b/net/bridge/br_netfilter_hooks.c
index 6adcb45bca75d..c06f498c5f1b3 100644
--- a/net/bridge/br_netfilter_hooks.c
+++ b/net/bridge/br_netfilter_hooks.c
@@ -133,6 +133,7 @@ static inline bool is_pppoe_ipv6(const struct sk_buff *skb,
 #define NF_BRIDGE_MAX_MAC_HEADER_LENGTH (PPPOE_SES_HLEN + ETH_HLEN)
 
 struct brnf_frag_data {
+	local_lock_t bh_lock;
 	char mac[NF_BRIDGE_MAX_MAC_HEADER_LENGTH];
 	u8 encap_size;
 	u8 size;
@@ -140,7 +141,9 @@ struct brnf_frag_data {
 	__be16 vlan_proto;
 };
 
-static DEFINE_PER_CPU(struct brnf_frag_data, brnf_frag_data_storage);
+static DEFINE_PER_CPU(struct brnf_frag_data, brnf_frag_data_storage) = {
+	.bh_lock = INIT_LOCAL_LOCK(bh_lock),
+};
 
 static void nf_bridge_info_free(struct sk_buff *skb)
 {
@@ -768,6 +771,7 @@ static int br_nf_dev_queue_xmit(struct net *net, struct sock *sk, struct sk_buff *skb)
 
 		IPCB(skb)->frag_max_size = nf_bridge->frag_max_size;
 
+		guard(local_lock_nested_bh)(&brnf_frag_data_storage.bh_lock);
 		data = this_cpu_ptr(&brnf_frag_data_storage);
 
 		if (skb_vlan_tag_present(skb)) {
@@ -795,6 +799,7 @@ static int br_nf_dev_queue_xmit(struct net *net, struct sock *sk, struct sk_buff *skb)
 
 		IP6CB(skb)->frag_max_size = nf_bridge->frag_max_size;
 
+		guard(local_lock_nested_bh)(&brnf_frag_data_storage.bh_lock);
 		data = this_cpu_ptr(&brnf_frag_data_storage);
 		data->encap_size = nf_bridge_encap_header_len(skb);
 		data->size = ETH_HLEN + data->encap_size;

From patchwork Fri Dec 15 17:07:27 2023
From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Subject: [PATCH net-next 08/24] net: softnet_data: Make xmit.recursion per task.
Date: Fri, 15 Dec 2023 18:07:27 +0100
Message-ID: <20231215171020.687342-9-bigeasy@linutronix.de>

Softirq is preemptible on PREEMPT_RT. Without a per-CPU lock in
local_bh_disable() there is no guarantee that only one device is
transmitting at a time. With preemption and multiple senders it is
possible that the per-CPU recursion counter gets incremented by
different threads and exceeds XMIT_RECURSION_LIMIT leading to a false
positive recursion alert.

Instead of adding a lock to protect the per-CPU variable it is simpler
to make the counter per-task. Sending and receiving skbs always happens
in thread context anyway.

Having a lock to protect the per-CPU counter would block/serialize two
sending threads needlessly. It would also require a recursive lock to
ensure that the owner can increment the counter further.

Make the recursion counter a task_struct member on PREEMPT_RT.
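A hedged sketch of how the helpers bracket a transmit path (modelled on
the existing call sites in net/core/dev.c, simplified and not part of
this patch):

	if (dev_xmit_recursion()) {
		/* e.g. a looping virtual device stack */
		goto recursion_alert;
	}
	dev_xmit_recursion_inc();
	skb = dev_hard_start_xmit(skb, dev, txq, &rc);
	dev_xmit_recursion_dec();

With this patch the counter behind dev_xmit_recursion_*() lives in
current->net_xmit_recursion on PREEMPT_RT and in the per-CPU
softnet_data otherwise; the call sites stay unchanged.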
Cc: Ben Segall
Cc: Daniel Bristot de Oliveira
Cc: Dietmar Eggemann
Cc: Juri Lelli
Cc: Mel Gorman
Cc: Steven Rostedt
Cc: Valentin Schneider
Cc: Vincent Guittot
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 include/linux/netdevice.h | 30 +++++++++++++++++++++++++++++-
 include/linux/sched.h     |  4 +++-
 2 files changed, 32 insertions(+), 2 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 2564e209465ea..06436695c3679 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -3260,7 +3260,9 @@ struct softnet_data {
 #endif
 	/* written and read only by owning cpu: */
 	struct {
+#ifndef CONFIG_PREEMPT_RT
 		u16 recursion;
+#endif
 		u8  more;
 #ifdef CONFIG_NET_EGRESS
 		u8  skip_txqueue;
@@ -3308,12 +3310,36 @@ static inline void input_queue_tail_incr_save(struct softnet_data *sd,
 
 DECLARE_PER_CPU_ALIGNED(struct softnet_data, softnet_data);
 
+#define XMIT_RECURSION_LIMIT	8
+#ifdef CONFIG_PREEMPT_RT
+
+static inline int dev_recursion_level(void)
+{
+	return current->net_xmit_recursion;
+}
+
+static inline bool dev_xmit_recursion(void)
+{
+	return unlikely(current->net_xmit_recursion > XMIT_RECURSION_LIMIT);
+}
+
+static inline void dev_xmit_recursion_inc(void)
+{
+	current->net_xmit_recursion++;
+}
+
+static inline void dev_xmit_recursion_dec(void)
+{
+	current->net_xmit_recursion--;
+}
+
+#else
+
 static inline int dev_recursion_level(void)
 {
 	return this_cpu_read(softnet_data.xmit.recursion);
 }
 
-#define XMIT_RECURSION_LIMIT	8
 static inline bool dev_xmit_recursion(void)
 {
 	return unlikely(__this_cpu_read(softnet_data.xmit.recursion) >
@@ -3330,6 +3356,8 @@ static inline void dev_xmit_recursion_dec(void)
 	__this_cpu_dec(softnet_data.xmit.recursion);
 }
 
+#endif
+
 void __netif_schedule(struct Qdisc *q);
 void netif_schedule_queue(struct netdev_queue *txq);
 
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 292c316972485..26b1b7d674b5d 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -964,7 +964,9 @@ struct task_struct {
 	/* delay due to memory thrashing */
 	unsigned                        in_thrashing:1;
 #endif
-
+#ifdef CONFIG_PREEMPT_RT
+	u8				net_xmit_recursion;
+#endif
 	unsigned long			atomic_flags; /* Flags requiring atomic access. */
 
 	struct restart_block		restart_block;

From patchwork Fri Dec 15 17:07:28 2023
From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Subject: [PATCH net-next 09/24] dev: Use the RPS lock for softnet_data::input_pkt_queue on PREEMPT_RT.
Date: Fri, 15 Dec 2023 18:07:28 +0100
Message-ID: <20231215171020.687342-10-bigeasy@linutronix.de>

There is no additional locking in rps_lock_*() on PREEMPT_RT in the
!CONFIG_RPS case. The reasoning is that local_bh_disable() provides the
required synchronization (everywhere). All interrupts are threaded and
so the required CPU-local synchronization is provided by the per-CPU
lock in local_bh_disable(), which is always performed for the
forced-threaded interrupts. Needless to say, the softirq users (NAPI)
have BH disabled while being invoked as a softirq callback.

Without locking in local_bh_disable() on PREEMPT_RT the
softnet_data::input_pkt_queue data structure requires explicit locking
in the !CONFIG_RPS case. Disabling interrupts for CPU-local locking is
undesired because it will impose restrictions within the locked
sections like not allowing to free a skb.

Utilise the locking which is present in the CONFIG_RPS case for
serialisation on PREEMPT_RT even without CONFIG_RPS enabled.
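The resulting behaviour of the rps_lock_*() helpers per configuration
(a summary, not part of the patch):

 - CONFIG_RPS or PREEMPT_RT: take sd->input_pkt_queue.lock, with
   interrupts disabled/saved as before.
 - neither (!CONFIG_RPS on !PREEMPT_RT): only disable/save interrupts,
   as before.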
Without locking in local_bh_disable() on PREEMPT_RT the softnet_data::input_pkt_queue data structure requires explicit locking in the !CONFIG_RPS case. Disabling interrupts for CPU-local locking is undesired because it will impose restrictions within the locked sections like not allowing to free a skb. Utilise the locking which is present in the CONFIG_RPS case for serialisation on PREEMPT_RT even without CONFIG_RPS enabled. Signed-off-by: Sebastian Andrzej Siewior --- net/core/dev.c | 16 ++++++++-------- 1 file changed, 8 insertions(+), 8 deletions(-) diff --git a/net/core/dev.c b/net/core/dev.c index c879246be48d8..09232080843ee 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -220,34 +220,34 @@ static inline struct hlist_head *dev_index_hash(struct net *net, int ifindex) static inline void rps_lock_irqsave(struct softnet_data *sd, unsigned long *flags) { - if (IS_ENABLED(CONFIG_RPS)) + if (IS_ENABLED(CONFIG_RPS) || IS_ENABLED(CONFIG_PREEMPT_RT)) spin_lock_irqsave(&sd->input_pkt_queue.lock, *flags); - else if (!IS_ENABLED(CONFIG_PREEMPT_RT)) + else local_irq_save(*flags); } static inline void rps_lock_irq_disable(struct softnet_data *sd) { - if (IS_ENABLED(CONFIG_RPS)) + if (IS_ENABLED(CONFIG_RPS) || IS_ENABLED(CONFIG_PREEMPT_RT)) spin_lock_irq(&sd->input_pkt_queue.lock); - else if (!IS_ENABLED(CONFIG_PREEMPT_RT)) + else local_irq_disable(); } static inline void rps_unlock_irq_restore(struct softnet_data *sd, unsigned long *flags) { - if (IS_ENABLED(CONFIG_RPS)) + if (IS_ENABLED(CONFIG_RPS) || IS_ENABLED(CONFIG_PREEMPT_RT)) spin_unlock_irqrestore(&sd->input_pkt_queue.lock, *flags); - else if (!IS_ENABLED(CONFIG_PREEMPT_RT)) + else local_irq_restore(*flags); } static inline void rps_unlock_irq_enable(struct softnet_data *sd) { - if (IS_ENABLED(CONFIG_RPS)) + if (IS_ENABLED(CONFIG_RPS) || IS_ENABLED(CONFIG_PREEMPT_RT)) spin_unlock_irq(&sd->input_pkt_queue.lock); - else if (!IS_ENABLED(CONFIG_PREEMPT_RT)) + else local_irq_enable(); } From patchwork Fri Dec 15 17:07:29 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Sebastian Andrzej Siewior X-Patchwork-Id: 13494691 X-Patchwork-Delegate: kuba@kernel.org Received: from galois.linutronix.de (Galois.linutronix.de [193.142.43.55]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 0FE3045BFF; Fri, 15 Dec 2023 17:10:37 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linutronix.de Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linutronix.de Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=linutronix.de header.i=@linutronix.de header.b="Z1v8qLCV"; dkim=permerror (0-bit key) header.d=linutronix.de header.i=@linutronix.de header.b="/c5A367j" From: Sebastian Andrzej Siewior DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linutronix.de; s=2020; t=1702660235; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=9YY938P41v7tgAkEsD/QoBSvsRcX+2kJhvYsQnrL6Aw=; b=Z1v8qLCV+RLJiu/rblvMSN7tDD7HRrKGxcYCmF6KGyzzTfhi6Q3gCjZ2PZbm0ZRBM/spAv ndtFcpvAwtAlsxq8x5U6GZitY00Ew6iA944vQ8aHBvXItZiqelQzXMTSwrjDPBOKuj8nkH 2iH46n9ufTFdXcQW8jctOXi+eYFnf0iYV5jmePGS2f2zJpj3RbKec60JnNyyiKiw6LeE+M 
RhEhBWk6qdox8FeyxSq+zREuIVlDQr0ndyST6W3yM0MMWWuDEub7tpHAKxhlVJSANoSltD axHeE592+spSZ/XKkMJkCk7lg4hnarGZb/7qz6bZV8m2T2NJ2VF00LLIcEw2/Q== DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=linutronix.de; s=2020e; t=1702660235; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=9YY938P41v7tgAkEsD/QoBSvsRcX+2kJhvYsQnrL6Aw=; b=/c5A367jYpvBJ4WRTOzeEQIOJO45jcSIQ5G1gX8QMVs1c5Ii4wETrzynloABIM3EJhrHB8 +pkzwyAzbFNjYMAQ== To: linux-kernel@vger.kernel.org, netdev@vger.kernel.org Cc: "David S. Miller" , Boqun Feng , Daniel Borkmann , Eric Dumazet , Frederic Weisbecker , Ingo Molnar , Jakub Kicinski , Paolo Abeni , Peter Zijlstra , Thomas Gleixner , Waiman Long , Will Deacon , Sebastian Andrzej Siewior Subject: [PATCH net-next 10/24] dev: Use nested-BH locking for softnet_data.process_queue. Date: Fri, 15 Dec 2023 18:07:29 +0100 Message-ID: <20231215171020.687342-11-bigeasy@linutronix.de> In-Reply-To: <20231215171020.687342-1-bigeasy@linutronix.de> References: <20231215171020.687342-1-bigeasy@linutronix.de> Precedence: bulk X-Mailing-List: netdev@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Patchwork-Delegate: kuba@kernel.org softnet_data::process_queue is a per-CPU variable and relies on disabled BH for its locking. Without per-CPU locking in local_bh_disable() on PREEMPT_RT this data structure requires explicit locking. Add a local_lock_t to softnet_data and use local_lock_nested_bh() for locking of process_queue. This change adds only lockdep coverage and does not alter the functional behaviour for !PREEMPT_RT. Signed-off-by: Sebastian Andrzej Siewior --- include/linux/netdevice.h | 1 + net/core/dev.c | 17 +++++++++++++++-- 2 files changed, 16 insertions(+), 2 deletions(-) diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index 06436695c3679..88e27d3c39da2 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -3238,6 +3238,7 @@ static inline bool dev_has_header(const struct net_device *dev) struct softnet_data { struct list_head poll_list; struct sk_buff_head process_queue; + local_lock_t process_queue_bh_lock; /* stats */ unsigned int processed; diff --git a/net/core/dev.c b/net/core/dev.c index 09232080843ee..5a0f6da7b3ae5 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -440,7 +440,9 @@ static RAW_NOTIFIER_HEAD(netdev_chain); * queue in the local softnet handler. 
*/ -DEFINE_PER_CPU_ALIGNED(struct softnet_data, softnet_data); +DEFINE_PER_CPU_ALIGNED(struct softnet_data, softnet_data) = { + .process_queue_bh_lock = INIT_LOCAL_LOCK(process_queue_bh_lock), +}; EXPORT_PER_CPU_SYMBOL(softnet_data); #ifdef CONFIG_LOCKDEP @@ -5838,6 +5840,7 @@ static void flush_backlog(struct work_struct *work) } rps_unlock_irq_enable(sd); + local_lock_nested_bh(&softnet_data.process_queue_bh_lock); skb_queue_walk_safe(&sd->process_queue, skb, tmp) { if (skb->dev->reg_state == NETREG_UNREGISTERING) { __skb_unlink(skb, &sd->process_queue); @@ -5845,6 +5848,7 @@ static void flush_backlog(struct work_struct *work) input_queue_head_incr(sd); } } + local_unlock_nested_bh(&softnet_data.process_queue_bh_lock); local_bh_enable(); } @@ -5966,15 +5970,22 @@ static int process_backlog(struct napi_struct *napi, int quota) while (again) { struct sk_buff *skb; + local_lock_nested_bh(&softnet_data.process_queue_bh_lock); while ((skb = __skb_dequeue(&sd->process_queue))) { + local_unlock_nested_bh(&softnet_data.process_queue_bh_lock); rcu_read_lock(); __netif_receive_skb(skb); rcu_read_unlock(); + + local_lock_nested_bh(&softnet_data.process_queue_bh_lock); input_queue_head_incr(sd); - if (++work >= quota) + if (++work >= quota) { + local_unlock_nested_bh(&softnet_data.process_queue_bh_lock); return work; + } } + local_unlock_nested_bh(&softnet_data.process_queue_bh_lock); rps_lock_irq_disable(sd); if (skb_queue_empty(&sd->input_pkt_queue)) { @@ -5989,8 +6000,10 @@ static int process_backlog(struct napi_struct *napi, int quota) napi->state = 0; again = false; } else { + local_lock_nested_bh(&softnet_data.process_queue_bh_lock); skb_queue_splice_tail_init(&sd->input_pkt_queue, &sd->process_queue); + local_unlock_nested_bh(&softnet_data.process_queue_bh_lock); } rps_unlock_irq_enable(sd); } From patchwork Fri Dec 15 17:07:30 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Sebastian Andrzej Siewior X-Patchwork-Id: 13494666 X-Patchwork-Delegate: kuba@kernel.org Received: from galois.linutronix.de (Galois.linutronix.de [193.142.43.55]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 58ACD46426; Fri, 15 Dec 2023 17:10:37 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linutronix.de Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linutronix.de Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=linutronix.de header.i=@linutronix.de header.b="H1NRUxma"; dkim=permerror (0-bit key) header.d=linutronix.de header.i=@linutronix.de header.b="tUXBTMv6" From: Sebastian Andrzej Siewior DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linutronix.de; s=2020; t=1702660236; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=hccXJ/tMSuJL/qI6XmVbVx3Yyp9L6BDQI7DM1y19Tvc=; b=H1NRUxmaM4/3shLHnigobXH0+djwJ1RJjGPcIJDlF0N7BGrPMdnqPLHiELW8Hmlx1noC0i r1wj1L5ePShhZxc82oHWsH6Hw9DhpjoZLDGLaU8lR+VVLxHtWBk9567g8GWAZ/BOAjqTuZ nHIlJlGUKulxv4K5ZmLb4Gk6KO/F8bKZ8ya24TNtsI09icbM/d8III+vyi7PGeQIELym7F wSrjfAj6u3oZXkd3eX3oE+LERBTkt/OC7BBSHGzwEClzZpbU+TzxyLOj/Qt7Uhs4s4Xlnz U/nMajMYyWPC5h66bEfxOJoI4i9s6vEgWoFzcVZHEoGFOD0Kka/Y6RUnRFvwrA== DKIM-Signature: v=1; a=ed25519-sha256; 
c=relaxed/relaxed; d=linutronix.de; s=2020e; t=1702660236; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=hccXJ/tMSuJL/qI6XmVbVx3Yyp9L6BDQI7DM1y19Tvc=; b=tUXBTMv6dK1TkctQDlxmz7vleUMV7iKdmqarsfLl2XSKRjSuNxKw9tqGuTSp2hU5RMl7XJ IAK0IVGYLLdl/1DA== To: linux-kernel@vger.kernel.org, netdev@vger.kernel.org Cc: "David S. Miller" , Boqun Feng , Daniel Borkmann , Eric Dumazet , Frederic Weisbecker , Ingo Molnar , Jakub Kicinski , Paolo Abeni , Peter Zijlstra , Thomas Gleixner , Waiman Long , Will Deacon , Sebastian Andrzej Siewior , bpf@vger.kernel.org Subject: [PATCH net-next 11/24] lwt: Don't disable migration prior to invoking BPF. Date: Fri, 15 Dec 2023 18:07:30 +0100 Message-ID: <20231215171020.687342-12-bigeasy@linutronix.de> In-Reply-To: <20231215171020.687342-1-bigeasy@linutronix.de> References: <20231215171020.687342-1-bigeasy@linutronix.de> Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Patchwork-Delegate: kuba@kernel.org There is no need to explicitly disable migration if bottom halves are also disabled. Disabling BH implies disabling migration. Remove migrate_disable() and rely solely on disabling BH to remain on the same CPU. Cc: bpf@vger.kernel.org Signed-off-by: Sebastian Andrzej Siewior --- net/core/lwt_bpf.c | 6 ++---- 1 file changed, 2 insertions(+), 4 deletions(-) diff --git a/net/core/lwt_bpf.c b/net/core/lwt_bpf.c index 4a0797f0a154b..a94943681e5aa 100644 --- a/net/core/lwt_bpf.c +++ b/net/core/lwt_bpf.c @@ -40,10 +40,9 @@ static int run_lwt_bpf(struct sk_buff *skb, struct bpf_lwt_prog *lwt, { int ret; - /* Migration disable and BH disable are needed to protect per-cpu - * redirect_info between BPF prog and skb_do_redirect(). + /* Disabling BH is needed to protect per-CPU bpf_redirect_info between + * BPF prog and skb_do_redirect().
*/ - migrate_disable(); local_bh_disable(); bpf_compute_data_pointers(skb); ret = bpf_prog_run_save_cb(lwt->prog, skb); @@ -78,7 +77,6 @@ static int run_lwt_bpf(struct sk_buff *skb, struct bpf_lwt_prog *lwt, } local_bh_enable(); - migrate_enable(); return ret; } From patchwork Fri Dec 15 17:07:31 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Sebastian Andrzej Siewior X-Patchwork-Id: 13494665 X-Patchwork-Delegate: kuba@kernel.org Received: from galois.linutronix.de (Galois.linutronix.de [193.142.43.55]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 43E3645C14; Fri, 15 Dec 2023 17:10:38 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linutronix.de Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linutronix.de Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=linutronix.de header.i=@linutronix.de header.b="wJyf/NDW"; dkim=permerror (0-bit key) header.d=linutronix.de header.i=@linutronix.de header.b="cmcy8elv" From: Sebastian Andrzej Siewior DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linutronix.de; s=2020; t=1702660236; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=UmaOPiGbylWQAal7j60DGR3u1xvQdt4F7B7ZeO8X65o=; b=wJyf/NDWM6LZXF+SN3FnrxAM8wMdJrYCLdfE3GQzgQoLAYCBLGxYnfA7y6a98ews+nWAWQ X9Vi5ct2UCgOqiC5JHSa4QKcIDZEUZFJQ8MTxq5q0HZ07ZZLQczGlSvY5p3EzVZ0jDzoIT s2Mx2/5F8zzmCsnO5I+VlDYWZ1gC199tfOBd7SKrTog5v/RuRtLik/ZyLwg8NJ3TXtgB9F Qv/7RpoFZP/wwC0uCE7kqpDjOICO4y4MklAklk7/CNgOUmVZZXEAx0HIHXtRMSUWKahVJs c0pq1FHOHLhOs/hpF97wbuFIW72saNBrlzdS3Q1WzoFpmSbqYK8Rx2Ic+dUJUA== DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=linutronix.de; s=2020e; t=1702660236; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=UmaOPiGbylWQAal7j60DGR3u1xvQdt4F7B7ZeO8X65o=; b=cmcy8elvBTTwC6ORkXNL1St/NthDPMfHZJaSreDx0FKpAtlL1w6CBi3/HDfJOeqbzDUlYw AQPe/rtYK0rLSdDw== To: linux-kernel@vger.kernel.org, netdev@vger.kernel.org Cc: "David S. Miller" , Boqun Feng , Daniel Borkmann , Eric Dumazet , Frederic Weisbecker , Ingo Molnar , Jakub Kicinski , Paolo Abeni , Peter Zijlstra , Thomas Gleixner , Waiman Long , Will Deacon , Sebastian Andrzej Siewior , Alexei Starovoitov , Andrii Nakryiko , David Ahern , Hao Luo , Jiri Olsa , John Fastabend , KP Singh , Martin KaFai Lau , Song Liu , Stanislav Fomichev , Yonghong Song , bpf@vger.kernel.org Subject: [PATCH net-next 12/24] seg6: Use nested-BH locking for seg6_bpf_srh_states. Date: Fri, 15 Dec 2023 18:07:31 +0100 Message-ID: <20231215171020.687342-13-bigeasy@linutronix.de> In-Reply-To: <20231215171020.687342-1-bigeasy@linutronix.de> References: <20231215171020.687342-1-bigeasy@linutronix.de> Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Patchwork-Delegate: kuba@kernel.org The access to seg6_bpf_srh_states is protected by disabling preemption. 
Based on the code, the entry point is input_action_end_bpf() and every other function that accesses seg6_bpf_srh_states (the bpf_lwt_seg6_*() helper functions) should be called from within input_action_end_bpf(). input_action_end_bpf() accesses seg6_bpf_srh_states first at the top of the function and only then disables preemption. This looks wrong because if preemption needs to be disabled as part of the locking mechanism then the variable shouldn't be accessed beforehand. Looking at how it is exercised via test_lwt_seg6local.sh, input_action_end_bpf() is always invoked from softirq context. If this is always the case then the preempt_disable() statement is superfluous. If it is not always invoked from softirq then disabling only preemption is not sufficient. Replace the preempt_disable() statement with nested-BH locking. This is not an equivalent replacement as it assumes that the invocation of input_action_end_bpf() always occurs in softirq context and thus the preempt_disable() is superfluous. Add a local_lock_t to the data structure and use local_lock_nested_bh() in guard notation for locking. Add lockdep_assert_held() to ensure the lock is held while the per-CPU variable is referenced in the helper functions. Cc: Alexei Starovoitov Cc: Andrii Nakryiko Cc: David Ahern Cc: Hao Luo Cc: Jiri Olsa Cc: John Fastabend Cc: KP Singh Cc: Martin KaFai Lau Cc: Song Liu Cc: Stanislav Fomichev Cc: Yonghong Song Cc: bpf@vger.kernel.org Signed-off-by: Sebastian Andrzej Siewior --- include/net/seg6_local.h | 1 + net/core/filter.c | 3 ++ net/ipv6/seg6_local.c | 59 ++++++++++++++++++++++------------------ 3 files changed, 36 insertions(+), 27 deletions(-) diff --git a/include/net/seg6_local.h b/include/net/seg6_local.h index 3fab9dec2ec45..0f22771359f4c 100644 --- a/include/net/seg6_local.h +++ b/include/net/seg6_local.h @@ -20,6 +20,7 @@ extern bool seg6_bpf_has_valid_srh(struct sk_buff *skb); struct seg6_bpf_srh_state { struct ipv6_sr_hdr *srh; + local_lock_t bh_lock; u16 hdrlen; bool valid; }; diff --git a/net/core/filter.c b/net/core/filter.c index 1737884be52f8..c8013f762524b 100644 --- a/net/core/filter.c +++ b/net/core/filter.c @@ -6384,6 +6384,7 @@ BPF_CALL_4(bpf_lwt_seg6_store_bytes, struct sk_buff *, skb, u32, offset, void *srh_tlvs, *srh_end, *ptr; int srhoff = 0; + lockdep_assert_held(&srh_state->bh_lock); if (srh == NULL) return -EINVAL; @@ -6440,6 +6441,7 @@ BPF_CALL_4(bpf_lwt_seg6_action, struct sk_buff *, skb, int hdroff = 0; int err; + lockdep_assert_held(&srh_state->bh_lock); switch (action) { case SEG6_LOCAL_ACTION_END_X: if (!seg6_bpf_has_valid_srh(skb)) @@ -6516,6 +6518,7 @@ BPF_CALL_3(bpf_lwt_seg6_adjust_srh, struct sk_buff *, skb, u32, offset, int srhoff = 0; int ret; + lockdep_assert_held(&srh_state->bh_lock); if (unlikely(srh == NULL)) return -EINVAL; diff --git a/net/ipv6/seg6_local.c b/net/ipv6/seg6_local.c index 24e2b4b494cb0..ed7278af321a2 100644 --- a/net/ipv6/seg6_local.c +++ b/net/ipv6/seg6_local.c @@ -1380,7 +1380,9 @@ static int input_action_end_b6_encap(struct sk_buff *skb, return err; } -DEFINE_PER_CPU(struct seg6_bpf_srh_state, seg6_bpf_srh_states); +DEFINE_PER_CPU(struct seg6_bpf_srh_state, seg6_bpf_srh_states) = { + .bh_lock = INIT_LOCAL_LOCK(bh_lock), +}; bool seg6_bpf_has_valid_srh(struct sk_buff *skb) { @@ -1388,6 +1390,7 @@ bool seg6_bpf_has_valid_srh(struct sk_buff *skb) this_cpu_ptr(&seg6_bpf_srh_states); struct ipv6_sr_hdr *srh = srh_state->srh; + lockdep_assert_held(&srh_state->bh_lock); if (unlikely(srh == NULL)) return false; @@ -1408,8 +1411,7 @@ bool
seg6_bpf_has_valid_srh(struct sk_buff *skb) static int input_action_end_bpf(struct sk_buff *skb, struct seg6_local_lwt *slwt) { - struct seg6_bpf_srh_state *srh_state = - this_cpu_ptr(&seg6_bpf_srh_states); + struct seg6_bpf_srh_state *srh_state; struct ipv6_sr_hdr *srh; int ret; @@ -1420,41 +1422,44 @@ static int input_action_end_bpf(struct sk_buff *skb, } advance_nextseg(srh, &ipv6_hdr(skb)->daddr); - /* preempt_disable is needed to protect the per-CPU buffer srh_state, - * which is also accessed by the bpf_lwt_seg6_* helpers + /* The access to the per-CPU buffer srh_state is protected by running + * always in softirq context (with disabled BH). On PREEMPT_RT the + * required locking is provided by the following local_lock_nested_bh() + * statement. It is also accessed by the bpf_lwt_seg6_* helpers via + * bpf_prog_run_save_cb(). */ - preempt_disable(); - srh_state->srh = srh; - srh_state->hdrlen = srh->hdrlen << 3; - srh_state->valid = true; + scoped_guard(local_lock_nested_bh, &seg6_bpf_srh_states.bh_lock) { + srh_state = this_cpu_ptr(&seg6_bpf_srh_states); + srh_state->srh = srh; + srh_state->hdrlen = srh->hdrlen << 3; + srh_state->valid = true; - rcu_read_lock(); - bpf_compute_data_pointers(skb); - ret = bpf_prog_run_save_cb(slwt->bpf.prog, skb); - rcu_read_unlock(); + rcu_read_lock(); + bpf_compute_data_pointers(skb); + ret = bpf_prog_run_save_cb(slwt->bpf.prog, skb); + rcu_read_unlock(); - switch (ret) { - case BPF_OK: - case BPF_REDIRECT: - break; - case BPF_DROP: - goto drop; - default: - pr_warn_once("bpf-seg6local: Illegal return value %u\n", ret); - goto drop; + switch (ret) { + case BPF_OK: + case BPF_REDIRECT: + break; + case BPF_DROP: + goto drop; + default: + pr_warn_once("bpf-seg6local: Illegal return value %u\n", ret); + goto drop; + } + + if (srh_state->srh && !seg6_bpf_has_valid_srh(skb)) + goto drop; } - if (srh_state->srh && !seg6_bpf_has_valid_srh(skb)) - goto drop; - - preempt_enable(); if (ret != BPF_REDIRECT) seg6_lookup_nexthop(skb, NULL, 0); return dst_input(skb); drop: - preempt_enable(); kfree_skb(skb); return -EINVAL; } From patchwork Fri Dec 15 17:07:32 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Sebastian Andrzej Siewior X-Patchwork-Id: 13494664 X-Patchwork-Delegate: kuba@kernel.org Received: from galois.linutronix.de (Galois.linutronix.de [193.142.43.55]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id E658246459; Fri, 15 Dec 2023 17:10:38 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linutronix.de Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linutronix.de Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=linutronix.de header.i=@linutronix.de header.b="i1txzjU2"; dkim=permerror (0-bit key) header.d=linutronix.de header.i=@linutronix.de header.b="qpiqOMh1" From: Sebastian Andrzej Siewior DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linutronix.de; s=2020; t=1702660237; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=ttIXeZ1ovHzP3s4F4dQhi8p1AikC0DzBNJEl/q3oOFU=; b=i1txzjU2KalZCJ8xYOeccEgh79T53JDMXo6hg9E0Mau+4g4nEzUiD/CU8QtNlcyNoMioNh 
TxMQ6lCO2yFp+l4aRR42CeLo7AtXdM42ZZzsnfMKblwNsySw00E19iXVw2u5HXIlKMh1od 0Llrgx3TK8reem3ivDEKJy2UXYWWALyMSBMFYs9+zxL/HlBtrzIcUXm5m8ZZ/5YImNN7lO cAGnMTMRhHeXvmYWjgB14T5Fcz6Y7eQAgy4ZqjE08SOVUO1jI2u1fAy/aFqNaScVh8B9jc QCHYZ1d78L1SzqyNvK0hY0llM5vR2g43iaL1X+2sVIpumHkKDsIyh1k5X25prQ== DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=linutronix.de; s=2020e; t=1702660237; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=ttIXeZ1ovHzP3s4F4dQhi8p1AikC0DzBNJEl/q3oOFU=; b=qpiqOMh1zBcnXXLLLCyeEFb2GN33yYgr8NRbg0Pr11SJ8OU7LfkwcblN3sg2bH4nUYN3Sc jk7theU4aQBFwpBA== To: linux-kernel@vger.kernel.org, netdev@vger.kernel.org Cc: "David S. Miller" , Boqun Feng , Daniel Borkmann , Eric Dumazet , Frederic Weisbecker , Ingo Molnar , Jakub Kicinski , Paolo Abeni , Peter Zijlstra , Thomas Gleixner , Waiman Long , Will Deacon , Sebastian Andrzej Siewior , Alexei Starovoitov , Andrii Nakryiko , Hao Luo , Jiri Olsa , John Fastabend , KP Singh , Martin KaFai Lau , Song Liu , Stanislav Fomichev , Yonghong Song , bpf@vger.kernel.org Subject: [PATCH net-next 13/24] net: Use nested-BH locking for bpf_scratchpad. Date: Fri, 15 Dec 2023 18:07:32 +0100 Message-ID: <20231215171020.687342-14-bigeasy@linutronix.de> In-Reply-To: <20231215171020.687342-1-bigeasy@linutronix.de> References: <20231215171020.687342-1-bigeasy@linutronix.de> Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Patchwork-Delegate: kuba@kernel.org bpf_scratchpad is a per-CPU variable and relies on disabled BH for its locking. Without per-CPU locking in local_bh_disable() on PREEMPT_RT this data structure requires explicit locking. Add a local_lock_t to the data structure and use local_lock_nested_bh() for locking. This change adds only lockdep coverage and does not alter the functional behaviour for !PREEMPT_RT. 
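To make the pattern concrete, here is a minimal sketch of how a per-CPU scratch structure gains such a nested-BH lock; the names my_scratch and my_scratch_copy are invented for illustration and are not part of the patch:

struct my_scratch {
	u8 buf[64];
	local_lock_t lock;
};

static DEFINE_PER_CPU(struct my_scratch, my_scratch) = {
	/* Initialise the lock statically so lockdep knows about it. */
	.lock = INIT_LOCAL_LOCK(lock),
};

static void my_scratch_copy(const u8 *src, size_t len)
{
	struct my_scratch *sp;

	/* The caller already runs with BH disabled (softirq/NAPI
	 * context). On !PREEMPT_RT the guard only adds the lockdep
	 * check; on PREEMPT_RT it provides the actual per-CPU
	 * serialisation.
	 */
	guard(local_lock_nested_bh)(&my_scratch.lock);
	sp = this_cpu_ptr(&my_scratch);
	memcpy(sp->buf, src, min(len, sizeof(sp->buf)));
}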
Cc: Alexei Starovoitov Cc: Andrii Nakryiko Cc: Hao Luo Cc: Jiri Olsa Cc: John Fastabend Cc: KP Singh Cc: Martin KaFai Lau Cc: Song Liu Cc: Stanislav Fomichev Cc: Yonghong Song Cc: bpf@vger.kernel.org Signed-off-by: Sebastian Andrzej Siewior --- net/core/filter.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/net/core/filter.c b/net/core/filter.c index c8013f762524b..896aa3fa699f9 100644 --- a/net/core/filter.c +++ b/net/core/filter.c @@ -1652,9 +1652,12 @@ struct bpf_scratchpad { __be32 diff[MAX_BPF_STACK / sizeof(__be32)]; u8 buff[MAX_BPF_STACK]; }; + local_lock_t lock; }; -static DEFINE_PER_CPU(struct bpf_scratchpad, bpf_sp); +static DEFINE_PER_CPU(struct bpf_scratchpad, bpf_sp) = { + .lock = INIT_LOCAL_LOCK(lock), +}; static inline int __bpf_try_make_writable(struct sk_buff *skb, unsigned int write_len) @@ -2023,6 +2026,7 @@ BPF_CALL_5(bpf_csum_diff, __be32 *, from, u32, from_size, diff_size > sizeof(sp->diff))) return -EINVAL; + guard(local_lock_nested_bh)(&bpf_sp.lock); for (i = 0; i < from_size / sizeof(__be32); i++, j++) sp->diff[j] = ~from[i]; for (i = 0; i < to_size / sizeof(__be32); i++, j++) From patchwork Fri Dec 15 17:07:33 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Sebastian Andrzej Siewior X-Patchwork-Id: 13494667 X-Patchwork-Delegate: kuba@kernel.org Received: from galois.linutronix.de (Galois.linutronix.de [193.142.43.55]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id A59EF4654A; Fri, 15 Dec 2023 17:10:39 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linutronix.de Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linutronix.de Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=linutronix.de header.i=@linutronix.de header.b="HqagrAq6"; dkim=permerror (0-bit key) header.d=linutronix.de header.i=@linutronix.de header.b="5EEtusZ0" From: Sebastian Andrzej Siewior DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linutronix.de; s=2020; t=1702660237; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=CX4pb+3godYGbYL2L5DP5LSVnm+uYrRc03oRZsT6xW0=; b=HqagrAq65uQAIAobjbUr4Dx14XchauBivPibNfYj2mQ5AwLbO3ALFUn+JieHhkpzNkVpsa wxETScLpeyAo5czQVKHMZdSwWqVygfFhh9RwnF92sZu0e1qO3ZoDNEIeE2fq2DA5B56cqq CHExNwPEs/JMbQbOjTmCvjRFEOdDfxOcP+q3GFe3PdTnvZvKT5/8eF8IYcwaJOmaDhd2cs muBVpo1ENYqit4LqG7FAoxqsb0osvK99ozR6nykis0OTKU1Vj8weabfIgWmFEAUSrIWDrD 56OzsB21LEdNLlC3ZvA1LH5pRn9/c12AvqybfYZaSw42a9Cm5gmZP7Vo/yCQbA== DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=linutronix.de; s=2020e; t=1702660237; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=CX4pb+3godYGbYL2L5DP5LSVnm+uYrRc03oRZsT6xW0=; b=5EEtusZ07Rc3a5s2wX4WniJ1XNm1V1gXAbqutydc+D/8XbjphtsbP1gqjDLuGNtmutCDRS qy1Kwlt1T/tt6kBA== To: linux-kernel@vger.kernel.org, netdev@vger.kernel.org Cc: "David S. 
Miller" , Boqun Feng , Daniel Borkmann , Eric Dumazet , Frederic Weisbecker , Ingo Molnar , Jakub Kicinski , Paolo Abeni , Peter Zijlstra , Thomas Gleixner , Waiman Long , Will Deacon , Sebastian Andrzej Siewior , Alexei Starovoitov , Andrii Nakryiko , Hao Luo , Jesper Dangaard Brouer , Jiri Olsa , John Fastabend , KP Singh , Martin KaFai Lau , Song Liu , Stanislav Fomichev , Yonghong Song , bpf@vger.kernel.org Subject: [PATCH net-next 14/24] net: Add a lock which is held during the redirect process. Date: Fri, 15 Dec 2023 18:07:33 +0100 Message-ID: <20231215171020.687342-15-bigeasy@linutronix.de> In-Reply-To: <20231215171020.687342-1-bigeasy@linutronix.de> References: <20231215171020.687342-1-bigeasy@linutronix.de> Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Patchwork-Delegate: kuba@kernel.org The XDP redirect process is two-staged: - bpf_prog_run_xdp() is invoked to run an eBPF program which inspects the packet and makes decisions. While doing that, the per-CPU variable bpf_redirect_info is used. - Afterwards xdp_do_redirect() is invoked and accesses bpf_redirect_info and it may also access other per-CPU variables like xskmap_flush_list. At the very end of the NAPI callback, xdp_do_flush() is invoked which does not access bpf_redirect_info but will touch the individual per-CPU lists. The per-CPU variables are only used in the NAPI callback, hence disabling bottom halves is the only protection mechanism. Users from preemptible context (like cpu_map_kthread_run()) explicitly disable bottom halves for protection reasons. Without locking in local_bh_disable() on PREEMPT_RT this data structure requires explicit locking. Introduce redirect_lock as a lock to be acquired when access to these per-CPU variables is performed. Usually the lock is part of the per-CPU variable which is about to be protected, but since there are a few different per-CPU variables which need to be protected at the same time (and some of the variables depend on a CONFIG setting), a new per-CPU data structure with the variable bpf_run_lock is used for this. The lock is a nested-BH lock meaning that on non-PREEMPT_RT kernels this simply results in a lockdep check and ensuring that bottom halves are disabled. On PREEMPT_RT kernels this will provide the needed synchronisation once local_bh_disable() does not act as a per-CPU lock. This patch introduces the bpf_run_lock.redirect_lock lock. It will be used by drivers in the following patches. A follow-up step could be to keep bpf_prog_run_xdp() and the XDP_REDIRECT switch case (with xdp_do_redirect()) close together. That would allow a single scoped_guard() macro to cover the two instances that require locking instead of the whole switch case.
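As a sketch, the minimal locking scope described in the last paragraph could look like the following; the wrapper function is hypothetical and not part of this patch, only the locking shape matters:

static u32 xdp_run_prog_and_redirect(struct net_device *dev,
				     struct bpf_prog *prog,
				     struct xdp_buff *xdp)
{
	u32 act;

	/* Only the program run and the redirect sit inside the locked
	 * section; error recovery, DMA unmapping and memory management
	 * stay outside of it.
	 */
	scoped_guard(local_lock_nested_bh, &bpf_run_lock.redirect_lock) {
		act = bpf_prog_run_xdp(prog, xdp);
		if (act == XDP_REDIRECT &&
		    xdp_do_redirect(dev, xdp, prog) < 0)
			act = XDP_ABORTED;
	}
	return act;
}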
Cc: Alexei Starovoitov Cc: Andrii Nakryiko Cc: Hao Luo Cc: Jesper Dangaard Brouer Cc: Jiri Olsa Cc: John Fastabend Cc: KP Singh Cc: Martin KaFai Lau Cc: Song Liu Cc: Stanislav Fomichev Cc: Yonghong Song Cc: bpf@vger.kernel.org Signed-off-by: Sebastian Andrzej Siewior --- include/linux/bpf.h | 6 ++++++ net/core/filter.c | 5 +++++ 2 files changed, 11 insertions(+) diff --git a/include/linux/bpf.h b/include/linux/bpf.h index cff5bb08820ec..6912b85209b12 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -291,6 +291,12 @@ struct bpf_map { s64 __percpu *elem_count; }; +struct bpf_run_lock { + local_lock_t redirect_lock; +}; + +DECLARE_PER_CPU(struct bpf_run_lock, bpf_run_lock); + static inline const char *btf_field_type_name(enum btf_field_type type) { switch (type) { diff --git a/net/core/filter.c b/net/core/filter.c index 896aa3fa699f9..7c9653734fb60 100644 --- a/net/core/filter.c +++ b/net/core/filter.c @@ -89,6 +89,11 @@ static const struct bpf_func_proto * bpf_sk_base_func_proto(enum bpf_func_id func_id); +DEFINE_PER_CPU(struct bpf_run_lock, bpf_run_lock) = { + .redirect_lock = INIT_LOCAL_LOCK(redirect_lock), +}; +EXPORT_PER_CPU_SYMBOL_GPL(bpf_run_lock); + int copy_bpf_fprog_from_user(struct sock_fprog *dst, sockptr_t src, int len) { if (in_compat_syscall()) { From patchwork Fri Dec 15 17:07:34 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Sebastian Andrzej Siewior X-Patchwork-Id: 13494668 X-Patchwork-Delegate: kuba@kernel.org Received: from galois.linutronix.de (Galois.linutronix.de [193.142.43.55]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id C733347F70; Fri, 15 Dec 2023 17:10:40 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linutronix.de Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linutronix.de Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=linutronix.de header.i=@linutronix.de header.b="kxpWYJqX"; dkim=permerror (0-bit key) header.d=linutronix.de header.i=@linutronix.de header.b="4ONn61AS" From: Sebastian Andrzej Siewior DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linutronix.de; s=2020; t=1702660238; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=68SK6EotaqjMnRph5nBAaqz3Anc2eC7HqiBtHzcWqsI=; b=kxpWYJqXc+JexNk4+1gG07K3aKtXIaBxWZT8hQxb8qabTV1hHvbxQN1NvlD6oCkgKJmojI Ff39JeNPX68PV4KP1XCMG4apuGbIhq0eQwEd3cgrC2X6wdDo5TqDSQlV+e72oBQSppabUE 2P1lowUHo+2AW8ymIncIxQ0dicnF3T+LfNIqlu/mdkeInfOuVkV5OS54uDG7y7krQk5T8g k3P8WQdLNnnPsoMNQ7Q5zYHusXXC8tETpZ89r1ghuu7i4Xj2UzIWV3B7SUatnqrFCNWJRK 10sc4OFrCP4ud9LW9o49OHgA7kXGf82t+g+uc9lkZVMlY+4rn7og2eJRqZlNBQ== DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=linutronix.de; s=2020e; t=1702660238; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=68SK6EotaqjMnRph5nBAaqz3Anc2eC7HqiBtHzcWqsI=; b=4ONn61ASttN9N3mS72Y00QjbBGAkjiduZ/+5Yn3pkVdxQqe1fitnHIMeMzEi4DvsASU3ez aeVXFCXLaAQwS6DQ== To: linux-kernel@vger.kernel.org, netdev@vger.kernel.org Cc: "David S. 
Miller" , Boqun Feng , Daniel Borkmann , Eric Dumazet , Frederic Weisbecker , Ingo Molnar , Jakub Kicinski , Paolo Abeni , Peter Zijlstra , Thomas Gleixner , Waiman Long , Will Deacon , Sebastian Andrzej Siewior , Alexei Starovoitov , Andrii Nakryiko , Cong Wang , Hao Luo , Jamal Hadi Salim , Jesper Dangaard Brouer , Jiri Olsa , Jiri Pirko , John Fastabend , KP Singh , Martin KaFai Lau , Ronak Doshi , Song Liu , Stanislav Fomichev , VMware PV-Drivers Reviewers , Yonghong Song , bpf@vger.kernel.org Subject: [PATCH net-next 15/24] net: Use nested-BH locking for XDP redirect. Date: Fri, 15 Dec 2023 18:07:34 +0100 Message-ID: <20231215171020.687342-16-bigeasy@linutronix.de> In-Reply-To: <20231215171020.687342-1-bigeasy@linutronix.de> References: <20231215171020.687342-1-bigeasy@linutronix.de> Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Patchwork-Delegate: kuba@kernel.org The per-CPU variables used during bpf_prog_run_xdp() invocation and later during xdp_do_redirect() rely on disabled BH for their protection. Without locking in local_bh_disable() on PREEMPT_RT these data structures require explicit locking. This is a follow-up on the previous change which introduced bpf_run_lock.redirect_lock and uses it now within drivers. The simple way is to acquire the lock before bpf_prog_run_xdp() is invoked and hold it until the end of the function. This does not always work because some drivers (cpsw, atlantic) invoke xdp_do_flush() in the same context. Acquiring the lock in bpf_prog_run_xdp() and dropping it in xdp_do_redirect() (without touching drivers) does not work because not all drivers which use bpf_prog_run_xdp() support XDP_REDIRECT (and invoke xdp_do_redirect()). Ideally the minimal locking scope would be bpf_prog_run_xdp() + xdp_do_redirect() and everything else (error recovery, DMA unmapping, free/alloc of memory, …) would happen outside of the locked section.
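The xdp_do_flush() constraint can be sketched as follows; drv_napi_post_rx() is a hypothetical driver helper, and the point is only that xdp_do_flush() itself acquires bpf_run_lock.redirect_lock (see the net/core/filter.c hunk below), so it must not be called with that lock already held:

static void drv_napi_post_rx(struct net_device *dev, struct bpf_prog *prog,
			     struct xdp_buff *xdp)
{
	scoped_guard(local_lock_nested_bh, &bpf_run_lock.redirect_lock) {
		if (bpf_prog_run_xdp(prog, xdp) == XDP_REDIRECT)
			xdp_do_redirect(dev, xdp, prog);
	}
	/* Must run after the scoped_guard() block: xdp_do_flush()
	 * takes the same per-CPU lock, so calling it with
	 * redirect_lock held would be a recursive acquisition.
	 */
	xdp_do_flush();
}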
Cc: Alexei Starovoitov Cc: Andrii Nakryiko Cc: Cong Wang Cc: Hao Luo Cc: Jamal Hadi Salim Cc: Jesper Dangaard Brouer Cc: Jiri Olsa Cc: Jiri Pirko Cc: John Fastabend Cc: KP Singh Cc: Martin KaFai Lau Cc: Ronak Doshi Cc: Song Liu Cc: Stanislav Fomichev Cc: VMware PV-Drivers Reviewers Cc: Yonghong Song Cc: bpf@vger.kernel.org Signed-off-by: Sebastian Andrzej Siewior --- drivers/net/vmxnet3/vmxnet3_xdp.c | 1 + kernel/bpf/cpumap.c | 2 ++ net/bpf/test_run.c | 11 ++++++++--- net/core/dev.c | 3 +++ net/core/filter.c | 1 + net/core/lwt_bpf.c | 2 ++ net/sched/cls_api.c | 2 ++ 7 files changed, 19 insertions(+), 3 deletions(-) diff --git a/drivers/net/vmxnet3/vmxnet3_xdp.c b/drivers/net/vmxnet3/vmxnet3_xdp.c index 80ddaff759d47..18bce98fd2e31 100644 --- a/drivers/net/vmxnet3/vmxnet3_xdp.c +++ b/drivers/net/vmxnet3/vmxnet3_xdp.c @@ -257,6 +257,7 @@ vmxnet3_run_xdp(struct vmxnet3_rx_queue *rq, struct xdp_buff *xdp, u32 act; rq->stats.xdp_packets++; + guard(local_lock_nested_bh)(&bpf_run_lock.redirect_lock); act = bpf_prog_run_xdp(prog, xdp); page = virt_to_page(xdp->data_hard_start); diff --git a/kernel/bpf/cpumap.c b/kernel/bpf/cpumap.c index 8a0bb80fe48a3..c26d49bb78679 100644 --- a/kernel/bpf/cpumap.c +++ b/kernel/bpf/cpumap.c @@ -144,6 +144,7 @@ static void cpu_map_bpf_prog_run_skb(struct bpf_cpu_map_entry *rcpu, int err; list_for_each_entry_safe(skb, tmp, listp, list) { + guard(local_lock_nested_bh)(&bpf_run_lock.redirect_lock); act = bpf_prog_run_generic_xdp(skb, &xdp, rcpu->prog); switch (act) { case XDP_PASS: @@ -182,6 +183,7 @@ static int cpu_map_bpf_prog_run_xdp(struct bpf_cpu_map_entry *rcpu, struct xdp_buff xdp; int i, nframes = 0; + guard(local_lock_nested_bh)(&bpf_run_lock.redirect_lock); xdp_set_return_frame_no_direct(); xdp.rxq = &rxq; diff --git a/net/bpf/test_run.c b/net/bpf/test_run.c index c9fdcc5cdce10..db8f7eb35c6ca 100644 --- a/net/bpf/test_run.c +++ b/net/bpf/test_run.c @@ -293,6 +293,7 @@ static int xdp_test_run_batch(struct xdp_test_data *xdp, struct bpf_prog *prog, batch_sz = min_t(u32, repeat, xdp->batch_size); local_bh_disable(); + local_lock_nested_bh(&bpf_run_lock.redirect_lock); xdp_set_return_frame_no_direct(); for (i = 0; i < batch_sz; i++) { @@ -348,6 +349,9 @@ static int xdp_test_run_batch(struct xdp_test_data *xdp, struct bpf_prog *prog, } out: + xdp_clear_return_frame_no_direct(); + local_unlock_nested_bh(&bpf_run_lock.redirect_lock); + if (redirect) xdp_do_flush(); if (nframes) { @@ -356,7 +360,6 @@ static int xdp_test_run_batch(struct xdp_test_data *xdp, struct bpf_prog *prog, err = ret; } - xdp_clear_return_frame_no_direct(); local_bh_enable(); return err; } @@ -417,10 +420,12 @@ static int bpf_test_run(struct bpf_prog *prog, void *ctx, u32 repeat, do { run_ctx.prog_item = &item; local_bh_disable(); - if (xdp) + if (xdp) { + guard(local_lock_nested_bh)(&bpf_run_lock.redirect_lock); *retval = bpf_prog_run_xdp(prog, ctx); - else + } else { *retval = bpf_prog_run(prog, ctx); + } local_bh_enable(); } while (bpf_test_timer_continue(&t, 1, repeat, &ret, time)); bpf_reset_run_ctx(old_ctx); diff --git a/net/core/dev.c b/net/core/dev.c index 5a0f6da7b3ae5..5ba7509e88752 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -3993,6 +3993,7 @@ sch_handle_ingress(struct sk_buff *skb, struct packet_type **pt_prev, int *ret, *pt_prev = NULL; } + guard(local_lock_nested_bh)(&bpf_run_lock.redirect_lock); qdisc_skb_cb(skb)->pkt_len = skb->len; tcx_set_ingress(skb, true); @@ -4045,6 +4046,7 @@ sch_handle_egress(struct sk_buff *skb, int *ret, struct net_device *dev) if (!entry) 
return skb; + guard(local_lock_nested_bh)(&bpf_run_lock.redirect_lock); /* qdisc_skb_cb(skb)->pkt_len & tcx_set_ingress() was * already set by the caller. */ @@ -5008,6 +5010,7 @@ int do_xdp_generic(struct bpf_prog *xdp_prog, struct sk_buff *skb) u32 act; int err; + guard(local_lock_nested_bh)(&bpf_run_lock.redirect_lock); act = netif_receive_generic_xdp(skb, &xdp, xdp_prog); if (act != XDP_PASS) { switch (act) { diff --git a/net/core/filter.c b/net/core/filter.c index 7c9653734fb60..72a7812f933a1 100644 --- a/net/core/filter.c +++ b/net/core/filter.c @@ -4241,6 +4241,7 @@ static const struct bpf_func_proto bpf_xdp_adjust_meta_proto = { */ void xdp_do_flush(void) { + guard(local_lock_nested_bh)(&bpf_run_lock.redirect_lock); __dev_flush(); __cpu_map_flush(); __xsk_map_flush(); diff --git a/net/core/lwt_bpf.c b/net/core/lwt_bpf.c index a94943681e5aa..74b88e897a7e3 100644 --- a/net/core/lwt_bpf.c +++ b/net/core/lwt_bpf.c @@ -44,6 +44,7 @@ static int run_lwt_bpf(struct sk_buff *skb, struct bpf_lwt_prog *lwt, * BPF prog and skb_do_redirect(). */ local_bh_disable(); + local_lock_nested_bh(&bpf_run_lock.redirect_lock); bpf_compute_data_pointers(skb); ret = bpf_prog_run_save_cb(lwt->prog, skb); @@ -76,6 +77,7 @@ static int run_lwt_bpf(struct sk_buff *skb, struct bpf_lwt_prog *lwt, break; } + local_unlock_nested_bh(&bpf_run_lock.redirect_lock); local_bh_enable(); return ret; diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c index 1976bd1639863..da61b99bc558f 100644 --- a/net/sched/cls_api.c +++ b/net/sched/cls_api.c @@ -23,6 +23,7 @@ #include #include #include +#include #include #include #include @@ -3925,6 +3926,7 @@ struct sk_buff *tcf_qevent_handle(struct tcf_qevent *qe, struct Qdisc *sch, stru fl = rcu_dereference_bh(qe->filter_chain); + guard(local_lock_nested_bh)(&bpf_run_lock.redirect_lock); switch (tcf_classify(skb, NULL, fl, &cl_res, false)) { case TC_ACT_SHOT: qdisc_qstats_drop(sch); From patchwork Fri Dec 15 17:07:35 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Sebastian Andrzej Siewior X-Patchwork-Id: 13494670 X-Patchwork-Delegate: kuba@kernel.org Received: from galois.linutronix.de (Galois.linutronix.de [193.142.43.55]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id E197547F7F; Fri, 15 Dec 2023 17:10:40 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linutronix.de Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linutronix.de Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=linutronix.de header.i=@linutronix.de header.b="bu//WPoJ"; dkim=permerror (0-bit key) header.d=linutronix.de header.i=@linutronix.de header.b="IFHcLtVI" From: Sebastian Andrzej Siewior DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linutronix.de; s=2020; t=1702660239; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=+0HJJPub1Rbn3MJWlEsGP0R+kJz2sY8/FL1YHZqJTPQ=; b=bu//WPoJj1s+CyuzY9fCzuWUPNElBTG0mDAue5MtksSZ/nlXc3JYfccn6tG0UR7/s9bLcB TyHwPF9rL6/pk9aAThq84QZ/O0PPk1BIhRiaV+wWrm874nalOaix3eDsDuVspKXyfMpQsX SEP7qczM42BqZeYaUbDx8DFkqux75/XmDcmsMW8XVXUoArSSTH/2/SLlrKU/Y/BeoEFJ1G 
niUm3L0GNIf2r6tGor3HgSQ2mPyJbN54p8y9ylbMz9svBrqSDE0+Rvfj8QwFQ3UWZybFQ3 9KKmqHpMn8OyR/lDx/Ew+Azv1ohXTln9Yj3vk5ZeyQyuhF5aBTxjwvDJbtZIEw== DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=linutronix.de; s=2020e; t=1702660239; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=+0HJJPub1Rbn3MJWlEsGP0R+kJz2sY8/FL1YHZqJTPQ=; b=IFHcLtVItNS1jqIMdMN6dYTNqvyEKzMDmeSXBvDnw0eZeuYwuQGT7alkK3RqsOXKSI1Vp9 o+kMgK4SkJMS6CBw== To: linux-kernel@vger.kernel.org, netdev@vger.kernel.org Cc: "David S. Miller" , Boqun Feng , Daniel Borkmann , Eric Dumazet , Frederic Weisbecker , Ingo Molnar , Jakub Kicinski , Paolo Abeni , Peter Zijlstra , Thomas Gleixner , Waiman Long , Will Deacon , Sebastian Andrzej Siewior , "K. Y. Srinivasan" , "Michael S. Tsirkin" , Alexei Starovoitov , Andrii Nakryiko , Dexuan Cui , Haiyang Zhang , Hao Luo , Jesper Dangaard Brouer , Jiri Olsa , John Fastabend , Juergen Gross , KP Singh , Martin KaFai Lau , Nikolay Aleksandrov , Song Liu , Stanislav Fomichev , Stefano Stabellini , Wei Liu , Willem de Bruijn , Xuan Zhuo , Yonghong Song , bpf@vger.kernel.org, virtualization@lists.linux.dev, xen-devel@lists.xenproject.org Subject: [PATCH net-next 16/24] net: netkit, veth, tun, virt*: Use nested-BH locking for XDP redirect. Date: Fri, 15 Dec 2023 18:07:35 +0100 Message-ID: <20231215171020.687342-17-bigeasy@linutronix.de> In-Reply-To: <20231215171020.687342-1-bigeasy@linutronix.de> References: <20231215171020.687342-1-bigeasy@linutronix.de> Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Patchwork-Delegate: kuba@kernel.org The per-CPU variables used during bpf_prog_run_xdp() invocation and later during xdp_do_redirect() rely on disabled BH for their protection. Without locking in local_bh_disable() on PREEMPT_RT these data structures require explicit locking. This is a follow-up on the previous change which introduced bpf_run_lock.redirect_lock and uses it now within drivers. The simple way is to acquire the lock before bpf_prog_run_xdp() is invoked and hold it until the end of the function. This does not always work because some drivers (cpsw, atlantic) invoke xdp_do_flush() in the same context. Acquiring the lock in bpf_prog_run_xdp() and dropping it in xdp_do_redirect() (without touching drivers) does not work because not all drivers which use bpf_prog_run_xdp() support XDP_REDIRECT (and invoke xdp_do_redirect()). Ideally the minimal locking scope would be bpf_prog_run_xdp() + xdp_do_redirect() and everything else (error recovery, DMA unmapping, free/alloc of memory, …) would happen outside of the locked section. Cc: "K. Y. Srinivasan" Cc: "Michael S.
Tsirkin" Cc: Alexei Starovoitov Cc: Andrii Nakryiko Cc: Dexuan Cui Cc: Haiyang Zhang Cc: Hao Luo Cc: Jesper Dangaard Brouer Cc: Jiri Olsa Cc: John Fastabend Cc: Juergen Gross Cc: KP Singh Cc: Martin KaFai Lau Cc: Nikolay Aleksandrov Cc: Song Liu Cc: Stanislav Fomichev Cc: Stefano Stabellini Cc: Wei Liu Cc: Willem de Bruijn Cc: Xuan Zhuo Cc: Yonghong Song Cc: bpf@vger.kernel.org Cc: virtualization@lists.linux.dev Cc: xen-devel@lists.xenproject.org Signed-off-by: Sebastian Andrzej Siewior --- drivers/net/hyperv/netvsc_bpf.c | 1 + drivers/net/netkit.c | 13 +++++++---- drivers/net/tun.c | 28 +++++++++++++---------- drivers/net/veth.c | 40 ++++++++++++++++++++------------- drivers/net/virtio_net.c | 1 + drivers/net/xen-netfront.c | 1 + 6 files changed, 52 insertions(+), 32 deletions(-) diff --git a/drivers/net/hyperv/netvsc_bpf.c b/drivers/net/hyperv/netvsc_bpf.c index 4a9522689fa4f..55f8ca92ca199 100644 --- a/drivers/net/hyperv/netvsc_bpf.c +++ b/drivers/net/hyperv/netvsc_bpf.c @@ -58,6 +58,7 @@ u32 netvsc_run_xdp(struct net_device *ndev, struct netvsc_channel *nvchan, memcpy(xdp->data, data, len); + guard(local_lock_nested_bh)(&bpf_run_lock.redirect_lock); act = bpf_prog_run_xdp(prog, xdp); switch (act) { diff --git a/drivers/net/netkit.c b/drivers/net/netkit.c index 39171380ccf29..fbcf78477bda8 100644 --- a/drivers/net/netkit.c +++ b/drivers/net/netkit.c @@ -80,8 +80,15 @@ static netdev_tx_t netkit_xmit(struct sk_buff *skb, struct net_device *dev) netkit_prep_forward(skb, !net_eq(dev_net(dev), dev_net(peer))); skb->dev = peer; entry = rcu_dereference(nk->active); - if (entry) - ret = netkit_run(entry, skb, ret); + if (entry) { + scoped_guard(local_lock_nested_bh, &bpf_run_lock.redirect_lock) { + ret = netkit_run(entry, skb, ret); + if (ret == NETKIT_REDIRECT) { + dev_sw_netstats_tx_add(dev, 1, len); + skb_do_redirect(skb); + } + } + } switch (ret) { case NETKIT_NEXT: case NETKIT_PASS: @@ -95,8 +102,6 @@ static netdev_tx_t netkit_xmit(struct sk_buff *skb, struct net_device *dev) } break; case NETKIT_REDIRECT: - dev_sw_netstats_tx_add(dev, 1, len); - skb_do_redirect(skb); break; case NETKIT_DROP: default: diff --git a/drivers/net/tun.c b/drivers/net/tun.c index afa5497f7c35c..fe0d31f11e4b6 100644 --- a/drivers/net/tun.c +++ b/drivers/net/tun.c @@ -1708,16 +1708,18 @@ static struct sk_buff *tun_build_skb(struct tun_struct *tun, xdp_init_buff(&xdp, buflen, &tfile->xdp_rxq); xdp_prepare_buff(&xdp, buf, pad, len, false); - act = bpf_prog_run_xdp(xdp_prog, &xdp); - if (act == XDP_REDIRECT || act == XDP_TX) { - get_page(alloc_frag->page); - alloc_frag->offset += buflen; - } - err = tun_xdp_act(tun, xdp_prog, &xdp, act); - if (err < 0) { - if (act == XDP_REDIRECT || act == XDP_TX) - put_page(alloc_frag->page); - goto out; + scoped_guard(local_lock_nested_bh, &bpf_run_lock.redirect_lock) { + act = bpf_prog_run_xdp(xdp_prog, &xdp); + if (act == XDP_REDIRECT || act == XDP_TX) { + get_page(alloc_frag->page); + alloc_frag->offset += buflen; + } + err = tun_xdp_act(tun, xdp_prog, &xdp, act); + if (err < 0) { + if (act == XDP_REDIRECT || act == XDP_TX) + put_page(alloc_frag->page); + goto out; + } } if (err == XDP_REDIRECT) @@ -2460,8 +2462,10 @@ static int tun_xdp_one(struct tun_struct *tun, xdp_init_buff(xdp, buflen, &tfile->xdp_rxq); xdp_set_data_meta_invalid(xdp); - act = bpf_prog_run_xdp(xdp_prog, xdp); - ret = tun_xdp_act(tun, xdp_prog, xdp, act); + scoped_guard(local_lock_nested_bh, &bpf_run_lock.redirect_lock) { + act = bpf_prog_run_xdp(xdp_prog, xdp); + ret = tun_xdp_act(tun, xdp_prog, xdp, act); + 
} if (ret < 0) { put_page(virt_to_head_page(xdp->data)); return ret; diff --git a/drivers/net/veth.c b/drivers/net/veth.c index 977861c46b1fe..c69e5ff9f8795 100644 --- a/drivers/net/veth.c +++ b/drivers/net/veth.c @@ -624,7 +624,18 @@ static struct xdp_frame *veth_xdp_rcv_one(struct veth_rq *rq, xdp->rxq = &rq->xdp_rxq; vxbuf.skb = NULL; - act = bpf_prog_run_xdp(xdp_prog, xdp); + scoped_guard(local_lock_nested_bh, &bpf_run_lock.redirect_lock) { + act = bpf_prog_run_xdp(xdp_prog, xdp); + if (act == XDP_REDIRECT) { + orig_frame = *frame; + xdp->rxq->mem = frame->mem; + if (xdp_do_redirect(rq->dev, xdp, xdp_prog)) { + frame = &orig_frame; + stats->xdp_drops++; + goto err_xdp; + } + } + } switch (act) { case XDP_PASS: @@ -644,13 +655,6 @@ static struct xdp_frame *veth_xdp_rcv_one(struct veth_rq *rq, rcu_read_unlock(); goto xdp_xmit; case XDP_REDIRECT: - orig_frame = *frame; - xdp->rxq->mem = frame->mem; - if (xdp_do_redirect(rq->dev, xdp, xdp_prog)) { - frame = &orig_frame; - stats->rx_drops++; - goto err_xdp; - } stats->xdp_redirect++; rcu_read_unlock(); goto xdp_xmit; @@ -857,7 +861,18 @@ static struct sk_buff *veth_xdp_rcv_skb(struct veth_rq *rq, orig_data = xdp->data; orig_data_end = xdp->data_end; - act = bpf_prog_run_xdp(xdp_prog, xdp); + scoped_guard(local_lock_nested_bh, &bpf_run_lock.redirect_lock) { + act = bpf_prog_run_xdp(xdp_prog, xdp); + if (act == XDP_REDIRECT) { + veth_xdp_get(xdp); + consume_skb(skb); + xdp->rxq->mem = rq->xdp_mem; + if (xdp_do_redirect(rq->dev, xdp, xdp_prog)) { + stats->rx_drops++; + goto err_xdp; + } + } + } switch (act) { case XDP_PASS: @@ -875,13 +890,6 @@ static struct sk_buff *veth_xdp_rcv_skb(struct veth_rq *rq, rcu_read_unlock(); goto xdp_xmit; case XDP_REDIRECT: - veth_xdp_get(xdp); - consume_skb(skb); - xdp->rxq->mem = rq->xdp_mem; - if (xdp_do_redirect(rq->dev, xdp, xdp_prog)) { - stats->rx_drops++; - goto err_xdp; - } stats->xdp_redirect++; rcu_read_unlock(); goto xdp_xmit; diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c index d16f592c2061f..5e362c4604239 100644 --- a/drivers/net/virtio_net.c +++ b/drivers/net/virtio_net.c @@ -1010,6 +1010,7 @@ static int virtnet_xdp_handler(struct bpf_prog *xdp_prog, struct xdp_buff *xdp, int err; u32 act; + guard(local_lock_nested_bh)(&bpf_run_lock.redirect_lock); act = bpf_prog_run_xdp(xdp_prog, xdp); u64_stats_inc(&stats->xdp_packets); diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c index ad29f370034e4..e3daa8cdeb84e 100644 --- a/drivers/net/xen-netfront.c +++ b/drivers/net/xen-netfront.c @@ -978,6 +978,7 @@ static u32 xennet_run_xdp(struct netfront_queue *queue, struct page *pdata, xdp_prepare_buff(xdp, page_address(pdata), XDP_PACKET_HEADROOM, len, false); + guard(local_lock_nested_bh)(&bpf_run_lock.redirect_lock); act = bpf_prog_run_xdp(prog, xdp); switch (act) { case XDP_TX: From patchwork Fri Dec 15 17:07:36 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Sebastian Andrzej Siewior X-Patchwork-Id: 13494692 X-Patchwork-Delegate: kuba@kernel.org Received: from galois.linutronix.de (Galois.linutronix.de [193.142.43.55]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id A155049F7D; Fri, 15 Dec 2023 17:10:41 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linutronix.de Authentication-Results: smtp.subspace.kernel.org; spf=pass 
smtp.mailfrom=linutronix.de Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=linutronix.de header.i=@linutronix.de header.b="U2uSkbD+"; dkim=permerror (0-bit key) header.d=linutronix.de header.i=@linutronix.de header.b="xR47XjeC" From: Sebastian Andrzej Siewior DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linutronix.de; s=2020; t=1702660239; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=rzieE1bdCDU7ZEE1pZ3W32UoA/gXNQn5x2kQlc+EJVs=; b=U2uSkbD+NpMBTLSNNot26VG9U32jW+WuGSigX1H2dsAJHh0RiWL9bkRUxHu+zMwWE96uXB GXxqnmpJAn7ZSm6y2Wo8M8d3ookGUh+NWnNtG7PhLOUrF1qovGkgZyFJxniTa08ldaWDT+ KK7rLMb/yr4rw4uYPRonA6KAlxr3S94Rh67FP+QnqeMZknVqukBtyqDpcKymKTh9tDHmTZ 02mF3VFOoBiCHEkVULLrMoST1FTPmR4FhSehWPFG1I1TVQHeLMgNGZw7WDi5G+NqKHsv4H F2PE22hMCiJo3cRTeBfGjqj0ahQcOyrbdJ8qLvv+VjMNgIjN/48OfF3CDSUm5w== DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=linutronix.de; s=2020e; t=1702660239; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=rzieE1bdCDU7ZEE1pZ3W32UoA/gXNQn5x2kQlc+EJVs=; b=xR47XjeCsKRff47ftRL3dngUnv52a3CC+b6k684V9u9q0dYWYnMFrl9eQzdyykaw/kwJZc Ot6Y1WRGJvAg6pCA== To: linux-kernel@vger.kernel.org, netdev@vger.kernel.org Cc: "David S. Miller" , Boqun Feng , Daniel Borkmann , Eric Dumazet , Frederic Weisbecker , Ingo Molnar , Jakub Kicinski , Paolo Abeni , Peter Zijlstra , Thomas Gleixner , Waiman Long , Will Deacon , Sebastian Andrzej Siewior , Alexei Starovoitov , Arthur Kiyanovski , David Arinzon , Igor Russkikh , Jesper Dangaard Brouer , John Fastabend , Michael Chan , Noam Dagan , Saeed Bishara , Shay Agroskin , Sunil Goutham Subject: [PATCH net-next 17/24] net: amazon, aquantia, broadcom, cavium, engleder: Use nested-BH locking for XDP redirect. Date: Fri, 15 Dec 2023 18:07:36 +0100 Message-ID: <20231215171020.687342-18-bigeasy@linutronix.de> In-Reply-To: <20231215171020.687342-1-bigeasy@linutronix.de> References: <20231215171020.687342-1-bigeasy@linutronix.de> Precedence: bulk X-Mailing-List: netdev@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Patchwork-Delegate: kuba@kernel.org The per-CPU variables used during bpf_prog_run_xdp() invocation and later during xdp_do_redirect() rely on disabled BH for their protection. Without locking in local_bh_disable() on PREEMPT_RT these data structures require explicit locking. This is a follow-up on the previous change which introduced bpf_run_lock.redirect_lock and uses it now within drivers. The simple way is to acquire the lock before bpf_prog_run_xdp() is invoked and hold it until the end of the function. This does not always work because some drivers (cpsw, atlantic) invoke xdp_do_flush() in the same context. Acquiring the lock in bpf_prog_run_xdp() and dropping it in xdp_do_redirect() (without touching drivers) does not work because not all drivers which use bpf_prog_run_xdp() support XDP_REDIRECT (and invoke xdp_do_redirect()). Ideally the minimal locking scope would be bpf_prog_run_xdp() + xdp_do_redirect() and everything else (error recovery, DMA unmapping, free/alloc of memory, …) would happen outside of the locked section.
Cc: Alexei Starovoitov Cc: Arthur Kiyanovski Cc: David Arinzon Cc: Igor Russkikh Cc: Jesper Dangaard Brouer Cc: John Fastabend Cc: Michael Chan Cc: Noam Dagan Cc: Saeed Bishara Cc: Shay Agroskin Cc: Sunil Goutham Signed-off-by: Sebastian Andrzej Siewior --- drivers/net/ethernet/amazon/ena/ena_netdev.c | 1 + .../net/ethernet/aquantia/atlantic/aq_ring.c | 26 ++++++++++++------- drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c | 1 + .../net/ethernet/cavium/thunder/nicvf_main.c | 3 ++- drivers/net/ethernet/engleder/tsnep_main.c | 17 +++++++----- 5 files changed, 31 insertions(+), 17 deletions(-) diff --git a/drivers/net/ethernet/amazon/ena/ena_netdev.c b/drivers/net/ethernet/amazon/ena/ena_netdev.c index b5bca48148309..cf075bc5e2b13 100644 --- a/drivers/net/ethernet/amazon/ena/ena_netdev.c +++ b/drivers/net/ethernet/amazon/ena/ena_netdev.c @@ -385,6 +385,7 @@ static int ena_xdp_execute(struct ena_ring *rx_ring, struct xdp_buff *xdp) if (!xdp_prog) goto out; + guard(local_lock_nested_bh)(&bpf_run_lock.redirect_lock); verdict = bpf_prog_run_xdp(xdp_prog, xdp); switch (verdict) { diff --git a/drivers/net/ethernet/aquantia/atlantic/aq_ring.c b/drivers/net/ethernet/aquantia/atlantic/aq_ring.c index 694daeaf3e615..5d33d478d5109 100644 --- a/drivers/net/ethernet/aquantia/atlantic/aq_ring.c +++ b/drivers/net/ethernet/aquantia/atlantic/aq_ring.c @@ -458,7 +458,24 @@ static struct sk_buff *aq_xdp_run_prog(struct aq_nic_s *aq_nic, if (xdp_buff_has_frags(xdp) && !prog->aux->xdp_has_frags) goto out_aborted; + local_lock_nested_bh(&bpf_run_lock.redirect_lock); act = bpf_prog_run_xdp(prog, xdp); + if (act == XDP_REDIRECT) { + if (xdp_do_redirect(aq_nic->ndev, xdp, prog) < 0) { + local_unlock_nested_bh(&bpf_run_lock.redirect_lock); + goto out_aborted; + } + local_unlock_nested_bh(&bpf_run_lock.redirect_lock); + + xdp_do_flush(); + u64_stats_update_begin(&rx_ring->stats.rx.syncp); + ++rx_ring->stats.rx.xdp_redirect; + u64_stats_update_end(&rx_ring->stats.rx.syncp); + aq_get_rxpages_xdp(buff, xdp); + } else { + local_unlock_nested_bh(&bpf_run_lock.redirect_lock); + } + switch (act) { case XDP_PASS: skb = aq_xdp_build_skb(xdp, aq_nic->ndev, buff); @@ -481,15 +498,6 @@ static struct sk_buff *aq_xdp_run_prog(struct aq_nic_s *aq_nic, u64_stats_update_end(&rx_ring->stats.rx.syncp); aq_get_rxpages_xdp(buff, xdp); break; - case XDP_REDIRECT: - if (xdp_do_redirect(aq_nic->ndev, xdp, prog) < 0) - goto out_aborted; - xdp_do_flush(); - u64_stats_update_begin(&rx_ring->stats.rx.syncp); - ++rx_ring->stats.rx.xdp_redirect; - u64_stats_update_end(&rx_ring->stats.rx.syncp); - aq_get_rxpages_xdp(buff, xdp); - break; default: fallthrough; case XDP_ABORTED: diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c index 96f5ca778c67d..c4d989da7fade 100644 --- a/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c @@ -253,6 +253,7 @@ bool bnxt_rx_xdp(struct bnxt *bp, struct bnxt_rx_ring_info *rxr, u16 cons, /* BNXT_RX_PAGE_MODE(bp) when XDP enabled */ orig_data = xdp.data; + guard(local_lock_nested_bh)(&bpf_run_lock.redirect_lock); act = bpf_prog_run_xdp(xdp_prog, &xdp); tx_avail = bnxt_tx_avail(bp, txr); diff --git a/drivers/net/ethernet/cavium/thunder/nicvf_main.c b/drivers/net/ethernet/cavium/thunder/nicvf_main.c index eff350e0bc2a8..8e1406101f71b 100644 --- a/drivers/net/ethernet/cavium/thunder/nicvf_main.c +++ b/drivers/net/ethernet/cavium/thunder/nicvf_main.c @@ -554,7 +554,8 @@ static inline bool nicvf_xdp_rx(struct nicvf *nic, struct 
bpf_prog *prog, xdp_prepare_buff(&xdp, hard_start, data - hard_start, len, false); orig_data = xdp.data; - action = bpf_prog_run_xdp(prog, &xdp); + scoped_guard(local_lock_nested_bh, &bpf_run_lock.redirect_lock) + action = bpf_prog_run_xdp(prog, &xdp); len = xdp.data_end - xdp.data; /* Check if XDP program has changed headers */ diff --git a/drivers/net/ethernet/engleder/tsnep_main.c b/drivers/net/ethernet/engleder/tsnep_main.c index df40c720e7b23..acda3502d274f 100644 --- a/drivers/net/ethernet/engleder/tsnep_main.c +++ b/drivers/net/ethernet/engleder/tsnep_main.c @@ -1268,6 +1268,7 @@ static bool tsnep_xdp_run_prog(struct tsnep_rx *rx, struct bpf_prog *prog, length = xdp->data_end - xdp->data_hard_start - XDP_PACKET_HEADROOM; + guard(local_lock_nested_bh)(&bpf_run_lock.redirect_lock); act = bpf_prog_run_xdp(prog, xdp); switch (act) { case XDP_PASS: @@ -1309,14 +1310,16 @@ static bool tsnep_xdp_run_prog_zc(struct tsnep_rx *rx, struct bpf_prog *prog, { u32 act; - act = bpf_prog_run_xdp(prog, xdp); + scoped_guard(local_lock_nested_bh, &bpf_run_lock.redirect_lock) { + act = bpf_prog_run_xdp(prog, xdp); - /* XDP_REDIRECT is the main action for zero-copy */ - if (likely(act == XDP_REDIRECT)) { - if (xdp_do_redirect(rx->adapter->netdev, xdp, prog) < 0) - goto out_failure; - *status |= TSNEP_XDP_REDIRECT; - return true; + /* XDP_REDIRECT is the main action for zero-copy */ + if (likely(act == XDP_REDIRECT)) { + if (xdp_do_redirect(rx->adapter->netdev, xdp, prog) < 0) + goto out_failure; + *status |= TSNEP_XDP_REDIRECT; + return true; + } } switch (act) { From patchwork Fri Dec 15 17:07:37 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Sebastian Andrzej Siewior X-Patchwork-Id: 13494669 X-Patchwork-Delegate: kuba@kernel.org Received: from galois.linutronix.de (Galois.linutronix.de [193.142.43.55]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 2086D56381; Fri, 15 Dec 2023 17:10:41 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linutronix.de Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linutronix.de Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=linutronix.de header.i=@linutronix.de header.b="EBbpW7sF"; dkim=permerror (0-bit key) header.d=linutronix.de header.i=@linutronix.de header.b="Kx2HJ8gu" From: Sebastian Andrzej Siewior DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linutronix.de; s=2020; t=1702660240; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=y7c3Sac8dmyVQ4hqle9rQ/Ik/z8HPOaNJezqWJL8Lx4=; b=EBbpW7sFFbB1fi7zZJOMAUqG0fNWpdhwl+36f6JGSng5DX9KJ7mc98Fyk438SZJrVm4iUZ E+SHLTy48qAfEBKBD1d9wXOsK8FbeZGL2mbWbt6SdXjnxo2n691PJM1JgDNyR/jJbP+cq+ YSDqoeMtMqK07aMmirlvGm+ltUZdcDVte8qc0vM6fZlFkXvCyTdEWMA5Vm4bit+yg0OSbc Np9oF9XObiwHU0wKk6QMUBgINODSjQiQcLH/5NLDf9MccsV3WqkfDY65FNxafa5nczHAR2 5IH2apHHbikJJ8gIPoCcQlDFV8KsJeq+AZ1wJ3FKwH1cLLO+nT6TrUikc7nbfQ== DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=linutronix.de; s=2020e; t=1702660240; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: 
content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=y7c3Sac8dmyVQ4hqle9rQ/Ik/z8HPOaNJezqWJL8Lx4=; b=Kx2HJ8guXixJQvIThAGi9sONmjTWLdYavjuJOXtz876lDz/n34peXGd9z+zsaDJwBf/tv4 Uvg8tcNSC4J+MYCA== To: linux-kernel@vger.kernel.org, netdev@vger.kernel.org Cc: "David S. Miller" , Boqun Feng , Daniel Borkmann , Eric Dumazet , Frederic Weisbecker , Ingo Molnar , Jakub Kicinski , Paolo Abeni , Peter Zijlstra , Thomas Gleixner , Waiman Long , Will Deacon , Sebastian Andrzej Siewior , Alexei Starovoitov , Clark Wang , Claudiu Manoil , Ioana Ciornei , Jesper Dangaard Brouer , John Fastabend , Madalin Bucur , NXP Linux Team , Shenwei Wang , Vladimir Oltean , Wei Fang , bpf@vger.kernel.org Subject: [PATCH net-next 18/24] net: Freescale: Use nested-BH locking for XDP redirect. Date: Fri, 15 Dec 2023 18:07:37 +0100 Message-ID: <20231215171020.687342-19-bigeasy@linutronix.de> In-Reply-To: <20231215171020.687342-1-bigeasy@linutronix.de> References: <20231215171020.687342-1-bigeasy@linutronix.de> Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Patchwork-Delegate: kuba@kernel.org The per-CPU variables used during bpf_prog_run_xdp() invocation and later during xdp_do_redirect() rely on disabled BH for their protection. Without locking in local_bh_disable() on PREEMPT_RT these data structures require explicit locking. This is a follow-up on the previous change which introduced bpf_run_lock.redirect_lock and uses it now within drivers. The simple way is to acquire the lock before bpf_prog_run_xdp() is invoked and hold it until the end of the function. This does not always work because some drivers (cpsw, atlantic) invoke xdp_do_flush() in the same context. Acquiring the lock in bpf_prog_run_xdp() and dropping it in xdp_do_redirect() (without touching drivers) does not work because not all drivers which use bpf_prog_run_xdp() support XDP_REDIRECT (and invoke xdp_do_redirect()). Ideally the minimal locking scope would be bpf_prog_run_xdp() + xdp_do_redirect() and everything else (error recovery, DMA unmapping, free/alloc of memory, …) would happen outside of the locked section.
Cc: Alexei Starovoitov Cc: Clark Wang Cc: Claudiu Manoil Cc: Ioana Ciornei Cc: Jesper Dangaard Brouer Cc: John Fastabend Cc: Madalin Bucur Cc: NXP Linux Team Cc: Shenwei Wang Cc: Vladimir Oltean Cc: Wei Fang Cc: bpf@vger.kernel.org Signed-off-by: Sebastian Andrzej Siewior --- .../net/ethernet/freescale/dpaa/dpaa_eth.c | 1 + .../net/ethernet/freescale/dpaa2/dpaa2-eth.c | 1 + .../net/ethernet/freescale/dpaa2/dpaa2-xsk.c | 30 ++++++++++--------- drivers/net/ethernet/freescale/enetc/enetc.c | 1 + drivers/net/ethernet/freescale/fec_main.c | 1 + 5 files changed, 20 insertions(+), 14 deletions(-) diff --git a/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c b/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c index dcbc598b11c6c..8adc766282fde 100644 --- a/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c +++ b/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c @@ -2597,6 +2597,7 @@ static u32 dpaa_run_xdp(struct dpaa_priv *priv, struct qm_fd *fd, void *vaddr, } #endif + guard(local_lock_nested_bh)(&bpf_run_lock.redirect_lock); xdp_act = bpf_prog_run_xdp(xdp_prog, &xdp); /* Update the length and the offset of the FD */ diff --git a/drivers/net/ethernet/freescale/dpaa2/dpaa2-eth.c b/drivers/net/ethernet/freescale/dpaa2/dpaa2-eth.c index 888509cf1f210..08be35a3e3de7 100644 --- a/drivers/net/ethernet/freescale/dpaa2/dpaa2-eth.c +++ b/drivers/net/ethernet/freescale/dpaa2/dpaa2-eth.c @@ -442,6 +442,7 @@ static u32 dpaa2_eth_run_xdp(struct dpaa2_eth_priv *priv, xdp_prepare_buff(&xdp, vaddr + offset, XDP_PACKET_HEADROOM, dpaa2_fd_get_len(fd), false); + guard(local_lock_nested_bh)(&bpf_run_lock.redirect_lock); xdp_act = bpf_prog_run_xdp(xdp_prog, &xdp); /* xdp.data pointer may have changed */ diff --git a/drivers/net/ethernet/freescale/dpaa2/dpaa2-xsk.c b/drivers/net/ethernet/freescale/dpaa2/dpaa2-xsk.c index 051748b997f3f..e3ae9de6b0a34 100644 --- a/drivers/net/ethernet/freescale/dpaa2/dpaa2-xsk.c +++ b/drivers/net/ethernet/freescale/dpaa2/dpaa2-xsk.c @@ -56,23 +56,25 @@ static u32 dpaa2_xsk_run_xdp(struct dpaa2_eth_priv *priv, xdp_buff->rxq = &ch->xdp_rxq; xsk_buff_dma_sync_for_cpu(xdp_buff, ch->xsk_pool); - xdp_act = bpf_prog_run_xdp(xdp_prog, xdp_buff); + scoped_guard(local_lock_nested_bh, &bpf_run_lock.redirect_lock) { + xdp_act = bpf_prog_run_xdp(xdp_prog, xdp_buff); - /* xdp.data pointer may have changed */ - dpaa2_fd_set_offset(fd, xdp_buff->data - vaddr); - dpaa2_fd_set_len(fd, xdp_buff->data_end - xdp_buff->data); + /* xdp.data pointer may have changed */ + dpaa2_fd_set_offset(fd, xdp_buff->data - vaddr); + dpaa2_fd_set_len(fd, xdp_buff->data_end - xdp_buff->data); - if (likely(xdp_act == XDP_REDIRECT)) { - err = xdp_do_redirect(priv->net_dev, xdp_buff, xdp_prog); - if (unlikely(err)) { - ch->stats.xdp_drop++; - dpaa2_eth_recycle_buf(priv, ch, addr); - } else { - ch->buf_count--; - ch->stats.xdp_redirect++; + if (likely(xdp_act == XDP_REDIRECT)) { + err = xdp_do_redirect(priv->net_dev, xdp_buff, xdp_prog); + if (unlikely(err)) { + ch->stats.xdp_drop++; + dpaa2_eth_recycle_buf(priv, ch, addr); + } else { + ch->buf_count--; + ch->stats.xdp_redirect++; + } + + goto xdp_redir; } - - goto xdp_redir; } switch (xdp_act) { diff --git a/drivers/net/ethernet/freescale/enetc/enetc.c b/drivers/net/ethernet/freescale/enetc/enetc.c index cffbf27c4656b..d516b28815af4 100644 --- a/drivers/net/ethernet/freescale/enetc/enetc.c +++ b/drivers/net/ethernet/freescale/enetc/enetc.c @@ -1578,6 +1578,7 @@ static int enetc_clean_rx_ring_xdp(struct enetc_bdr *rx_ring, rx_byte_cnt += VLAN_HLEN; rx_byte_cnt += 
xdp_get_buff_len(&xdp_buff); + guard(local_lock_nested_bh)(&bpf_run_lock.redirect_lock); xdp_act = bpf_prog_run_xdp(prog, &xdp_buff); switch (xdp_act) { diff --git a/drivers/net/ethernet/freescale/fec_main.c b/drivers/net/ethernet/freescale/fec_main.c index c3b7694a74851..335b1e307d468 100644 --- a/drivers/net/ethernet/freescale/fec_main.c +++ b/drivers/net/ethernet/freescale/fec_main.c @@ -1587,6 +1587,7 @@ fec_enet_run_xdp(struct fec_enet_private *fep, struct bpf_prog *prog, int err; u32 act; + guard(local_lock_nested_bh)(&bpf_run_lock.redirect_lock); act = bpf_prog_run_xdp(prog, xdp); /* Due xdp_adjust_tail and xdp_adjust_head: DMA sync for_device cover From patchwork Fri Dec 15 17:07:38 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Sebastian Andrzej Siewior X-Patchwork-Id: 13494671 X-Patchwork-Delegate: kuba@kernel.org Received: from galois.linutronix.de (Galois.linutronix.de [193.142.43.55]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id ACD1B59E59; Fri, 15 Dec 2023 17:10:42 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linutronix.de Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linutronix.de Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=linutronix.de header.i=@linutronix.de header.b="JtWcWrlR"; dkim=permerror (0-bit key) header.d=linutronix.de header.i=@linutronix.de header.b="NzCo4hAy" From: Sebastian Andrzej Siewior DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linutronix.de; s=2020; t=1702660241; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=zae3m1cO9fSJoSwUDjCnhvMwaD0Cp7wkXYtacLoccYI=; b=JtWcWrlRr/Yw4kQ1UWySiE1WnFklIBcg9wCnilSSfcJDNnaXHknsfFQ2hm3Wg/ONikWDB+ WRxCE/Tc1oj8/5Vra+uDd80anJKj5JKArhylRiQWeqOiu9dGS2spg5Z1FiJFb7PYir9ZLl YgPOBKKzrgsCa96PnNswic05kIQ/rUSbCdeqFC5thljdEb55zV5ywwfj1sceBkMTCRU0p4 y7xHwZ+cy0+IUt7MuLeHRIi5cN42VqdLlD8SLMT0QybC5HYV7wrPuUzJG43qDI9Xt9mesD 1NDCCGSXDR+g3yntLA8aFfivSR8gLveMOz0FQOgX4TMdKDdRnyy3zgJi+xJuvw== DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=linutronix.de; s=2020e; t=1702660241; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=zae3m1cO9fSJoSwUDjCnhvMwaD0Cp7wkXYtacLoccYI=; b=NzCo4hAyNx+4VtNSo05p7oXiKzL7azy7XiXo4Ljk4IPRdRDvvQB9bLy3+o8Xz+b/cRIMks tfLo+tJk/hEPGUCQ== To: linux-kernel@vger.kernel.org, netdev@vger.kernel.org Cc: "David S. Miller" , Boqun Feng , Daniel Borkmann , Eric Dumazet , Frederic Weisbecker , Ingo Molnar , Jakub Kicinski , Paolo Abeni , Peter Zijlstra , Thomas Gleixner , Waiman Long , Will Deacon , Sebastian Andrzej Siewior , "K. Y. 
Srinivasan" , Alexei Starovoitov , AngeloGioacchino Del Regno , Dexuan Cui , Dimitris Michailidis , Felix Fietkau , Haiyang Zhang , Horatiu Vultur , Jeroen de Borst , Jesper Dangaard Brouer , John Crispin , John Fastabend , Lorenzo Bianconi , Mark Lee , Matthias Brugger , Praveen Kaligineedi , Sean Wang , Shailend Chand , UNGLinuxDriver@microchip.com, Wei Liu , bpf@vger.kernel.org, linux-hyperv@vger.kernel.org, linux-mediatek@lists.infradead.org Subject: [PATCH net-next 19/24] net: fungible, gve, mtk, microchip, mana: Use nested-BH locking for XDP redirect. Date: Fri, 15 Dec 2023 18:07:38 +0100 Message-ID: <20231215171020.687342-20-bigeasy@linutronix.de> In-Reply-To: <20231215171020.687342-1-bigeasy@linutronix.de> References: <20231215171020.687342-1-bigeasy@linutronix.de> Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Patchwork-Delegate: kuba@kernel.org The per-CPU variables used during bpf_prog_run_xdp() invocation and later during xdp_do_redirect() rely on disabled BH for their protection. Without locking in local_bh_disable() on PREEMPT_RT these data structure require explicit locking. This is a follow-up on the previous change which introduced bpf_run_lock.redirect_lock and uses it now within drivers. The simple way is to acquire the lock before bpf_prog_run_xdp() is invoked and hold it until the end of function. This does not always work because some drivers (cpsw, atlantic) invoke xdp_do_flush() in the same context. Acquiring the lock in bpf_prog_run_xdp() and dropping in xdp_do_redirect() (without touching drivers) does not work because not all driver, which use bpf_prog_run_xdp(), do support XDP_REDIRECT (and invoke xdp_do_redirect()). Ideally the minimal locking scope would be bpf_prog_run_xdp() + xdp_do_redirect() and everything else (error recovery, DMA unmapping, free/ alloc of memory, …) would happen outside of the locked section. Cc: "K. Y. 
Srinivasan" Cc: Alexei Starovoitov Cc: AngeloGioacchino Del Regno Cc: Dexuan Cui Cc: Dimitris Michailidis Cc: Felix Fietkau Cc: Haiyang Zhang Cc: Horatiu Vultur Cc: Jeroen de Borst Cc: Jesper Dangaard Brouer Cc: John Crispin Cc: John Fastabend Cc: Lorenzo Bianconi Cc: Mark Lee Cc: Matthias Brugger Cc: Praveen Kaligineedi Cc: Sean Wang Cc: Shailend Chand Cc: UNGLinuxDriver@microchip.com Cc: Wei Liu Cc: bpf@vger.kernel.org Cc: linux-hyperv@vger.kernel.org Cc: linux-mediatek@lists.infradead.org Signed-off-by: Sebastian Andrzej Siewior --- drivers/net/ethernet/fungible/funeth/funeth_rx.c | 1 + drivers/net/ethernet/google/gve/gve_rx.c | 12 +++++++----- drivers/net/ethernet/mediatek/mtk_eth_soc.c | 1 + drivers/net/ethernet/microchip/lan966x/lan966x_xdp.c | 1 + drivers/net/ethernet/microsoft/mana/mana_bpf.c | 1 + 5 files changed, 11 insertions(+), 5 deletions(-) diff --git a/drivers/net/ethernet/fungible/funeth/funeth_rx.c b/drivers/net/ethernet/fungible/funeth/funeth_rx.c index 7e2584895de39..e7b1382545908 100644 --- a/drivers/net/ethernet/fungible/funeth/funeth_rx.c +++ b/drivers/net/ethernet/fungible/funeth/funeth_rx.c @@ -152,6 +152,7 @@ static void *fun_run_xdp(struct funeth_rxq *q, skb_frag_t *frags, void *buf_va, xdp_prepare_buff(&xdp, buf_va, FUN_XDP_HEADROOM, skb_frag_size(frags) - (FUN_RX_TAILROOM + FUN_XDP_HEADROOM), false); + guard(local_lock_nested_bh)(&bpf_run_lock.redirect_lock); xdp_prog = READ_ONCE(q->xdp_prog); act = bpf_prog_run_xdp(xdp_prog, &xdp); diff --git a/drivers/net/ethernet/google/gve/gve_rx.c b/drivers/net/ethernet/google/gve/gve_rx.c index 73655347902d2..504c8ef761a33 100644 --- a/drivers/net/ethernet/google/gve/gve_rx.c +++ b/drivers/net/ethernet/google/gve/gve_rx.c @@ -779,11 +779,13 @@ static void gve_rx(struct gve_rx_ring *rx, netdev_features_t feat, page_info->page_offset, GVE_RX_PAD, len, false); old_data = xdp.data; - xdp_act = bpf_prog_run_xdp(xprog, &xdp); - if (xdp_act != XDP_PASS) { - gve_xdp_done(priv, rx, &xdp, xprog, xdp_act); - ctx->total_size += frag_size; - goto finish_ok_pkt; + scoped_guard(local_lock_nested_bh, &bpf_run_lock.redirect_lock) { + xdp_act = bpf_prog_run_xdp(xprog, &xdp); + if (xdp_act != XDP_PASS) { + gve_xdp_done(priv, rx, &xdp, xprog, xdp_act); + ctx->total_size += frag_size; + goto finish_ok_pkt; + } } page_info->pad += xdp.data - old_data; diff --git a/drivers/net/ethernet/mediatek/mtk_eth_soc.c b/drivers/net/ethernet/mediatek/mtk_eth_soc.c index 3cf6589cfdacf..477a74ee18c0a 100644 --- a/drivers/net/ethernet/mediatek/mtk_eth_soc.c +++ b/drivers/net/ethernet/mediatek/mtk_eth_soc.c @@ -1946,6 +1946,7 @@ static u32 mtk_xdp_run(struct mtk_eth *eth, struct mtk_rx_ring *ring, if (!prog) goto out; + guard(local_lock_nested_bh)(&bpf_run_lock.redirect_lock); act = bpf_prog_run_xdp(prog, xdp); switch (act) { case XDP_PASS: diff --git a/drivers/net/ethernet/microchip/lan966x/lan966x_xdp.c b/drivers/net/ethernet/microchip/lan966x/lan966x_xdp.c index 9ee61db8690b4..026311af07f9e 100644 --- a/drivers/net/ethernet/microchip/lan966x/lan966x_xdp.c +++ b/drivers/net/ethernet/microchip/lan966x/lan966x_xdp.c @@ -84,6 +84,7 @@ int lan966x_xdp_run(struct lan966x_port *port, struct page *page, u32 data_len) xdp_prepare_buff(&xdp, page_address(page), IFH_LEN_BYTES + XDP_PACKET_HEADROOM, data_len - IFH_LEN_BYTES, false); + guard(local_lock_nested_bh)(&bpf_run_lock.redirect_lock); act = bpf_prog_run_xdp(xdp_prog, &xdp); switch (act) { case XDP_PASS: diff --git a/drivers/net/ethernet/microsoft/mana/mana_bpf.c b/drivers/net/ethernet/microsoft/mana/mana_bpf.c 
index 23b1521c0df96..d465b1dd9fca0 100644 --- a/drivers/net/ethernet/microsoft/mana/mana_bpf.c +++ b/drivers/net/ethernet/microsoft/mana/mana_bpf.c @@ -93,6 +93,7 @@ u32 mana_run_xdp(struct net_device *ndev, struct mana_rxq *rxq, xdp_init_buff(xdp, PAGE_SIZE, &rxq->xdp_rxq); xdp_prepare_buff(xdp, buf_va, XDP_PACKET_HEADROOM, pkt_len, false); + guard(local_lock_nested_bh)(&bpf_run_lock.redirect_lock); act = bpf_prog_run_xdp(prog, xdp); rx_stats = &rxq->stats; From patchwork Fri Dec 15 17:07:39 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Sebastian Andrzej Siewior X-Patchwork-Id: 13494672 X-Patchwork-Delegate: kuba@kernel.org Received: from galois.linutronix.de (Galois.linutronix.de [193.142.43.55]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 1AC1261FD1; Fri, 15 Dec 2023 17:10:43 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linutronix.de Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linutronix.de Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=linutronix.de header.i=@linutronix.de header.b="2frKVfBu"; dkim=permerror (0-bit key) header.d=linutronix.de header.i=@linutronix.de header.b="XDtMmiv9" From: Sebastian Andrzej Siewior DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linutronix.de; s=2020; t=1702660241; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=gJvqfL6kfGkcyInYfR34Y5EU2FsYBJr4oMNF68chbko=; b=2frKVfBuTxNCn/YOdDqPDYZ2sctcTtOlZNN7VRsSx5KBQpwgvXQwFEfPLw6meL5XGoMXyu odJ2LoMSj808d8L7BQNEdFJ8NhzX5yli372pUzzryRbWBt4dSR30SMD+us3HkuOsZNtGVl 7SRpX7+5b88RAWlQlXDnwKuY/Zj0drS4EEoNxIUi7s2x65k0c4nlfqbXq5mDT7SzgIUWEe NrokjDvzRvF7PWAG18JNo/ICJZZUWcxpaAbRvrHklPDCkiwtsK3iSqWaTibdYV0L28kr2Q xO7eKDxi7kmQetOaekHkCvzw5cICCTFsegmZRKB+5Q8GsqSAUsK/VJpQkBpBSg== DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=linutronix.de; s=2020e; t=1702660241; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=gJvqfL6kfGkcyInYfR34Y5EU2FsYBJr4oMNF68chbko=; b=XDtMmiv9Gk7YQYNN+15d696JweHCRkGPHH6mhMrQi75BId2aTtnMH7bXiiai8t3OnICdC7 mpD1I/ZZkX+/hzDQ== To: linux-kernel@vger.kernel.org, netdev@vger.kernel.org Cc: "David S. Miller" , Boqun Feng , Daniel Borkmann , Eric Dumazet , Frederic Weisbecker , Ingo Molnar , Jakub Kicinski , Paolo Abeni , Peter Zijlstra , Thomas Gleixner , Waiman Long , Will Deacon , Sebastian Andrzej Siewior , Alexei Starovoitov , Jesper Dangaard Brouer , Jesse Brandeburg , John Fastabend , Tony Nguyen , bpf@vger.kernel.org, intel-wired-lan@lists.osuosl.org Subject: [PATCH net-next 20/24] net: intel: Use nested-BH locking for XDP redirect. 
Date: Fri, 15 Dec 2023 18:07:39 +0100 Message-ID: <20231215171020.687342-21-bigeasy@linutronix.de> In-Reply-To: <20231215171020.687342-1-bigeasy@linutronix.de> References: <20231215171020.687342-1-bigeasy@linutronix.de> Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Patchwork-Delegate: kuba@kernel.org The per-CPU variables used during bpf_prog_run_xdp() invocation and later during xdp_do_redirect() rely on disabled BH for their protection. Since local_bh_disable() does not act as a lock on PREEMPT_RT, these data structures require explicit locking. This is a follow-up to the previous change which introduced bpf_run_lock.redirect_lock and now uses it within drivers. The simple way is to acquire the lock before bpf_prog_run_xdp() is invoked and hold it until the end of the function. This does not always work because some drivers (cpsw, atlantic) invoke xdp_do_flush() in the same context. Acquiring the lock in bpf_prog_run_xdp() and dropping it in xdp_do_redirect() (without touching drivers) does not work because not all drivers which use bpf_prog_run_xdp() support XDP_REDIRECT (and invoke xdp_do_redirect()). Ideally the minimal locking scope would be bpf_prog_run_xdp() + xdp_do_redirect(), and everything else (error recovery, DMA unmapping, freeing/allocating memory, …) would happen outside of the locked section. Cc: Alexei Starovoitov Cc: Jesper Dangaard Brouer Cc: Jesse Brandeburg Cc: John Fastabend Cc: Tony Nguyen Cc: bpf@vger.kernel.org (open list:XDP Cc: intel-wired-lan@lists.osuosl.org Signed-off-by: Sebastian Andrzej Siewior --- drivers/net/ethernet/intel/i40e/i40e_txrx.c | 1 + drivers/net/ethernet/intel/i40e/i40e_xsk.c | 22 +++++++++-------- drivers/net/ethernet/intel/ice/ice_txrx.c | 1 + drivers/net/ethernet/intel/ice/ice_xsk.c | 21 ++++++++-------- drivers/net/ethernet/intel/igb/igb_main.c | 1 + drivers/net/ethernet/intel/igc/igc_main.c | 5 +++- drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 1 + drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c | 24 ++++++++++--------- .../net/ethernet/intel/ixgbevf/ixgbevf_main.c | 3 ++- 9 files changed, 46 insertions(+), 33 deletions(-) diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c b/drivers/net/ethernet/intel/i40e/i40e_txrx.c index dd410b15000f7..76e069ae2183a 100644 --- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c +++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c @@ -2326,6 +2326,7 @@ static int i40e_run_xdp(struct i40e_ring *rx_ring, struct xdp_buff *xdp, struct prefetchw(xdp->data_hard_start); /* xdp_frame write */ + guard(local_lock_nested_bh)(&bpf_run_lock.redirect_lock); act = bpf_prog_run_xdp(xdp_prog, xdp); switch (act) { case XDP_PASS: diff --git a/drivers/net/ethernet/intel/i40e/i40e_xsk.c b/drivers/net/ethernet/intel/i40e/i40e_xsk.c index e99fa854d17f1..2b0c0c1f3ddc8 100644 --- a/drivers/net/ethernet/intel/i40e/i40e_xsk.c +++ b/drivers/net/ethernet/intel/i40e/i40e_xsk.c @@ -201,17 +201,19 @@ static int i40e_run_xdp_zc(struct i40e_ring *rx_ring, struct xdp_buff *xdp, struct i40e_ring *xdp_ring; u32 act; - act = bpf_prog_run_xdp(xdp_prog, xdp); + scoped_guard(local_lock_nested_bh, &bpf_run_lock.redirect_lock) { + act = bpf_prog_run_xdp(xdp_prog, xdp); - if (likely(act == XDP_REDIRECT)) { - err = xdp_do_redirect(rx_ring->netdev, xdp, xdp_prog); - if (!err) - return I40E_XDP_REDIR; - if (xsk_uses_need_wakeup(rx_ring->xsk_pool) && err == -ENOBUFS) - result = I40E_XDP_EXIT; - else - result = I40E_XDP_CONSUMED; - goto out_failure; + if (likely(act == XDP_REDIRECT)) { + err =
xdp_do_redirect(rx_ring->netdev, xdp, xdp_prog); + if (!err) + return I40E_XDP_REDIR; + if (xsk_uses_need_wakeup(rx_ring->xsk_pool) && err == -ENOBUFS) + result = I40E_XDP_EXIT; + else + result = I40E_XDP_CONSUMED; + goto out_failure; + } } switch (act) { diff --git a/drivers/net/ethernet/intel/ice/ice_txrx.c b/drivers/net/ethernet/intel/ice/ice_txrx.c index 9e97ea8630686..5d4cfa3455b37 100644 --- a/drivers/net/ethernet/intel/ice/ice_txrx.c +++ b/drivers/net/ethernet/intel/ice/ice_txrx.c @@ -571,6 +571,7 @@ ice_run_xdp(struct ice_rx_ring *rx_ring, struct xdp_buff *xdp, if (!xdp_prog) goto exit; + guard(local_lock_nested_bh)(&bpf_run_lock.redirect_lock); act = bpf_prog_run_xdp(xdp_prog, xdp); switch (act) { case XDP_PASS: diff --git a/drivers/net/ethernet/intel/ice/ice_xsk.c b/drivers/net/ethernet/intel/ice/ice_xsk.c index 99954508184f9..02f89c22d19e3 100644 --- a/drivers/net/ethernet/intel/ice/ice_xsk.c +++ b/drivers/net/ethernet/intel/ice/ice_xsk.c @@ -762,17 +762,18 @@ ice_run_xdp_zc(struct ice_rx_ring *rx_ring, struct xdp_buff *xdp, int err, result = ICE_XDP_PASS; u32 act; + scoped_guard(local_lock_nested_bh, &bpf_run_lock.redirect_lock) { act = bpf_prog_run_xdp(xdp_prog, xdp); - - if (likely(act == XDP_REDIRECT)) { - err = xdp_do_redirect(rx_ring->netdev, xdp, xdp_prog); - if (!err) - return ICE_XDP_REDIR; - if (xsk_uses_need_wakeup(rx_ring->xsk_pool) && err == -ENOBUFS) - result = ICE_XDP_EXIT; - else - result = ICE_XDP_CONSUMED; - goto out_failure; + if (likely(act == XDP_REDIRECT)) { + err = xdp_do_redirect(rx_ring->netdev, xdp, xdp_prog); + if (!err) + return ICE_XDP_REDIR; + if (xsk_uses_need_wakeup(rx_ring->xsk_pool) && err == -ENOBUFS) + result = ICE_XDP_EXIT; + else + result = ICE_XDP_CONSUMED; + goto out_failure; + } } switch (act) { diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c index b2295caa2f0ab..e01be809d030e 100644 --- a/drivers/net/ethernet/intel/igb/igb_main.c +++ b/drivers/net/ethernet/intel/igb/igb_main.c @@ -8621,6 +8621,7 @@ static struct sk_buff *igb_run_xdp(struct igb_adapter *adapter, prefetchw(xdp->data_hard_start); /* xdp_frame write */ + guard(local_lock_nested_bh)(&bpf_run_lock.redirect_lock); act = bpf_prog_run_xdp(xdp_prog, xdp); switch (act) { case XDP_PASS: diff --git a/drivers/net/ethernet/intel/igc/igc_main.c b/drivers/net/ethernet/intel/igc/igc_main.c index e9bb403bbacf9..8321419b3a307 100644 --- a/drivers/net/ethernet/intel/igc/igc_main.c +++ b/drivers/net/ethernet/intel/igc/igc_main.c @@ -2485,7 +2485,10 @@ static int __igc_xdp_run_prog(struct igc_adapter *adapter, struct bpf_prog *prog, struct xdp_buff *xdp) { - u32 act = bpf_prog_run_xdp(prog, xdp); + u32 act; + + guard(local_lock_nested_bh)(&bpf_run_lock.redirect_lock); + act = bpf_prog_run_xdp(prog, xdp); switch (act) { case XDP_PASS: diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c index 94bde2cad0f47..de564e8b83be2 100644 --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c @@ -2203,6 +2203,7 @@ static struct sk_buff *ixgbe_run_xdp(struct ixgbe_adapter *adapter, prefetchw(xdp->data_hard_start); /* xdp_frame write */ + guard(local_lock_nested_bh)(&bpf_run_lock.redirect_lock); act = bpf_prog_run_xdp(xdp_prog, xdp); switch (act) { case XDP_PASS: diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c index 59798bc33298f..b988f758aad49 100644 --- 
a/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c @@ -104,18 +104,20 @@ static int ixgbe_run_xdp_zc(struct ixgbe_adapter *adapter, struct xdp_frame *xdpf; u32 act; - xdp_prog = READ_ONCE(rx_ring->xdp_prog); - act = bpf_prog_run_xdp(xdp_prog, xdp); + scoped_guard(local_lock_nested_bh, &bpf_run_lock.redirect_lock) { + xdp_prog = READ_ONCE(rx_ring->xdp_prog); + act = bpf_prog_run_xdp(xdp_prog, xdp); - if (likely(act == XDP_REDIRECT)) { - err = xdp_do_redirect(rx_ring->netdev, xdp, xdp_prog); - if (!err) - return IXGBE_XDP_REDIR; - if (xsk_uses_need_wakeup(rx_ring->xsk_pool) && err == -ENOBUFS) - result = IXGBE_XDP_EXIT; - else - result = IXGBE_XDP_CONSUMED; - goto out_failure; + if (likely(act == XDP_REDIRECT)) { + err = xdp_do_redirect(rx_ring->netdev, xdp, xdp_prog); + if (!err) + return IXGBE_XDP_REDIR; + if (xsk_uses_need_wakeup(rx_ring->xsk_pool) && err == -ENOBUFS) + result = IXGBE_XDP_EXIT; + else + result = IXGBE_XDP_CONSUMED; + goto out_failure; + } } switch (act) { diff --git a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c index a44e4bd561421..1c58c08aa15ff 100644 --- a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c +++ b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c @@ -1059,7 +1059,8 @@ static struct sk_buff *ixgbevf_run_xdp(struct ixgbevf_adapter *adapter, if (!xdp_prog) goto xdp_out; - act = bpf_prog_run_xdp(xdp_prog, xdp); + scoped_guard(local_lock_nested_bh, &bpf_run_lock.redirect_lock) + act = bpf_prog_run_xdp(xdp_prog, xdp); switch (act) { case XDP_PASS: break; From patchwork Fri Dec 15 17:07:40 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Sebastian Andrzej Siewior X-Patchwork-Id: 13494673 X-Patchwork-Delegate: kuba@kernel.org Received: from galois.linutronix.de (Galois.linutronix.de [193.142.43.55]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id F0DB5675AE; Fri, 15 Dec 2023 17:10:43 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linutronix.de Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linutronix.de Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=linutronix.de header.i=@linutronix.de header.b="Nh4vzk05"; dkim=permerror (0-bit key) header.d=linutronix.de header.i=@linutronix.de header.b="roNUcQ3I" From: Sebastian Andrzej Siewior DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linutronix.de; s=2020; t=1702660242; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=U5e+RJtEB4QE08KKZsUA35+KJH98yIUgL+9mMUjgvNI=; b=Nh4vzk05UBCwB1CUj6f51oueaIgsBsTvxOYK6NKH13SE2KYU3x9v2KYq+qnd5EZlhngWC3 hB3ML3BTRAZimojQ1/fb25dF0EGX8LUpT/em9vtvhSVqxRBOu5IAs1SJyeF+QZn85OnF11 6ap33KhbA2zAvHg1Xw09C1AUhqYipAMAMG/alXrSoAEcktiilnf2pR1YPjbCKpvM+Ojwhj A2CmjrudLB7y6yWijv3IcZbou/XqxD0mvUko35t5+kkPByLJfiTPbSIypDNNQNsmdXPgcz BJshvpSMzQxVuuYhPIgA5mcZIeZWhVYflCDmIZ+ai/Twf087vm4EyL2bpqz6QQ== DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=linutronix.de; s=2020e; t=1702660242; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: 
to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=U5e+RJtEB4QE08KKZsUA35+KJH98yIUgL+9mMUjgvNI=; b=roNUcQ3IAN0GribgixBXQrhWf69moqnLZoY35pCy6QQ57aBDHST2l5JfA+eIb9BvgWuar5 bTAHF6rvXFZU6oBA== To: linux-kernel@vger.kernel.org, netdev@vger.kernel.org Cc: "David S. Miller" , Boqun Feng , Daniel Borkmann , Eric Dumazet , Frederic Weisbecker , Ingo Molnar , Jakub Kicinski , Paolo Abeni , Peter Zijlstra , Thomas Gleixner , Waiman Long , Will Deacon , Sebastian Andrzej Siewior , Alexei Starovoitov , Geetha sowjanya , Jesper Dangaard Brouer , John Fastabend , Marcin Wojtas , Russell King , Subbaraya Sundeep , Sunil Goutham , Thomas Petazzoni , hariprasad , bpf@vger.kernel.org Subject: [PATCH net-next 21/24] net: marvell: Use nested-BH locking for XDP redirect. Date: Fri, 15 Dec 2023 18:07:40 +0100 Message-ID: <20231215171020.687342-22-bigeasy@linutronix.de> In-Reply-To: <20231215171020.687342-1-bigeasy@linutronix.de> References: <20231215171020.687342-1-bigeasy@linutronix.de> Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Patchwork-Delegate: kuba@kernel.org The per-CPU variables used during bpf_prog_run_xdp() invocation and later during xdp_do_redirect() rely on disabled BH for their protection. Since local_bh_disable() does not act as a lock on PREEMPT_RT, these data structures require explicit locking. This is a follow-up to the previous change which introduced bpf_run_lock.redirect_lock and now uses it within drivers. The simple way is to acquire the lock before bpf_prog_run_xdp() is invoked and hold it until the end of the function. This does not always work because some drivers (cpsw, atlantic) invoke xdp_do_flush() in the same context. Acquiring the lock in bpf_prog_run_xdp() and dropping it in xdp_do_redirect() (without touching drivers) does not work because not all drivers which use bpf_prog_run_xdp() support XDP_REDIRECT (and invoke xdp_do_redirect()). Ideally the minimal locking scope would be bpf_prog_run_xdp() + xdp_do_redirect(), and everything else (error recovery, DMA unmapping, freeing/allocating memory, …) would happen outside of the locked section.
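For reference while reading these hunks: bpf_run_lock is not defined in this patch; it comes from an earlier patch in the series. Its shape is presumably along the following lines, reconstructed here from the usage (the authoritative definition is in that earlier patch):

    /* Assumed shape of the per-CPU lock taken in the hunks below.
     * local_lock_t is a real lock on PREEMPT_RT; on !RT builds the
     * nested-BH flavour amounts to a lockdep annotation, since BH is
     * already disabled on these paths.
     */
    struct bpf_run_lock {
            local_lock_t    redirect_lock;
    };

    DEFINE_PER_CPU(struct bpf_run_lock, bpf_run_lock) = {
            .redirect_lock = INIT_LOCAL_LOCK(redirect_lock),
    };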
Cc: Alexei Starovoitov Cc: Geetha sowjanya Cc: Jesper Dangaard Brouer Cc: John Fastabend Cc: Marcin Wojtas Cc: Russell King Cc: Subbaraya Sundeep Cc: Sunil Goutham Cc: Thomas Petazzoni Cc: hariprasad Cc: bpf@vger.kernel.org Signed-off-by: Sebastian Andrzej Siewior --- drivers/net/ethernet/marvell/mvneta.c | 2 ++ drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c | 1 + drivers/net/ethernet/marvell/octeontx2/nic/otx2_txrx.c | 1 + 3 files changed, 4 insertions(+) diff --git a/drivers/net/ethernet/marvell/mvneta.c b/drivers/net/ethernet/marvell/mvneta.c index 29aac327574d6..9c7aacd73b590 100644 --- a/drivers/net/ethernet/marvell/mvneta.c +++ b/drivers/net/ethernet/marvell/mvneta.c @@ -2263,6 +2263,8 @@ mvneta_run_xdp(struct mvneta_port *pp, struct mvneta_rx_queue *rxq, len = xdp->data_end - xdp->data_hard_start - pp->rx_offset_correction; data_len = xdp->data_end - xdp->data; + + guard(local_lock_nested_bh)(&bpf_run_lock.redirect_lock); act = bpf_prog_run_xdp(prog, xdp); /* Due xdp_adjust_tail: DMA sync for_device cover max len CPU touch */ diff --git a/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c b/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c index 93137606869e2..3a5524ffaba68 100644 --- a/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c +++ b/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c @@ -3793,6 +3793,7 @@ mvpp2_run_xdp(struct mvpp2_port *port, struct bpf_prog *prog, u32 ret, act; len = xdp->data_end - xdp->data_hard_start - MVPP2_SKB_HEADROOM; + guard(local_lock_nested_bh)(&bpf_run_lock.redirect_lock); act = bpf_prog_run_xdp(prog, xdp); /* Due xdp_adjust_tail: DMA sync for_device cover max len CPU touch */ diff --git a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_txrx.c b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_txrx.c index 4d519ea833b2c..e48e84d6159bc 100644 --- a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_txrx.c +++ b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_txrx.c @@ -1422,6 +1422,7 @@ static bool otx2_xdp_rcv_pkt_handler(struct otx2_nic *pfvf, xdp_prepare_buff(&xdp, hard_start, data - hard_start, cqe->sg.seg_size, false); + guard(local_lock_nested_bh)(&bpf_run_lock.redirect_lock); act = bpf_prog_run_xdp(prog, &xdp); switch (act) { From patchwork Fri Dec 15 17:07:41 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Sebastian Andrzej Siewior X-Patchwork-Id: 13494676 X-Patchwork-Delegate: kuba@kernel.org Received: from galois.linutronix.de (Galois.linutronix.de [193.142.43.55]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 6581A68EA5; Fri, 15 Dec 2023 17:10:44 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linutronix.de Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linutronix.de Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=linutronix.de header.i=@linutronix.de header.b="TYv7u/T4"; dkim=permerror (0-bit key) header.d=linutronix.de header.i=@linutronix.de header.b="lf1fVdIK" From: Sebastian Andrzej Siewior DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linutronix.de; s=2020; t=1702660242; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; 
bh=9BAPa61nrUkvF9kin2QlytisOZ8fHoxcWneIX+0cXsk=; b=TYv7u/T4NUdVyr2Zgb8Yhqj2raw2Fd69uqS3OekrLnSTm+ac88enFS+j3yRgRUIfgED46N MqEWqW1ql/22EbvsSQMma9oZWOpk3zL46jcOsnY1Zi4OpqqHtZxNOCjnQVCneb0kOvZJWl kv5OZ+NXxjQtTphUKIfvGfMXRsklQ1839OFgDyetf4foAsjLHyfWPVgqQCteQeO3KEkwVG 0MtXUNhB2w3kqH8FUvycRlikaVobLDW15Ev3xJDyBgncu8+WSwv4IGvmGhUcBA1LwaXKWR XDDRxC68yYYTneuQgD72rYifAGKX2eBuFhKf63YtwCYbfOM6Zp5YqmN8CdP8Hg== DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=linutronix.de; s=2020e; t=1702660242; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=9BAPa61nrUkvF9kin2QlytisOZ8fHoxcWneIX+0cXsk=; b=lf1fVdIK+nJARYZChG8KwxA0xcBHX3Z/QPi8BIqVM2uTB4jFpq4H/uMpDzoDB9LTVW7JRP 9x9KbsNjnlGsjZDg== To: linux-kernel@vger.kernel.org, netdev@vger.kernel.org Cc: "David S. Miller" , Boqun Feng , Daniel Borkmann , Eric Dumazet , Frederic Weisbecker , Ingo Molnar , Jakub Kicinski , Paolo Abeni , Peter Zijlstra , Thomas Gleixner , Waiman Long , Will Deacon , Sebastian Andrzej Siewior , Alexei Starovoitov , Edward Cree , Jesper Dangaard Brouer , John Fastabend , Leon Romanovsky , Louis Peens , Martin Habets , Saeed Mahameed , Tariq Toukan , bpf@vger.kernel.org, linux-net-drivers@amd.com, linux-rdma@vger.kernel.org, oss-drivers@corigine.com Subject: [PATCH net-next 22/24] net: mellanox, nfp, sfc: Use nested-BH locking for XDP redirect. Date: Fri, 15 Dec 2023 18:07:41 +0100 Message-ID: <20231215171020.687342-23-bigeasy@linutronix.de> In-Reply-To: <20231215171020.687342-1-bigeasy@linutronix.de> References: <20231215171020.687342-1-bigeasy@linutronix.de> Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Patchwork-Delegate: kuba@kernel.org The per-CPU variables used during bpf_prog_run_xdp() invocation and later during xdp_do_redirect() rely on disabled BH for their protection. Since local_bh_disable() does not act as a lock on PREEMPT_RT, these data structures require explicit locking. This is a follow-up to the previous change which introduced bpf_run_lock.redirect_lock and now uses it within drivers. The simple way is to acquire the lock before bpf_prog_run_xdp() is invoked and hold it until the end of the function. This does not always work because some drivers (cpsw, atlantic) invoke xdp_do_flush() in the same context. Acquiring the lock in bpf_prog_run_xdp() and dropping it in xdp_do_redirect() (without touching drivers) does not work because not all drivers which use bpf_prog_run_xdp() support XDP_REDIRECT (and invoke xdp_do_redirect()). Ideally the minimal locking scope would be bpf_prog_run_xdp() + xdp_do_redirect(), and everything else (error recovery, DMA unmapping, freeing/allocating memory, …) would happen outside of the locked section.
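Two guard forms recur across this series and the difference is only the lifetime of the lock: guard() releases it when the enclosing function scope is left, scoped_guard() as soon as its attached block ends. A sketch of the scoped form with a made-up helper name, showing why it is preferred when unrelated work follows the redirect:

    /* Sketch: my_run_xdp_zc() is a made-up name. The lock is dropped at
     * the closing brace of the scoped_guard() block, so DMA unmapping,
     * page recycling or xdp_do_flush() further down run unlocked.
     */
    static int my_run_xdp_zc(struct net_device *ndev, struct bpf_prog *prog,
                             struct xdp_buff *xdp)
    {
            int err = 0;
            u32 act;

            scoped_guard(local_lock_nested_bh, &bpf_run_lock.redirect_lock) {
                    act = bpf_prog_run_xdp(prog, xdp);
                    if (act == XDP_REDIRECT)
                            err = xdp_do_redirect(ndev, xdp, prog);
            }
            /* lock no longer held here */
            return err;
    }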
Cc: Alexei Starovoitov Cc: Edward Cree Cc: Jesper Dangaard Brouer Cc: John Fastabend Cc: Leon Romanovsky Cc: Louis Peens Cc: Martin Habets Cc: Saeed Mahameed Cc: Tariq Toukan Cc: bpf@vger.kernel.org Cc: linux-net-drivers@amd.com Cc: linux-rdma@vger.kernel.org Cc: oss-drivers@corigine.com Signed-off-by: Sebastian Andrzej Siewior --- drivers/net/ethernet/mellanox/mlx4/en_rx.c | 1 + drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c | 1 + drivers/net/ethernet/netronome/nfp/nfd3/dp.c | 3 ++- drivers/net/ethernet/netronome/nfp/nfd3/xsk.c | 1 + drivers/net/ethernet/netronome/nfp/nfdk/dp.c | 3 ++- drivers/net/ethernet/sfc/rx.c | 1 + drivers/net/ethernet/sfc/siena/rx.c | 1 + 7 files changed, 9 insertions(+), 2 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlx4/en_rx.c b/drivers/net/ethernet/mellanox/mlx4/en_rx.c index a09b6e05337d9..c0a3ac3405bc5 100644 --- a/drivers/net/ethernet/mellanox/mlx4/en_rx.c +++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c @@ -833,6 +833,7 @@ int mlx4_en_process_rx_cq(struct net_device *dev, struct mlx4_en_cq *cq, int bud mxbuf.ring = ring; mxbuf.dev = dev; + guard(local_lock_nested_bh)(&bpf_run_lock.redirect_lock); act = bpf_prog_run_xdp(xdp_prog, &mxbuf.xdp); length = mxbuf.xdp.data_end - mxbuf.xdp.data; diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c b/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c index 7decc81ed33a9..b4e3c6a5a6da6 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c @@ -269,6 +269,7 @@ bool mlx5e_xdp_handle(struct mlx5e_rq *rq, u32 act; int err; + guard(local_lock_nested_bh)(&bpf_run_lock.redirect_lock); act = bpf_prog_run_xdp(prog, xdp); switch (act) { case XDP_PASS: diff --git a/drivers/net/ethernet/netronome/nfp/nfd3/dp.c b/drivers/net/ethernet/netronome/nfp/nfd3/dp.c index 17381bfc15d72..a041b55514aa3 100644 --- a/drivers/net/ethernet/netronome/nfp/nfd3/dp.c +++ b/drivers/net/ethernet/netronome/nfp/nfd3/dp.c @@ -1011,7 +1011,8 @@ static int nfp_nfd3_rx(struct nfp_net_rx_ring *rx_ring, int budget) pkt_off - NFP_NET_RX_BUF_HEADROOM, pkt_len, true); - act = bpf_prog_run_xdp(xdp_prog, &xdp); + scoped_guard(local_lock_nested_bh, &bpf_run_lock.redirect_lock) + act = bpf_prog_run_xdp(xdp_prog, &xdp); pkt_len = xdp.data_end - xdp.data; pkt_off += xdp.data - orig_data; diff --git a/drivers/net/ethernet/netronome/nfp/nfd3/xsk.c b/drivers/net/ethernet/netronome/nfp/nfd3/xsk.c index 45be6954d5aae..38f2d4c2b5b7c 100644 --- a/drivers/net/ethernet/netronome/nfp/nfd3/xsk.c +++ b/drivers/net/ethernet/netronome/nfp/nfd3/xsk.c @@ -216,6 +216,7 @@ nfp_nfd3_xsk_rx(struct nfp_net_rx_ring *rx_ring, int budget, } } + guard(local_lock_nested_bh)(&bpf_run_lock.redirect_lock); act = bpf_prog_run_xdp(xdp_prog, xrxbuf->xdp); pkt_len = xrxbuf->xdp->data_end - xrxbuf->xdp->data; diff --git a/drivers/net/ethernet/netronome/nfp/nfdk/dp.c b/drivers/net/ethernet/netronome/nfp/nfdk/dp.c index 8d78c6faefa8a..af0a36c4fb018 100644 --- a/drivers/net/ethernet/netronome/nfp/nfdk/dp.c +++ b/drivers/net/ethernet/netronome/nfp/nfdk/dp.c @@ -1130,7 +1130,8 @@ static int nfp_nfdk_rx(struct nfp_net_rx_ring *rx_ring, int budget) pkt_off - NFP_NET_RX_BUF_HEADROOM, pkt_len, true); - act = bpf_prog_run_xdp(xdp_prog, &xdp); + scoped_guard(local_lock_nested_bh, &bpf_run_lock.redirect_lock) + act = bpf_prog_run_xdp(xdp_prog, &xdp); pkt_len = xdp.data_end - xdp.data; pkt_off += xdp.data - orig_data; diff --git a/drivers/net/ethernet/sfc/rx.c b/drivers/net/ethernet/sfc/rx.c index f77a2d3ef37ec..3712d29150af5 100644 
--- a/drivers/net/ethernet/sfc/rx.c +++ b/drivers/net/ethernet/sfc/rx.c @@ -291,6 +291,7 @@ static bool efx_do_xdp(struct efx_nic *efx, struct efx_channel *channel, xdp_prepare_buff(&xdp, *ehp - EFX_XDP_HEADROOM, EFX_XDP_HEADROOM, rx_buf->len, false); + guard(local_lock_nested_bh)(&bpf_run_lock.redirect_lock); xdp_act = bpf_prog_run_xdp(xdp_prog, &xdp); offset = (u8 *)xdp.data - *ehp; diff --git a/drivers/net/ethernet/sfc/siena/rx.c b/drivers/net/ethernet/sfc/siena/rx.c index 98d3c0743c0f5..6bfc4cd1c83e0 100644 --- a/drivers/net/ethernet/sfc/siena/rx.c +++ b/drivers/net/ethernet/sfc/siena/rx.c @@ -291,6 +291,7 @@ static bool efx_do_xdp(struct efx_nic *efx, struct efx_channel *channel, xdp_prepare_buff(&xdp, *ehp - EFX_XDP_HEADROOM, EFX_XDP_HEADROOM, rx_buf->len, false); + guard(local_lock_nested_bh)(&bpf_run_lock.redirect_lock); xdp_act = bpf_prog_run_xdp(xdp_prog, &xdp); offset = (u8 *)xdp.data - *ehp; From patchwork Fri Dec 15 17:07:42 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Sebastian Andrzej Siewior X-Patchwork-Id: 13494674 X-Patchwork-Delegate: kuba@kernel.org Received: from galois.linutronix.de (Galois.linutronix.de [193.142.43.55]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id C4D5F6A009; Fri, 15 Dec 2023 17:10:44 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linutronix.de Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linutronix.de Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=linutronix.de header.i=@linutronix.de header.b="aUResC8P"; dkim=permerror (0-bit key) header.d=linutronix.de header.i=@linutronix.de header.b="FII+fmLC" From: Sebastian Andrzej Siewior DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linutronix.de; s=2020; t=1702660243; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=Recusd+7IXvA5TirXIVteFQz/CPKIdbiblHLxPiaZxI=; b=aUResC8P3BrfQM2X1JaMPOmerJ/fL+xFTgQ0h+PlYjy/+qzzA8IU4zA12Gt2DftI4dwg90 vDVGZp90lgsRPLWIJaLqvlEIAu3kxfLymGyseqWwuXB9632aXbrkR38hpACqjJj1/riXUX 0A/iuTlDhmkjhXwe1/7n280Ofy2fi2ViRE/SW5jv94TBqf6cioyCWs5oOhSTXAtqekLn/b bTWJbtbdM79zG5ankAAQEbXwm0xwcIXGQ/y6S8d8nP3PKyPNn0u2aoQVKwjPX3nep3lcjd dwFzEnrDiJye3KkRP9dGlHYEWCA71AVfUu/NFmTSzaFF4Y417CqofwsVhqnbXA== DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=linutronix.de; s=2020e; t=1702660243; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=Recusd+7IXvA5TirXIVteFQz/CPKIdbiblHLxPiaZxI=; b=FII+fmLCYFU6/PJPm9PIVesU9ltySEE2RUfVan+2Gga/GlGuHcVIKzoQbPaWAZ6rho2K+9 cvS7J2jl2aks+ACw== To: linux-kernel@vger.kernel.org, netdev@vger.kernel.org Cc: "David S. 
Miller" , Boqun Feng , Daniel Borkmann , Eric Dumazet , Frederic Weisbecker , Ingo Molnar , Jakub Kicinski , Paolo Abeni , Peter Zijlstra , Thomas Gleixner , Waiman Long , Will Deacon , Sebastian Andrzej Siewior , Alexandre Torgue , Alexei Starovoitov , Ariel Elior , Ilias Apalodimas , Jassi Brar , Jesper Dangaard Brouer , John Fastabend , Jose Abreu , Manish Chopra , Maxime Coquelin , Ravi Gunasekaran , Roger Quadros , Siddharth Vadapalli , bpf@vger.kernel.org, linux-omap@vger.kernel.org, linux-stm32@st-md-mailman.stormreply.com Subject: [PATCH net-next 23/24] net: qlogic, socionext, stmmac, cpsw: Use nested-BH locking for XDP redirect. Date: Fri, 15 Dec 2023 18:07:42 +0100 Message-ID: <20231215171020.687342-24-bigeasy@linutronix.de> In-Reply-To: <20231215171020.687342-1-bigeasy@linutronix.de> References: <20231215171020.687342-1-bigeasy@linutronix.de> Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Patchwork-Delegate: kuba@kernel.org The per-CPU variables used during bpf_prog_run_xdp() invocation and later during xdp_do_redirect() rely on disabled BH for their protection. Without locking in local_bh_disable() on PREEMPT_RT these data structure require explicit locking. This is a follow-up on the previous change which introduced bpf_run_lock.redirect_lock and uses it now within drivers. The simple way is to acquire the lock before bpf_prog_run_xdp() is invoked and hold it until the end of function. This does not always work because some drivers (cpsw, atlantic) invoke xdp_do_flush() in the same context. Acquiring the lock in bpf_prog_run_xdp() and dropping in xdp_do_redirect() (without touching drivers) does not work because not all driver, which use bpf_prog_run_xdp(), do support XDP_REDIRECT (and invoke xdp_do_redirect()). Ideally the minimal locking scope would be bpf_prog_run_xdp() + xdp_do_redirect() and everything else (error recovery, DMA unmapping, free/ alloc of memory, …) would happen outside of the locked section. 
Cc: Alexandre Torgue Cc: Alexei Starovoitov Cc: Ariel Elior Cc: Ilias Apalodimas Cc: Jassi Brar Cc: Jesper Dangaard Brouer Cc: John Fastabend Cc: Jose Abreu Cc: Manish Chopra Cc: Maxime Coquelin Cc: Ravi Gunasekaran Cc: Roger Quadros Cc: Siddharth Vadapalli Cc: bpf@vger.kernel.org Cc: linux-omap@vger.kernel.org Cc: linux-stm32@st-md-mailman.stormreply.com Signed-off-by: Sebastian Andrzej Siewior --- drivers/net/ethernet/qlogic/qede/qede_fp.c | 1 + drivers/net/ethernet/socionext/netsec.c | 1 + drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 1 + drivers/net/ethernet/ti/cpsw_priv.c | 15 +++++++++------ 4 files changed, 12 insertions(+), 6 deletions(-) diff --git a/drivers/net/ethernet/qlogic/qede/qede_fp.c b/drivers/net/ethernet/qlogic/qede/qede_fp.c index cb1746bc0e0c5..ce5af094fb817 100644 --- a/drivers/net/ethernet/qlogic/qede/qede_fp.c +++ b/drivers/net/ethernet/qlogic/qede/qede_fp.c @@ -1091,6 +1091,7 @@ static bool qede_rx_xdp(struct qede_dev *edev, xdp_prepare_buff(&xdp, page_address(bd->data), *data_offset, *len, false); + guard(local_lock_nested_bh)(&bpf_run_lock.redirect_lock); act = bpf_prog_run_xdp(prog, &xdp); /* Recalculate, as XDP might have changed the headers */ diff --git a/drivers/net/ethernet/socionext/netsec.c b/drivers/net/ethernet/socionext/netsec.c index 0891e9e49ecb5..47e314338f3f3 100644 --- a/drivers/net/ethernet/socionext/netsec.c +++ b/drivers/net/ethernet/socionext/netsec.c @@ -905,6 +905,7 @@ static u32 netsec_run_xdp(struct netsec_priv *priv, struct bpf_prog *prog, int err; u32 act; + guard(local_lock_nested_bh)(&bpf_run_lock.redirect_lock); act = bpf_prog_run_xdp(prog, xdp); /* Due xdp_adjust_tail: DMA sync for_device cover max len CPU touch */ diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c index 37e64283f9107..9e92affc8c22c 100644 --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c @@ -4893,6 +4893,7 @@ static int __stmmac_xdp_run_prog(struct stmmac_priv *priv, u32 act; int res; + guard(local_lock_nested_bh)(&bpf_run_lock.redirect_lock); act = bpf_prog_run_xdp(prog, xdp); switch (act) { case XDP_PASS: diff --git a/drivers/net/ethernet/ti/cpsw_priv.c b/drivers/net/ethernet/ti/cpsw_priv.c index 764ed298b5708..f38c49f9fab35 100644 --- a/drivers/net/ethernet/ti/cpsw_priv.c +++ b/drivers/net/ethernet/ti/cpsw_priv.c @@ -1335,9 +1335,15 @@ int cpsw_run_xdp(struct cpsw_priv *priv, int ch, struct xdp_buff *xdp, if (!prog) return CPSW_XDP_PASS; - act = bpf_prog_run_xdp(prog, xdp); - /* XDP prog might have changed packet data and boundaries */ - *len = xdp->data_end - xdp->data; + scoped_guard(local_lock_nested_bh, &bpf_run_lock.redirect_lock) { + act = bpf_prog_run_xdp(prog, xdp); + /* XDP prog might have changed packet data and boundaries */ + *len = xdp->data_end - xdp->data; + if (act == XDP_REDIRECT) { + if (xdp_do_redirect(ndev, xdp, prog)) + goto drop; + } + } switch (act) { case XDP_PASS: @@ -1352,9 +1358,6 @@ int cpsw_run_xdp(struct cpsw_priv *priv, int ch, struct xdp_buff *xdp, xdp_return_frame_rx_napi(xdpf); break; case XDP_REDIRECT: - if (xdp_do_redirect(ndev, xdp, prog)) - goto drop; - /* Have to flush here, per packet, instead of doing it in bulk * at the end of the napi handler. 
The RX devices on this * particular hardware is sharing a common queue, so the From patchwork Fri Dec 15 17:07:43 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Sebastian Andrzej Siewior X-Patchwork-Id: 13494675 X-Patchwork-Delegate: kuba@kernel.org Received: from galois.linutronix.de (Galois.linutronix.de [193.142.43.55]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 4F2816AB83; Fri, 15 Dec 2023 17:10:45 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linutronix.de Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linutronix.de Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=linutronix.de header.i=@linutronix.de header.b="SRSorbBE"; dkim=permerror (0-bit key) header.d=linutronix.de header.i=@linutronix.de header.b="bKkDOU1m" From: Sebastian Andrzej Siewior DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linutronix.de; s=2020; t=1702660243; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=wD16ISNyde986dSZRGGV32udEpTGLhg+lI5u4jYKrqw=; b=SRSorbBEjdXTPPsLCYBR3BfHZUztde0viH5AXb6BnrVRO+dRpLMHGwt39krhGG0dIGg2gc U/GPYQC8QIJryAvWdpneFKRQPe6h9ZTqEWdzui8QpFNXWEeO85TAeVO/cv65RLccd4/YRK WzlvjMKRk5knp71bJzfOBYegir5nHiJNyJqFAqoY8QhtCasD/8/viDwUyU06Aoa/DCR6yt /Bo5pe68KaOdpzj5Ej1RmwGXZF6GFCUwJ4j6BVvHO/0/fuqfNBRqPARUavf+TmVV9muEYn DgRMLZwgTHliPnTc22SgXQvtzytCGCuEBcYzPMfFPZaWDtx9BK4682g+Lzyb7w== DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=linutronix.de; s=2020e; t=1702660243; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=wD16ISNyde986dSZRGGV32udEpTGLhg+lI5u4jYKrqw=; b=bKkDOU1m4UEsf1ODMNSff2+rozRNCsjhcGgYFGGAkJKSnk1VerHLACuMKMr3ry8mJ6DtKt 6a997HCgOHDcT6Dw== To: linux-kernel@vger.kernel.org, netdev@vger.kernel.org Cc: "David S. Miller" , Boqun Feng , Daniel Borkmann , Eric Dumazet , Frederic Weisbecker , Ingo Molnar , Jakub Kicinski , Paolo Abeni , Peter Zijlstra , Thomas Gleixner , Waiman Long , Will Deacon , Sebastian Andrzej Siewior , Alexei Starovoitov , Andrii Nakryiko , Hao Luo , Jesper Dangaard Brouer , Jiri Olsa , John Fastabend , KP Singh , Martin KaFai Lau , Song Liu , Stanislav Fomichev , Yonghong Song , bpf@vger.kernel.org Subject: [PATCH net-next 24/24] net: bpf: Add lockdep assert for the redirect process. Date: Fri, 15 Dec 2023 18:07:43 +0100 Message-ID: <20231215171020.687342-25-bigeasy@linutronix.de> In-Reply-To: <20231215171020.687342-1-bigeasy@linutronix.de> References: <20231215171020.687342-1-bigeasy@linutronix.de> Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Patchwork-Delegate: kuba@kernel.org The users of bpf_redirect_info should serialize access by acquiring the nested BH-lock bpf_run_lock.redirect_lock. The lock should be acquired before the first use (bpf_prog_run_xdp()) and dropped after the last user in the context (xdp_do_redirect()). Current users in the tree have been audited and updated. Add a lockdep annotation to ensure new users acquire the lock.
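In practice the assertions added below mean any new XDP_REDIRECT user must follow the driver pattern established in the preceding patches. A sketch of a conforming caller (made-up function name; the unguarded variant would now trip lockdep_assert_held() on a CONFIG_LOCKDEP build):

    /* Sketch: what the new assertions expect from a caller. */
    static u32 my_new_driver_run_xdp(struct net_device *dev,
                                     struct bpf_prog *prog,
                                     struct xdp_buff *xdp)
    {
            u32 act;

            scoped_guard(local_lock_nested_bh, &bpf_run_lock.redirect_lock) {
                    act = bpf_prog_run_xdp(prog, xdp);      /* asserts the lock */
                    if (act == XDP_REDIRECT &&
                        xdp_do_redirect(dev, xdp, prog))    /* asserts it too */
                            act = XDP_DROP;
            }
            return act;
    }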
Cc: Alexei Starovoitov Cc: Andrii Nakryiko Cc: Hao Luo Cc: Jesper Dangaard Brouer Cc: Jiri Olsa Cc: John Fastabend Cc: KP Singh Cc: Martin KaFai Lau Cc: Song Liu Cc: Stanislav Fomichev Cc: Yonghong Song Cc: bpf@vger.kernel.org Signed-off-by: Sebastian Andrzej Siewior --- include/net/xdp.h | 1 + net/core/filter.c | 11 +++++++++++ 2 files changed, 12 insertions(+) diff --git a/include/net/xdp.h b/include/net/xdp.h index 349c36fb5fd8f..cdeab175abf18 100644 --- a/include/net/xdp.h +++ b/include/net/xdp.h @@ -493,6 +493,7 @@ static inline void xdp_clear_features_flag(struct net_device *dev) static __always_inline u32 bpf_prog_run_xdp(const struct bpf_prog *prog, struct xdp_buff *xdp) { + lockdep_assert_held(this_cpu_ptr(&bpf_run_lock.redirect_lock)); /* Driver XDP hooks are invoked within a single NAPI poll cycle and thus * under local_bh_disable(), which provides the needed RCU protection * for accessing map entries. diff --git a/net/core/filter.c b/net/core/filter.c index 72a7812f933a1..a2f97503ed578 100644 --- a/net/core/filter.c +++ b/net/core/filter.c @@ -2495,6 +2495,7 @@ int skb_do_redirect(struct sk_buff *skb) struct net_device *dev; u32 flags = ri->flags; + lockdep_assert_held(this_cpu_ptr(&bpf_run_lock.redirect_lock)); dev = dev_get_by_index_rcu(net, ri->tgt_index); ri->tgt_index = 0; ri->flags = 0; @@ -2525,6 +2526,8 @@ BPF_CALL_2(bpf_redirect, u32, ifindex, u64, flags) { struct bpf_redirect_info *ri = this_cpu_ptr(&bpf_redirect_info); + lockdep_assert_held(this_cpu_ptr(&bpf_run_lock.redirect_lock)); + if (unlikely(flags & (~(BPF_F_INGRESS) | BPF_F_REDIRECT_INTERNAL))) return TC_ACT_SHOT; @@ -2546,6 +2549,8 @@ BPF_CALL_2(bpf_redirect_peer, u32, ifindex, u64, flags) { struct bpf_redirect_info *ri = this_cpu_ptr(&bpf_redirect_info); + lockdep_assert_held(this_cpu_ptr(&bpf_run_lock.redirect_lock)); + if (unlikely(flags)) return TC_ACT_SHOT; @@ -2568,6 +2573,8 @@ BPF_CALL_4(bpf_redirect_neigh, u32, ifindex, struct bpf_redir_neigh *, params, { struct bpf_redirect_info *ri = this_cpu_ptr(&bpf_redirect_info); + lockdep_assert_held(this_cpu_ptr(&bpf_run_lock.redirect_lock)); + if (unlikely((plen && plen < sizeof(*params)) || flags)) return TC_ACT_SHOT; @@ -4287,6 +4294,8 @@ u32 xdp_master_redirect(struct xdp_buff *xdp) struct net_device *master, *slave; struct bpf_redirect_info *ri = this_cpu_ptr(&bpf_redirect_info); + lockdep_assert_held(this_cpu_ptr(&bpf_run_lock.redirect_lock)); + master = netdev_master_upper_dev_get_rcu(xdp->rxq->dev); slave = master->netdev_ops->ndo_xdp_get_xmit_slave(master, xdp); if (slave && slave != xdp->rxq->dev) { @@ -4394,6 +4403,7 @@ int xdp_do_redirect(struct net_device *dev, struct xdp_buff *xdp, struct bpf_redirect_info *ri = this_cpu_ptr(&bpf_redirect_info); enum bpf_map_type map_type = ri->map_type; + lockdep_assert_held(this_cpu_ptr(&bpf_run_lock.redirect_lock)); if (map_type == BPF_MAP_TYPE_XSKMAP) return __xdp_do_redirect_xsk(ri, dev, xdp, xdp_prog); @@ -4408,6 +4418,7 @@ int xdp_do_redirect_frame(struct net_device *dev, struct xdp_buff *xdp, struct bpf_redirect_info *ri = this_cpu_ptr(&bpf_redirect_info); enum bpf_map_type map_type = ri->map_type; + lockdep_assert_held(this_cpu_ptr(&bpf_run_lock.redirect_lock)); if (map_type == BPF_MAP_TYPE_XSKMAP) return __xdp_do_redirect_xsk(ri, dev, xdp, xdp_prog);
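A closing note on the form of the assertion itself: it resolves the lock through this_cpu_ptr(), so it checks the instance belonging to the CPU the helper happens to run on. That is only meaningful because all of these paths execute with BH disabled (NAPI poll, or the TC hook paths locked earlier in the series), where migration cannot happen. Spelled out:

    /* The assertion pattern used above: only valid while migration is
     * impossible, e.g. under local_bh_disable() / inside NAPI poll.
     */
    static inline void my_assert_redirect_lock_held(void)
    {
            lockdep_assert_held(this_cpu_ptr(&bpf_run_lock.redirect_lock));
    }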