From patchwork Thu Jun 20 13:21:51 2024
X-Patchwork-Submitter: Sebastian Andrzej Siewior
X-Patchwork-Id: 13705499
X-Patchwork-Delegate: kuba@kernel.org
From: Sebastian Andrzej Siewior
To: linux-kernel@vger.kernel.org, netdev@vger.kernel.org
Cc: "David S. Miller", Daniel Bristot de Oliveira, Boqun Feng, Daniel Borkmann,
 Eric Dumazet, Frederic Weisbecker, Ingo Molnar, Jakub Kicinski, Paolo Abeni,
 Peter Zijlstra, Thomas Gleixner, Waiman Long, Will Deacon,
 Sebastian Andrzej Siewior
Subject: [PATCH v9 net-next 01/15] locking/local_lock: Introduce guard definition for local_lock.
Date: Thu, 20 Jun 2024 15:21:51 +0200
Message-ID: <20240620132727.660738-2-bigeasy@linutronix.de>
In-Reply-To: <20240620132727.660738-1-bigeasy@linutronix.de>
References: <20240620132727.660738-1-bigeasy@linutronix.de>
X-Mailing-List: netdev@vger.kernel.org
X-Patchwork-Delegate: kuba@kernel.org

Introduce lock guard definition for local_lock_t. There are no users yet.

Acked-by: Peter Zijlstra (Intel)
Reviewed-by: Thomas Gleixner
Signed-off-by: Sebastian Andrzej Siewior
---
 include/linux/local_lock.h | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/include/linux/local_lock.h b/include/linux/local_lock.h
index e55010fa73296..82366a37f4474 100644
--- a/include/linux/local_lock.h
+++ b/include/linux/local_lock.h
@@ -51,4 +51,15 @@
 #define local_unlock_irqrestore(lock, flags)	\
 	__local_unlock_irqrestore(lock, flags)

+DEFINE_GUARD(local_lock, local_lock_t __percpu*,
+	     local_lock(_T),
+	     local_unlock(_T))
+DEFINE_GUARD(local_lock_irq, local_lock_t __percpu*,
+	     local_lock_irq(_T),
+	     local_unlock_irq(_T))
+DEFINE_LOCK_GUARD_1(local_lock_irqsave, local_lock_t __percpu,
+		    local_lock_irqsave(_T->lock, _T->flags),
+		    local_unlock_irqrestore(_T->lock, _T->flags),
+		    unsigned long flags)
+
 #endif
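As an illustration (not part of the patch), a per-CPU structure protected by a local_lock_t could use the new guards roughly as in the sketch below. struct demo_cache and the two functions are invented names; DEFINE_PER_CPU(), INIT_LOCAL_LOCK() and guard() are the existing kernel facilities the definitions above hook into.

#include <linux/cleanup.h>
#include <linux/local_lock.h>
#include <linux/percpu.h>

/* Hypothetical per-CPU cache, made up for this example. */
struct demo_cache {
	local_lock_t	lock;
	unsigned int	count;
};

static DEFINE_PER_CPU(struct demo_cache, demo_cache) = {
	.lock = INIT_LOCAL_LOCK(lock),
};

static void demo_cache_inc(void)
{
	/* Takes the per-CPU lock; released automatically at end of scope. */
	guard(local_lock)(&demo_cache.lock);
	__this_cpu_inc(demo_cache.count);
}

static void demo_cache_inc_irqsave(void)
{
	/* Same, but also saves and restores the interrupt state. */
	guard(local_lock_irqsave)(&demo_cache.lock);
	__this_cpu_inc(demo_cache.count);
}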
From patchwork Thu Jun 20 13:21:52 2024
X-Patchwork-Submitter: Sebastian Andrzej Siewior
X-Patchwork-Id: 13705501
X-Patchwork-Delegate: kuba@kernel.org
From: Sebastian Andrzej Siewior
To: linux-kernel@vger.kernel.org, netdev@vger.kernel.org
Cc: "David S. Miller", Daniel Bristot de Oliveira, Boqun Feng, Daniel Borkmann,
 Eric Dumazet, Frederic Weisbecker, Ingo Molnar, Jakub Kicinski, Paolo Abeni,
 Peter Zijlstra, Thomas Gleixner, Waiman Long, Will Deacon,
 Sebastian Andrzej Siewior
Subject: [PATCH v9 net-next 02/15] locking/local_lock: Add local nested BH locking infrastructure.
Date: Thu, 20 Jun 2024 15:21:52 +0200
Message-ID: <20240620132727.660738-3-bigeasy@linutronix.de>
In-Reply-To: <20240620132727.660738-1-bigeasy@linutronix.de>
References: <20240620132727.660738-1-bigeasy@linutronix.de>
X-Mailing-List: netdev@vger.kernel.org
X-Patchwork-Delegate: kuba@kernel.org

Add local_lock_nested_bh() locking. It is based on local_lock_t and the
naming follows the preempt_disable_nested() example.

For !PREEMPT_RT + !LOCKDEP it is a per-CPU annotation for locking
assumptions based on local_bh_disable(). The macro is optimized away
during compilation.

For !PREEMPT_RT + LOCKDEP, local_lock_nested_bh() is reduced to the usual
lock-acquire plus lockdep_assert_in_softirq(), ensuring that BH is
disabled.

For PREEMPT_RT local_lock_nested_bh() acquires the specified per-CPU
lock. It does not disable CPU migration because it relies on
local_bh_disable() disabling CPU migration. With LOCKDEP it performs the
usual lockdep checks as with !PREEMPT_RT. Due to include hell the softirq
check has been moved to spinlock.c.

The intention is to use this locking in places where locking of a per-CPU
variable relies on BH being disabled. Instead of treating disabled bottom
halves as a big per-CPU lock, PREEMPT_RT can use this to reduce the
locking scope to what actually needs protecting. A side effect is that it
also documents the protection scope of the per-CPU variables.
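A minimal sketch of the intended usage pattern: the per-CPU structure and function names below are invented for illustration, while later patches in this series apply the same pattern to real per-CPU data such as napi_alloc_cache.

#include <linux/local_lock.h>
#include <linux/percpu.h>

/* Hypothetical per-CPU data that previously relied on BH being disabled. */
struct demo_pcpu {
	local_lock_t	bh_lock;
	unsigned long	hits;
};

static DEFINE_PER_CPU(struct demo_pcpu, demo_pcpu) = {
	.bh_lock = INIT_LOCAL_LOCK(bh_lock),
};

/* Caller already runs with BH disabled, e.g. in softirq context. */
static void demo_account(void)
{
	local_lock_nested_bh(&demo_pcpu.bh_lock);
	__this_cpu_inc(demo_pcpu.hits);
	local_unlock_nested_bh(&demo_pcpu.bh_lock);
}

With !PREEMPT_RT and !LOCKDEP this reduces to the plain increment; on PREEMPT_RT it takes the per-CPU lock that documents what the BH-disabled section actually protects.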
Acked-by: Peter Zijlstra (Intel) Reviewed-by: Thomas Gleixner Signed-off-by: Sebastian Andrzej Siewior --- include/linux/local_lock.h | 10 ++++++++++ include/linux/local_lock_internal.h | 31 +++++++++++++++++++++++++++++ include/linux/lockdep.h | 3 +++ kernel/locking/spinlock.c | 8 ++++++++ 4 files changed, 52 insertions(+) diff --git a/include/linux/local_lock.h b/include/linux/local_lock.h index 82366a37f4474..091dc0b6bdfb9 100644 --- a/include/linux/local_lock.h +++ b/include/linux/local_lock.h @@ -62,4 +62,14 @@ DEFINE_LOCK_GUARD_1(local_lock_irqsave, local_lock_t __percpu, local_unlock_irqrestore(_T->lock, _T->flags), unsigned long flags) +#define local_lock_nested_bh(_lock) \ + __local_lock_nested_bh(_lock) + +#define local_unlock_nested_bh(_lock) \ + __local_unlock_nested_bh(_lock) + +DEFINE_GUARD(local_lock_nested_bh, local_lock_t __percpu*, + local_lock_nested_bh(_T), + local_unlock_nested_bh(_T)) + #endif diff --git a/include/linux/local_lock_internal.h b/include/linux/local_lock_internal.h index 975e33b793a77..8dd71fbbb6d2b 100644 --- a/include/linux/local_lock_internal.h +++ b/include/linux/local_lock_internal.h @@ -62,6 +62,17 @@ do { \ local_lock_debug_init(lock); \ } while (0) +#define __spinlock_nested_bh_init(lock) \ +do { \ + static struct lock_class_key __key; \ + \ + debug_check_no_locks_freed((void *)lock, sizeof(*lock));\ + lockdep_init_map_type(&(lock)->dep_map, #lock, &__key, \ + 0, LD_WAIT_CONFIG, LD_WAIT_INV, \ + LD_LOCK_NORMAL); \ + local_lock_debug_init(lock); \ +} while (0) + #define __local_lock(lock) \ do { \ preempt_disable(); \ @@ -98,6 +109,15 @@ do { \ local_irq_restore(flags); \ } while (0) +#define __local_lock_nested_bh(lock) \ + do { \ + lockdep_assert_in_softirq(); \ + local_lock_acquire(this_cpu_ptr(lock)); \ + } while (0) + +#define __local_unlock_nested_bh(lock) \ + local_lock_release(this_cpu_ptr(lock)) + #else /* !CONFIG_PREEMPT_RT */ /* @@ -138,4 +158,15 @@ typedef spinlock_t local_lock_t; #define __local_unlock_irqrestore(lock, flags) __local_unlock(lock) +#define __local_lock_nested_bh(lock) \ +do { \ + lockdep_assert_in_softirq_func(); \ + spin_lock(this_cpu_ptr(lock)); \ +} while (0) + +#define __local_unlock_nested_bh(lock) \ +do { \ + spin_unlock(this_cpu_ptr((lock))); \ +} while (0) + #endif /* CONFIG_PREEMPT_RT */ diff --git a/include/linux/lockdep.h b/include/linux/lockdep.h index 08b0d1d9d78b7..3f5a551579cc9 100644 --- a/include/linux/lockdep.h +++ b/include/linux/lockdep.h @@ -600,6 +600,8 @@ do { \ (!in_softirq() || in_irq() || in_nmi())); \ } while (0) +extern void lockdep_assert_in_softirq_func(void); + #else # define might_lock(lock) do { } while (0) # define might_lock_read(lock) do { } while (0) @@ -613,6 +615,7 @@ do { \ # define lockdep_assert_preemption_enabled() do { } while (0) # define lockdep_assert_preemption_disabled() do { } while (0) # define lockdep_assert_in_softirq() do { } while (0) +# define lockdep_assert_in_softirq_func() do { } while (0) #endif #ifdef CONFIG_PROVE_RAW_LOCK_NESTING diff --git a/kernel/locking/spinlock.c b/kernel/locking/spinlock.c index 8475a0794f8c5..438c6086d540e 100644 --- a/kernel/locking/spinlock.c +++ b/kernel/locking/spinlock.c @@ -413,3 +413,11 @@ notrace int in_lock_functions(unsigned long addr) && addr < (unsigned long)__lock_text_end; } EXPORT_SYMBOL(in_lock_functions); + +#if defined(CONFIG_PROVE_LOCKING) && defined(CONFIG_PREEMPT_RT) +void notrace lockdep_assert_in_softirq_func(void) +{ + lockdep_assert_in_softirq(); +} +EXPORT_SYMBOL(lockdep_assert_in_softirq_func); +#endif From 
patchwork Thu Jun 20 13:21:53 2024
X-Patchwork-Submitter: Sebastian Andrzej Siewior
X-Patchwork-Id: 13705502
X-Patchwork-Delegate: kuba@kernel.org
From: Sebastian Andrzej Siewior
To: linux-kernel@vger.kernel.org, netdev@vger.kernel.org
Cc: "David S. Miller", Daniel Bristot de Oliveira, Boqun Feng, Daniel Borkmann,
 Eric Dumazet, Frederic Weisbecker, Ingo Molnar, Jakub Kicinski, Paolo Abeni,
 Peter Zijlstra, Thomas Gleixner, Waiman Long, Will Deacon,
 Sebastian Andrzej Siewior
Subject: [PATCH v9 net-next 03/15] net: Use __napi_alloc_frag_align() instead of open coding it.
Date: Thu, 20 Jun 2024 15:21:53 +0200
Message-ID: <20240620132727.660738-4-bigeasy@linutronix.de>
In-Reply-To: <20240620132727.660738-1-bigeasy@linutronix.de>
References: <20240620132727.660738-1-bigeasy@linutronix.de>
X-Mailing-List: netdev@vger.kernel.org
X-Patchwork-Delegate: kuba@kernel.org

The else branch within __netdev_alloc_frag_align() is an open-coded
__napi_alloc_frag_align(). Use __napi_alloc_frag_align() instead of open
coding it.

Move the fragsz assignment before the page_frag_alloc_align() invocation
because __napi_alloc_frag_align() also contains this statement.

Signed-off-by: Sebastian Andrzej Siewior
---
 net/core/skbuff.c | 8 ++------
 1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 2315c088e91d0..1b52f69ad05e2 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -318,19 +318,15 @@ void *__netdev_alloc_frag_align(unsigned int fragsz, unsigned int align_mask)
 {
 	void *data;
 
-	fragsz = SKB_DATA_ALIGN(fragsz);
 	if (in_hardirq() || irqs_disabled()) {
 		struct page_frag_cache *nc = this_cpu_ptr(&netdev_alloc_cache);
 
+		fragsz = SKB_DATA_ALIGN(fragsz);
 		data = __page_frag_alloc_align(nc, fragsz, GFP_ATOMIC,
 					       align_mask);
 	} else {
-		struct napi_alloc_cache *nc;
-
 		local_bh_disable();
-		nc = this_cpu_ptr(&napi_alloc_cache);
-		data = __page_frag_alloc_align(&nc->page, fragsz, GFP_ATOMIC,
-					       align_mask);
+		data = __napi_alloc_frag_align(fragsz, align_mask);
 		local_bh_enable();
 	}
 	return data;
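For reference, reconstructed from the hunk above (so a sketch rather than the verbatim upstream function), __netdev_alloc_frag_align() ends up roughly as:

void *__netdev_alloc_frag_align(unsigned int fragsz, unsigned int align_mask)
{
	void *data;

	if (in_hardirq() || irqs_disabled()) {
		/* Hard-IRQ or IRQs-off callers use the netdev cache directly. */
		struct page_frag_cache *nc = this_cpu_ptr(&netdev_alloc_cache);

		fragsz = SKB_DATA_ALIGN(fragsz);
		data = __page_frag_alloc_align(nc, fragsz, GFP_ATOMIC,
					       align_mask);
	} else {
		/* Everyone else goes through the NAPI helper, which does the
		 * SKB_DATA_ALIGN() itself and uses napi_alloc_cache.
		 */
		local_bh_disable();
		data = __napi_alloc_frag_align(fragsz, align_mask);
		local_bh_enable();
	}
	return data;
}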
From patchwork Thu Jun 20 13:21:54 2024
X-Patchwork-Submitter: Sebastian Andrzej Siewior
X-Patchwork-Id: 13705503
X-Patchwork-Delegate: kuba@kernel.org
From: Sebastian Andrzej Siewior
To: linux-kernel@vger.kernel.org, netdev@vger.kernel.org
Cc: "David S. Miller", Daniel Bristot de Oliveira, Boqun Feng, Daniel Borkmann,
 Eric Dumazet, Frederic Weisbecker, Ingo Molnar, Jakub Kicinski, Paolo Abeni,
 Peter Zijlstra, Thomas Gleixner, Waiman Long, Will Deacon,
 Sebastian Andrzej Siewior
Subject: [PATCH v9 net-next 04/15] net: Use nested-BH locking for napi_alloc_cache.
Date: Thu, 20 Jun 2024 15:21:54 +0200
Message-ID: <20240620132727.660738-5-bigeasy@linutronix.de>
In-Reply-To: <20240620132727.660738-1-bigeasy@linutronix.de>
References: <20240620132727.660738-1-bigeasy@linutronix.de>
X-Mailing-List: netdev@vger.kernel.org
X-Patchwork-Delegate: kuba@kernel.org

napi_alloc_cache is a per-CPU variable and relies on disabled BH for its
locking. Without per-CPU locking in local_bh_disable() on PREEMPT_RT this
data structure requires explicit locking.

Add a local_lock_t to the data structure and use local_lock_nested_bh()
for locking. This change adds only lockdep coverage and does not alter
the functional behaviour for !PREEMPT_RT.
Signed-off-by: Sebastian Andrzej Siewior --- net/core/skbuff.c | 29 ++++++++++++++++++++++++----- 1 file changed, 24 insertions(+), 5 deletions(-) diff --git a/net/core/skbuff.c b/net/core/skbuff.c index 1b52f69ad05e2..eb9a7e65b5c81 100644 --- a/net/core/skbuff.c +++ b/net/core/skbuff.c @@ -277,6 +277,7 @@ static void *page_frag_alloc_1k(struct page_frag_1k *nc, gfp_t gfp_mask) #endif struct napi_alloc_cache { + local_lock_t bh_lock; struct page_frag_cache page; struct page_frag_1k page_small; unsigned int skb_count; @@ -284,7 +285,9 @@ struct napi_alloc_cache { }; static DEFINE_PER_CPU(struct page_frag_cache, netdev_alloc_cache); -static DEFINE_PER_CPU(struct napi_alloc_cache, napi_alloc_cache); +static DEFINE_PER_CPU(struct napi_alloc_cache, napi_alloc_cache) = { + .bh_lock = INIT_LOCAL_LOCK(bh_lock), +}; /* Double check that napi_get_frags() allocates skbs with * skb->head being backed by slab, not a page fragment. @@ -306,11 +309,16 @@ void napi_get_frags_check(struct napi_struct *napi) void *__napi_alloc_frag_align(unsigned int fragsz, unsigned int align_mask) { struct napi_alloc_cache *nc = this_cpu_ptr(&napi_alloc_cache); + void *data; fragsz = SKB_DATA_ALIGN(fragsz); - return __page_frag_alloc_align(&nc->page, fragsz, GFP_ATOMIC, + local_lock_nested_bh(&napi_alloc_cache.bh_lock); + data = __page_frag_alloc_align(&nc->page, fragsz, GFP_ATOMIC, align_mask); + local_unlock_nested_bh(&napi_alloc_cache.bh_lock); + return data; + } EXPORT_SYMBOL(__napi_alloc_frag_align); @@ -338,16 +346,20 @@ static struct sk_buff *napi_skb_cache_get(void) struct napi_alloc_cache *nc = this_cpu_ptr(&napi_alloc_cache); struct sk_buff *skb; + local_lock_nested_bh(&napi_alloc_cache.bh_lock); if (unlikely(!nc->skb_count)) { nc->skb_count = kmem_cache_alloc_bulk(net_hotdata.skbuff_cache, GFP_ATOMIC, NAPI_SKB_CACHE_BULK, nc->skb_cache); - if (unlikely(!nc->skb_count)) + if (unlikely(!nc->skb_count)) { + local_unlock_nested_bh(&napi_alloc_cache.bh_lock); return NULL; + } } skb = nc->skb_cache[--nc->skb_count]; + local_unlock_nested_bh(&napi_alloc_cache.bh_lock); kasan_mempool_unpoison_object(skb, kmem_cache_size(net_hotdata.skbuff_cache)); return skb; @@ -740,9 +752,13 @@ struct sk_buff *__netdev_alloc_skb(struct net_device *dev, unsigned int len, pfmemalloc = nc->pfmemalloc; } else { local_bh_disable(); + local_lock_nested_bh(&napi_alloc_cache.bh_lock); + nc = this_cpu_ptr(&napi_alloc_cache.page); data = page_frag_alloc(nc, len, gfp_mask); pfmemalloc = nc->pfmemalloc; + + local_unlock_nested_bh(&napi_alloc_cache.bh_lock); local_bh_enable(); } @@ -806,11 +822,11 @@ struct sk_buff *napi_alloc_skb(struct napi_struct *napi, unsigned int len) goto skb_success; } - nc = this_cpu_ptr(&napi_alloc_cache); - if (sk_memalloc_socks()) gfp_mask |= __GFP_MEMALLOC; + local_lock_nested_bh(&napi_alloc_cache.bh_lock); + nc = this_cpu_ptr(&napi_alloc_cache); if (NAPI_HAS_SMALL_PAGE_FRAG && len <= SKB_WITH_OVERHEAD(1024)) { /* we are artificially inflating the allocation size, but * that is not as bad as it may look like, as: @@ -832,6 +848,7 @@ struct sk_buff *napi_alloc_skb(struct napi_struct *napi, unsigned int len) data = page_frag_alloc(&nc->page, len, gfp_mask); pfmemalloc = nc->page.pfmemalloc; } + local_unlock_nested_bh(&napi_alloc_cache.bh_lock); if (unlikely(!data)) return NULL; @@ -1431,6 +1448,7 @@ static void napi_skb_cache_put(struct sk_buff *skb) if (!kasan_mempool_poison_object(skb)) return; + local_lock_nested_bh(&napi_alloc_cache.bh_lock); nc->skb_cache[nc->skb_count++] = skb; if (unlikely(nc->skb_count == 
NAPI_SKB_CACHE_SIZE)) { @@ -1442,6 +1460,7 @@ static void napi_skb_cache_put(struct sk_buff *skb) nc->skb_cache + NAPI_SKB_CACHE_HALF); nc->skb_count = NAPI_SKB_CACHE_HALF; } + local_unlock_nested_bh(&napi_alloc_cache.bh_lock); } void __napi_kfree_skb(struct sk_buff *skb, enum skb_drop_reason reason)
From patchwork Thu Jun 20 13:21:55 2024
X-Patchwork-Submitter: Sebastian Andrzej Siewior
X-Patchwork-Id: 13705504
X-Patchwork-Delegate: kuba@kernel.org
From: Sebastian Andrzej Siewior
b=1hOCoxC+I0HRBAzNW5LCsfg0Mpu+I4563svs+h6FK91CdptEUoVdFsB0WMF46nP2Qe4sO8 KwoFaFTLn1cih3Bw== To: linux-kernel@vger.kernel.org, netdev@vger.kernel.org Cc: "David S. Miller" , Daniel Bristot de Oliveira , Boqun Feng , Daniel Borkmann , Eric Dumazet , Frederic Weisbecker , Ingo Molnar , Jakub Kicinski , Paolo Abeni , Peter Zijlstra , Thomas Gleixner , Waiman Long , Will Deacon , Sebastian Andrzej Siewior , David Ahern Subject: [PATCH v9 net-next 05/15] net/tcp_sigpool: Use nested-BH locking for sigpool_scratch. Date: Thu, 20 Jun 2024 15:21:55 +0200 Message-ID: <20240620132727.660738-6-bigeasy@linutronix.de> In-Reply-To: <20240620132727.660738-1-bigeasy@linutronix.de> References: <20240620132727.660738-1-bigeasy@linutronix.de> Precedence: bulk X-Mailing-List: netdev@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Patchwork-Delegate: kuba@kernel.org sigpool_scratch is a per-CPU variable and relies on disabled BH for its locking. Without per-CPU locking in local_bh_disable() on PREEMPT_RT this data structure requires explicit locking. Make a struct with a pad member (original sigpool_scratch) and a local_lock_t and use local_lock_nested_bh() for locking. This change adds only lockdep coverage and does not alter the functional behaviour for !PREEMPT_RT. Cc: David Ahern Signed-off-by: Sebastian Andrzej Siewior --- net/ipv4/tcp_sigpool.c | 17 +++++++++++++---- 1 file changed, 13 insertions(+), 4 deletions(-) diff --git a/net/ipv4/tcp_sigpool.c b/net/ipv4/tcp_sigpool.c index 8512cb09ebc09..d8a4f192873a2 100644 --- a/net/ipv4/tcp_sigpool.c +++ b/net/ipv4/tcp_sigpool.c @@ -10,7 +10,14 @@ #include static size_t __scratch_size; -static DEFINE_PER_CPU(void __rcu *, sigpool_scratch); +struct sigpool_scratch { + local_lock_t bh_lock; + void __rcu *pad; +}; + +static DEFINE_PER_CPU(struct sigpool_scratch, sigpool_scratch) = { + .bh_lock = INIT_LOCAL_LOCK(bh_lock), +}; struct sigpool_entry { struct crypto_ahash *hash; @@ -72,7 +79,7 @@ static int sigpool_reserve_scratch(size_t size) break; } - old_scratch = rcu_replace_pointer(per_cpu(sigpool_scratch, cpu), + old_scratch = rcu_replace_pointer(per_cpu(sigpool_scratch.pad, cpu), scratch, lockdep_is_held(&cpool_mutex)); if (!cpu_online(cpu) || !old_scratch) { kfree(old_scratch); @@ -93,7 +100,7 @@ static void sigpool_scratch_free(void) int cpu; for_each_possible_cpu(cpu) - kfree(rcu_replace_pointer(per_cpu(sigpool_scratch, cpu), + kfree(rcu_replace_pointer(per_cpu(sigpool_scratch.pad, cpu), NULL, lockdep_is_held(&cpool_mutex))); __scratch_size = 0; } @@ -277,7 +284,8 @@ int tcp_sigpool_start(unsigned int id, struct tcp_sigpool *c) __cond_acquires(RC /* Pairs with tcp_sigpool_reserve_scratch(), scratch area is * valid (allocated) until tcp_sigpool_end(). 
*/ - c->scratch = rcu_dereference_bh(*this_cpu_ptr(&sigpool_scratch)); + local_lock_nested_bh(&sigpool_scratch.bh_lock); + c->scratch = rcu_dereference_bh(*this_cpu_ptr(&sigpool_scratch.pad)); return 0; } EXPORT_SYMBOL_GPL(tcp_sigpool_start); @@ -286,6 +294,7 @@ void tcp_sigpool_end(struct tcp_sigpool *c) __releases(RCU_BH) { struct crypto_ahash *hash = crypto_ahash_reqtfm(c->req); + local_unlock_nested_bh(&sigpool_scratch.bh_lock); rcu_read_unlock_bh(); ahash_request_free(c->req); crypto_free_ahash(hash);
From patchwork Thu Jun 20 13:21:56 2024
X-Patchwork-Submitter: Sebastian Andrzej Siewior
X-Patchwork-Id: 13705505
X-Patchwork-Delegate: kuba@kernel.org
From: Sebastian Andrzej Siewior
content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=HEsAYxN7IhOZ2aMDFd8MWCJ5anszYr7hLYaSFKTM64I=; b=K4/fVOGdA/hAw2WI0HAiL8+OBZ+7Px56QKrznHbxQ4lkFxBGmN09k21A0/oFefa9UzleZI r0npD6EoIcwrcTAg== To: linux-kernel@vger.kernel.org, netdev@vger.kernel.org Cc: "David S. Miller" , Daniel Bristot de Oliveira , Boqun Feng , Daniel Borkmann , Eric Dumazet , Frederic Weisbecker , Ingo Molnar , Jakub Kicinski , Paolo Abeni , Peter Zijlstra , Thomas Gleixner , Waiman Long , Will Deacon , Sebastian Andrzej Siewior , David Ahern Subject: [PATCH v9 net-next 06/15] net/ipv4: Use nested-BH locking for ipv4_tcp_sk. Date: Thu, 20 Jun 2024 15:21:56 +0200 Message-ID: <20240620132727.660738-7-bigeasy@linutronix.de> In-Reply-To: <20240620132727.660738-1-bigeasy@linutronix.de> References: <20240620132727.660738-1-bigeasy@linutronix.de> Precedence: bulk X-Mailing-List: netdev@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Patchwork-Delegate: kuba@kernel.org ipv4_tcp_sk is a per-CPU variable and relies on disabled BH for its locking. Without per-CPU locking in local_bh_disable() on PREEMPT_RT this data structure requires explicit locking. Make a struct with a sock member (original ipv4_tcp_sk) and a local_lock_t and use local_lock_nested_bh() for locking. This change adds only lockdep coverage and does not alter the functional behaviour for !PREEMPT_RT. Cc: David Ahern Signed-off-by: Sebastian Andrzej Siewior --- include/net/sock.h | 5 +++++ net/ipv4/tcp_ipv4.c | 15 +++++++++++---- 2 files changed, 16 insertions(+), 4 deletions(-) diff --git a/include/net/sock.h b/include/net/sock.h index b30ea0c342a65..cce23ac4d5148 100644 --- a/include/net/sock.h +++ b/include/net/sock.h @@ -544,6 +544,11 @@ struct sock { netns_tracker ns_tracker; }; +struct sock_bh_locked { + struct sock *sock; + local_lock_t bh_lock; +}; + enum sk_pacing { SK_PACING_NONE = 0, SK_PACING_NEEDED = 1, diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c index 8e49d69279d57..fd17f25ff288a 100644 --- a/net/ipv4/tcp_ipv4.c +++ b/net/ipv4/tcp_ipv4.c @@ -93,7 +93,9 @@ static int tcp_v4_md5_hash_hdr(char *md5_hash, const struct tcp_md5sig_key *key, struct inet_hashinfo tcp_hashinfo; EXPORT_SYMBOL(tcp_hashinfo); -static DEFINE_PER_CPU(struct sock *, ipv4_tcp_sk); +static DEFINE_PER_CPU(struct sock_bh_locked, ipv4_tcp_sk) = { + .bh_lock = INIT_LOCAL_LOCK(bh_lock), +}; static u32 tcp_v4_init_seq(const struct sk_buff *skb) { @@ -882,7 +884,9 @@ static void tcp_v4_send_reset(const struct sock *sk, struct sk_buff *skb, arg.tos = ip_hdr(skb)->tos; arg.uid = sock_net_uid(net, sk && sk_fullsock(sk) ? sk : NULL); local_bh_disable(); - ctl_sk = this_cpu_read(ipv4_tcp_sk); + local_lock_nested_bh(&ipv4_tcp_sk.bh_lock); + ctl_sk = this_cpu_read(ipv4_tcp_sk.sock); + sock_net_set(ctl_sk, net); if (sk) { ctl_sk->sk_mark = (sk->sk_state == TCP_TIME_WAIT) ? @@ -907,6 +911,7 @@ static void tcp_v4_send_reset(const struct sock *sk, struct sk_buff *skb, sock_net_set(ctl_sk, &init_net); __TCP_INC_STATS(net, TCP_MIB_OUTSEGS); __TCP_INC_STATS(net, TCP_MIB_OUTRSTS); + local_unlock_nested_bh(&ipv4_tcp_sk.bh_lock); local_bh_enable(); #ifdef CONFIG_TCP_MD5SIG @@ -1002,7 +1007,8 @@ static void tcp_v4_send_ack(const struct sock *sk, arg.tos = tos; arg.uid = sock_net_uid(net, sk_fullsock(sk) ? 
sk : NULL); local_bh_disable(); - ctl_sk = this_cpu_read(ipv4_tcp_sk); + local_lock_nested_bh(&ipv4_tcp_sk.bh_lock); + ctl_sk = this_cpu_read(ipv4_tcp_sk.sock); sock_net_set(ctl_sk, net); ctl_sk->sk_mark = (sk->sk_state == TCP_TIME_WAIT) ? inet_twsk(sk)->tw_mark : READ_ONCE(sk->sk_mark); @@ -1017,6 +1023,7 @@ static void tcp_v4_send_ack(const struct sock *sk, sock_net_set(ctl_sk, &init_net); __TCP_INC_STATS(net, TCP_MIB_OUTSEGS); + local_unlock_nested_bh(&ipv4_tcp_sk.bh_lock); local_bh_enable(); } @@ -3615,7 +3622,7 @@ void __init tcp_v4_init(void) sk->sk_clockid = CLOCK_MONOTONIC; - per_cpu(ipv4_tcp_sk, cpu) = sk; + per_cpu(ipv4_tcp_sk.sock, cpu) = sk; } if (register_pernet_subsys(&tcp_sk_ops)) panic("Failed to create the TCP control socket.\n");
From patchwork Thu Jun 20 13:21:57 2024
X-Patchwork-Submitter: Sebastian Andrzej Siewior
X-Patchwork-Id: 13705508
X-Patchwork-Delegate: kuba@kernel.org
From: Sebastian Andrzej Siewior
To: linux-kernel@vger.kernel.org, netdev@vger.kernel.org
Cc: "David S. Miller", Daniel Bristot de Oliveira, Boqun Feng, Daniel Borkmann,
 Eric Dumazet, Frederic Weisbecker, Ingo Molnar, Jakub Kicinski, Paolo Abeni,
 Peter Zijlstra, Thomas Gleixner, Waiman Long, Will Deacon,
 Sebastian Andrzej Siewior, Florian Westphal, Jozsef Kadlecsik,
 Nikolay Aleksandrov, Pablo Neira Ayuso, Roopa Prabhu, bridge@lists.linux.dev,
 coreteam@netfilter.org, netfilter-devel@vger.kernel.org
Subject: [PATCH v9 net-next 07/15] netfilter: br_netfilter: Use nested-BH locking for brnf_frag_data_storage.
Date: Thu, 20 Jun 2024 15:21:57 +0200
Message-ID: <20240620132727.660738-8-bigeasy@linutronix.de>
In-Reply-To: <20240620132727.660738-1-bigeasy@linutronix.de>
References: <20240620132727.660738-1-bigeasy@linutronix.de>
X-Mailing-List: netdev@vger.kernel.org
X-Patchwork-Delegate: kuba@kernel.org

brnf_frag_data_storage is a per-CPU variable and relies on disabled BH
for its locking. Without per-CPU locking in local_bh_disable() on
PREEMPT_RT this data structure requires explicit locking.

Add a local_lock_t to the data structure and use local_lock_nested_bh()
for locking. This change adds only lockdep coverage and does not alter
the functional behaviour for !PREEMPT_RT.
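The diff below uses explicit local_lock_nested_bh()/local_unlock_nested_bh() calls because the lock has to be dropped on several return paths. For a simple scope the guard from patch 02 could express the same pattern; demo_frag_save() is a hypothetical function invented for illustration, while the per-CPU variable and its bh_lock member come from the patch itself.

static void demo_frag_save(const struct sk_buff *skb)
{
	struct brnf_frag_data *data;

	guard(local_lock_nested_bh)(&brnf_frag_data_storage.bh_lock);
	data = this_cpu_ptr(&brnf_frag_data_storage);
	data->vlan_proto = skb_vlan_tag_present(skb) ? skb->vlan_proto : 0;
	/* the per-CPU lock is released automatically at end of scope */
}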
Cc: Florian Westphal Cc: Jozsef Kadlecsik Cc: Nikolay Aleksandrov Cc: Pablo Neira Ayuso Cc: Roopa Prabhu Cc: bridge@lists.linux.dev Cc: coreteam@netfilter.org Cc: netfilter-devel@vger.kernel.org Signed-off-by: Sebastian Andrzej Siewior --- net/bridge/br_netfilter_hooks.c | 20 ++++++++++++++++---- 1 file changed, 16 insertions(+), 4 deletions(-) diff --git a/net/bridge/br_netfilter_hooks.c b/net/bridge/br_netfilter_hooks.c index bf30c50b56895..3c9f6538990ea 100644 --- a/net/bridge/br_netfilter_hooks.c +++ b/net/bridge/br_netfilter_hooks.c @@ -137,6 +137,7 @@ static inline bool is_pppoe_ipv6(const struct sk_buff *skb, #define NF_BRIDGE_MAX_MAC_HEADER_LENGTH (PPPOE_SES_HLEN + ETH_HLEN) struct brnf_frag_data { + local_lock_t bh_lock; char mac[NF_BRIDGE_MAX_MAC_HEADER_LENGTH]; u8 encap_size; u8 size; @@ -144,7 +145,9 @@ struct brnf_frag_data { __be16 vlan_proto; }; -static DEFINE_PER_CPU(struct brnf_frag_data, brnf_frag_data_storage); +static DEFINE_PER_CPU(struct brnf_frag_data, brnf_frag_data_storage) = { + .bh_lock = INIT_LOCAL_LOCK(bh_lock), +}; static void nf_bridge_info_free(struct sk_buff *skb) { @@ -850,6 +853,7 @@ static int br_nf_dev_queue_xmit(struct net *net, struct sock *sk, struct sk_buff { struct nf_bridge_info *nf_bridge = nf_bridge_info_get(skb); unsigned int mtu, mtu_reserved; + int ret; mtu_reserved = nf_bridge_mtu_reduction(skb); mtu = skb->dev->mtu; @@ -882,6 +886,7 @@ static int br_nf_dev_queue_xmit(struct net *net, struct sock *sk, struct sk_buff IPCB(skb)->frag_max_size = nf_bridge->frag_max_size; + local_lock_nested_bh(&brnf_frag_data_storage.bh_lock); data = this_cpu_ptr(&brnf_frag_data_storage); if (skb_vlan_tag_present(skb)) { @@ -897,7 +902,9 @@ static int br_nf_dev_queue_xmit(struct net *net, struct sock *sk, struct sk_buff skb_copy_from_linear_data_offset(skb, -data->size, data->mac, data->size); - return br_nf_ip_fragment(net, sk, skb, br_nf_push_frag_xmit); + ret = br_nf_ip_fragment(net, sk, skb, br_nf_push_frag_xmit); + local_unlock_nested_bh(&brnf_frag_data_storage.bh_lock); + return ret; } if (IS_ENABLED(CONFIG_NF_DEFRAG_IPV6) && skb->protocol == htons(ETH_P_IPV6)) { @@ -909,6 +916,7 @@ static int br_nf_dev_queue_xmit(struct net *net, struct sock *sk, struct sk_buff IP6CB(skb)->frag_max_size = nf_bridge->frag_max_size; + local_lock_nested_bh(&brnf_frag_data_storage.bh_lock); data = this_cpu_ptr(&brnf_frag_data_storage); data->encap_size = nf_bridge_encap_header_len(skb); data->size = ETH_HLEN + data->encap_size; @@ -916,8 +924,12 @@ static int br_nf_dev_queue_xmit(struct net *net, struct sock *sk, struct sk_buff skb_copy_from_linear_data_offset(skb, -data->size, data->mac, data->size); - if (v6ops) - return v6ops->fragment(net, sk, skb, br_nf_push_frag_xmit); + if (v6ops) { + ret = v6ops->fragment(net, sk, skb, br_nf_push_frag_xmit); + local_unlock_nested_bh(&brnf_frag_data_storage.bh_lock); + return ret; + } + local_unlock_nested_bh(&brnf_frag_data_storage.bh_lock); kfree_skb(skb); return -EMSGSIZE; From patchwork Thu Jun 20 13:21:58 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Sebastian Andrzej Siewior X-Patchwork-Id: 13705507 X-Patchwork-Delegate: kuba@kernel.org Received: from galois.linutronix.de (Galois.linutronix.de [193.142.43.55]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 357141B0133; Thu, 20 Jun 2024 13:27:34 +0000 (UTC) Authentication-Results: 
From: Sebastian Andrzej Siewior
To: linux-kernel@vger.kernel.org, netdev@vger.kernel.org
Cc: "David S. Miller", Daniel Bristot de Oliveira, Boqun Feng, Daniel Borkmann,
 Eric Dumazet, Frederic Weisbecker, Ingo Molnar, Jakub Kicinski, Paolo Abeni,
 Peter Zijlstra, Thomas Gleixner, Waiman Long, Will Deacon,
 Sebastian Andrzej Siewior, Ben Segall, Daniel Bristot de Oliveira,
 Dietmar Eggemann, Juri Lelli, Mel Gorman, Steven Rostedt, Valentin Schneider,
 Vincent Guittot
Subject: [PATCH v9 net-next 08/15] net: softnet_data: Make xmit per task.
Date: Thu, 20 Jun 2024 15:21:58 +0200
Message-ID: <20240620132727.660738-9-bigeasy@linutronix.de>
In-Reply-To: <20240620132727.660738-1-bigeasy@linutronix.de>
References: <20240620132727.660738-1-bigeasy@linutronix.de>
X-Mailing-List: netdev@vger.kernel.org
X-Patchwork-Delegate: kuba@kernel.org

Softirq is preemptible on PREEMPT_RT. Without a per-CPU lock in
local_bh_disable() there is no guarantee that only one device is
transmitting at a time. With preemption and multiple senders it is
possible that the per-CPU `recursion' counter gets incremented by
different threads and exceeds XMIT_RECURSION_LIMIT, leading to a false
positive recursion alert. The `more' member is subject to similar
problems if it is set by one thread for one driver and wrongly used by
another driver within another thread.

Instead of adding a lock to protect the per-CPU variable it is simpler to
make xmit per-task. Sending and receiving skbs always happens in thread
context anyway. Having a lock to protect the per-CPU counter would
block/serialize two sending threads needlessly. It would also require a
recursive lock to ensure that the owner can increment the counter
further.

Make the softnet_data.xmit a task_struct member on PREEMPT_RT. Add the
needed wrappers.

Cc: Ben Segall
Cc: Daniel Bristot de Oliveira
Cc: Dietmar Eggemann
Cc: Juri Lelli
Cc: Mel Gorman
Cc: Steven Rostedt
Cc: Valentin Schneider
Cc: Vincent Guittot
Signed-off-by: Sebastian Andrzej Siewior
--- include/linux/netdevice.h | 42 +++++++++++++++++++++++++--------- include/linux/netdevice_xmit.h | 13 +++++++++++ include/linux/sched.h | 5 +++- net/core/dev.c | 14 ++++++++++++ net/core/dev.h | 18 +++++++++++++++ 5 files changed, 80 insertions(+), 12 deletions(-) create mode 100644 include/linux/netdevice_xmit.h diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index c83b390191d47..f6fc9066147d2 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -43,6 +43,7 @@ #include #include +#include #include #include #include @@ -3223,13 +3224,7 @@ struct softnet_data { struct sk_buff_head xfrm_backlog; #endif /* written and read only by owning cpu: */ - struct { - u16 recursion; - u8 more; -#ifdef CONFIG_NET_EGRESS - u8 skip_txqueue; -#endif - } xmit; + struct netdev_xmit xmit; #ifdef CONFIG_RPS /* input_queue_head should be written by cpu owning this struct, * and only read by other cpus. Worth using a cache line.
@@ -3257,10 +3252,18 @@ struct softnet_data { DECLARE_PER_CPU_ALIGNED(struct softnet_data, softnet_data); +#ifndef CONFIG_PREEMPT_RT static inline int dev_recursion_level(void) { return this_cpu_read(softnet_data.xmit.recursion); } +#else +static inline int dev_recursion_level(void) +{ + return current->net_xmit.recursion; +} + +#endif void __netif_schedule(struct Qdisc *q); void netif_schedule_queue(struct netdev_queue *txq); @@ -4872,18 +4875,35 @@ static inline ktime_t netdev_get_tstamp(struct net_device *dev, return hwtstamps->hwtstamp; } -static inline netdev_tx_t __netdev_start_xmit(const struct net_device_ops *ops, - struct sk_buff *skb, struct net_device *dev, - bool more) +#ifndef CONFIG_PREEMPT_RT +static inline void netdev_xmit_set_more(bool more) { __this_cpu_write(softnet_data.xmit.more, more); - return ops->ndo_start_xmit(skb, dev); } static inline bool netdev_xmit_more(void) { return __this_cpu_read(softnet_data.xmit.more); } +#else +static inline void netdev_xmit_set_more(bool more) +{ + current->net_xmit.more = more; +} + +static inline bool netdev_xmit_more(void) +{ + return current->net_xmit.more; +} +#endif + +static inline netdev_tx_t __netdev_start_xmit(const struct net_device_ops *ops, + struct sk_buff *skb, struct net_device *dev, + bool more) +{ + netdev_xmit_set_more(more); + return ops->ndo_start_xmit(skb, dev); +} static inline netdev_tx_t netdev_start_xmit(struct sk_buff *skb, struct net_device *dev, struct netdev_queue *txq, bool more) diff --git a/include/linux/netdevice_xmit.h b/include/linux/netdevice_xmit.h new file mode 100644 index 0000000000000..38325e0702968 --- /dev/null +++ b/include/linux/netdevice_xmit.h @@ -0,0 +1,13 @@ +/* SPDX-License-Identifier: GPL-2.0-or-later */ +#ifndef _LINUX_NETDEVICE_XMIT_H +#define _LINUX_NETDEVICE_XMIT_H + +struct netdev_xmit { + u16 recursion; + u8 more; +#ifdef CONFIG_NET_EGRESS + u8 skip_txqueue; +#endif +}; + +#endif diff --git a/include/linux/sched.h b/include/linux/sched.h index 61591ac6eab6d..5187486c25222 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -36,6 +36,7 @@ #include #include #include +#include #include #include #include @@ -975,7 +976,9 @@ struct task_struct { /* delay due to memory thrashing */ unsigned in_thrashing:1; #endif - +#ifdef CONFIG_PREEMPT_RT + struct netdev_xmit net_xmit; +#endif unsigned long atomic_flags; /* Flags requiring atomic access. 
*/ struct restart_block restart_block; diff --git a/net/core/dev.c b/net/core/dev.c index 093d82bf0e288..95b9e4cc17676 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -3940,6 +3940,7 @@ netdev_tx_queue_mapping(struct net_device *dev, struct sk_buff *skb) return netdev_get_tx_queue(dev, netdev_cap_txqueue(dev, qm)); } +#ifndef CONFIG_PREEMPT_RT static bool netdev_xmit_txqueue_skipped(void) { return __this_cpu_read(softnet_data.xmit.skip_txqueue); @@ -3950,6 +3951,19 @@ void netdev_xmit_skip_txqueue(bool skip) __this_cpu_write(softnet_data.xmit.skip_txqueue, skip); } EXPORT_SYMBOL_GPL(netdev_xmit_skip_txqueue); + +#else +static bool netdev_xmit_txqueue_skipped(void) +{ + return current->net_xmit.skip_txqueue; +} + +void netdev_xmit_skip_txqueue(bool skip) +{ + current->net_xmit.skip_txqueue = skip; +} +EXPORT_SYMBOL_GPL(netdev_xmit_skip_txqueue); +#endif #endif /* CONFIG_NET_EGRESS */ #ifdef CONFIG_NET_XGRESS diff --git a/net/core/dev.h b/net/core/dev.h index 58f88d28bc994..5654325c5b710 100644 --- a/net/core/dev.h +++ b/net/core/dev.h @@ -150,6 +150,8 @@ struct napi_struct *napi_by_id(unsigned int napi_id); void kick_defer_list_purge(struct softnet_data *sd, unsigned int cpu); #define XMIT_RECURSION_LIMIT 8 + +#ifndef CONFIG_PREEMPT_RT static inline bool dev_xmit_recursion(void) { return unlikely(__this_cpu_read(softnet_data.xmit.recursion) > @@ -165,6 +167,22 @@ static inline void dev_xmit_recursion_dec(void) { __this_cpu_dec(softnet_data.xmit.recursion); } +#else +static inline bool dev_xmit_recursion(void) +{ + return unlikely(current->net_xmit.recursion > XMIT_RECURSION_LIMIT); +} + +static inline void dev_xmit_recursion_inc(void) +{ + current->net_xmit.recursion++; +} + +static inline void dev_xmit_recursion_dec(void) +{ + current->net_xmit.recursion--; +} +#endif int dev_set_hwtstamp_phylib(struct net_device *dev, struct kernel_hwtstamp_config *cfg, From patchwork Thu Jun 20 13:21:59 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Sebastian Andrzej Siewior X-Patchwork-Id: 13705506 X-Patchwork-Delegate: kuba@kernel.org Received: from galois.linutronix.de (Galois.linutronix.de [193.142.43.55]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 9AAEF1B1408; Thu, 20 Jun 2024 13:27:35 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=193.142.43.55 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1718890057; cv=none; b=F89eylRBnJ6DabwSo6xiHMf1NRwaNjo7G8OrqY38g+tNWIpY5hFqn2exrmbw1QsRJL/5BqrEvfBcLBQV4KrZFkzs1NX46AFCjv1Bvhi8XAyBi2DSOhkok9tIGWAH8TMaQxGH3Otu+ZUrRs/OWSBVKP183kgqXQXRchm8VuzUIDc= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1718890057; c=relaxed/simple; bh=O2tI7pfDIUiUwjHbETV4/u0vZ9+INFp6Dgn3XS7NXNU=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=sEIb2Ahhm4lPlnrh38VNjpNQkPjxZ5yjgaPmFPwMMe2gj/iy1SRwt7NYkXM5o7H6iUCNiCnQyB16rq2sdTwt9A3Q/f4d00tSfKDShnbBkapGHPj5sOvyvLASQXtf0SDFKqKeefBgYaoimpv5XFdXxCuOPyn3Co8uiAajdgWIUas= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linutronix.de; spf=pass smtp.mailfrom=linutronix.de; dkim=pass (2048-bit key) header.d=linutronix.de header.i=@linutronix.de header.b=X2hjGknx; dkim=permerror (0-bit key) header.d=linutronix.de header.i=@linutronix.de header.b=6BbUkkr3; arc=none 
smtp.client-ip=193.142.43.55 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linutronix.de Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linutronix.de Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=linutronix.de header.i=@linutronix.de header.b="X2hjGknx"; dkim=permerror (0-bit key) header.d=linutronix.de header.i=@linutronix.de header.b="6BbUkkr3" From: Sebastian Andrzej Siewior DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linutronix.de; s=2020; t=1718890053; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=qmZ697yP2U6PaDAL2e1HtmX8k+6jlbTJ0LVJ//UsoYk=; b=X2hjGknxZnmx9eSJ4tlEFkckXkzotPTmV7zn49x/IVyvozJCnRMbqHQg99gE16LO1WI0Tt GCSEoEmwlmJ/9+0k598Qnvb/GPxoUiurjPSBPvZAFUGvTpYQuBBr9TIJ6BXEYYrTz0tEcw 0j2LXGp0ERI80KFXNGSD8PLvI94IwIx6U4Y2L3VKO6Z/soOo0w8y6pEpd9kV7qpBv/avD0 o5nVezuJfE/erosff7D/Gz0u6V22NPYqvdav4KKSZKwnZts1fOS5VxHmOZvqwB0tpJ5/wf oJ/LGNJrMXaphs3+hw+WYuf5JuofE82AK9FpDyBbMdcC1wuGmop3SbGBPUqwsA== DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=linutronix.de; s=2020e; t=1718890053; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=qmZ697yP2U6PaDAL2e1HtmX8k+6jlbTJ0LVJ//UsoYk=; b=6BbUkkr3vd8U5MNHlQMHFFFjDSTyv6ggTkWch3D5s6xQmmYlYhXVDQyI2gEOtLgpiholRY rQkY+Kr8d6dON4Ag== To: linux-kernel@vger.kernel.org, netdev@vger.kernel.org Cc: "David S. Miller" , Daniel Bristot de Oliveira , Boqun Feng , Daniel Borkmann , Eric Dumazet , Frederic Weisbecker , Ingo Molnar , Jakub Kicinski , Paolo Abeni , Peter Zijlstra , Thomas Gleixner , Waiman Long , Will Deacon , Sebastian Andrzej Siewior Subject: [PATCH v9 net-next 09/15] dev: Remove PREEMPT_RT ifdefs from backlog_lock.*(). Date: Thu, 20 Jun 2024 15:21:59 +0200 Message-ID: <20240620132727.660738-10-bigeasy@linutronix.de> In-Reply-To: <20240620132727.660738-1-bigeasy@linutronix.de> References: <20240620132727.660738-1-bigeasy@linutronix.de> Precedence: bulk X-Mailing-List: netdev@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Patchwork-Delegate: kuba@kernel.org The backlog_napi locking (previously RPS) relies on explicit locking if either RPS or backlog NAPI is enabled. If both are disabled, locking was achieved by disabling interrupts, except on PREEMPT_RT. PREEMPT_RT was excluded because the needed synchronisation was already provided by local_bh_disable(). Since the introduction of backlog NAPI, which is mandatory for PREEMPT_RT, the ifdef within backlog_lock.*() is obsolete and can be removed. Remove the ifdefs in backlog_lock.*().
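For illustration only (a sketch of the resulting helper, not an addition to the diff below; it assumes use_backlog_threads() always returns true on PREEMPT_RT since backlog NAPI is mandatory there), the fallback branch is then only reachable on !PREEMPT_RT builds:

static inline void backlog_lock_irq_disable(struct softnet_data *sd)
{
	if (IS_ENABLED(CONFIG_RPS) || use_backlog_threads())
		/* Always taken on PREEMPT_RT: backlog NAPI is mandatory there. */
		spin_lock_irq(&sd->input_pkt_queue.lock);
	else
		/* Only reached on !PREEMPT_RT without RPS/backlog threads. */
		local_irq_disable();
}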
Signed-off-by: Sebastian Andrzej Siewior --- net/core/dev.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/net/core/dev.c b/net/core/dev.c index 95b9e4cc17676..73c4d14e4febc 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -229,7 +229,7 @@ static inline void backlog_lock_irq_save(struct softnet_data *sd, { if (IS_ENABLED(CONFIG_RPS) || use_backlog_threads()) spin_lock_irqsave(&sd->input_pkt_queue.lock, *flags); - else if (!IS_ENABLED(CONFIG_PREEMPT_RT)) + else local_irq_save(*flags); } @@ -237,7 +237,7 @@ static inline void backlog_lock_irq_disable(struct softnet_data *sd) { if (IS_ENABLED(CONFIG_RPS) || use_backlog_threads()) spin_lock_irq(&sd->input_pkt_queue.lock); - else if (!IS_ENABLED(CONFIG_PREEMPT_RT)) + else local_irq_disable(); } @@ -246,7 +246,7 @@ static inline void backlog_unlock_irq_restore(struct softnet_data *sd, { if (IS_ENABLED(CONFIG_RPS) || use_backlog_threads()) spin_unlock_irqrestore(&sd->input_pkt_queue.lock, *flags); - else if (!IS_ENABLED(CONFIG_PREEMPT_RT)) + else local_irq_restore(*flags); } @@ -254,7 +254,7 @@ static inline void backlog_unlock_irq_enable(struct softnet_data *sd) { if (IS_ENABLED(CONFIG_RPS) || use_backlog_threads()) spin_unlock_irq(&sd->input_pkt_queue.lock); - else if (!IS_ENABLED(CONFIG_PREEMPT_RT)) + else local_irq_enable(); } From patchwork Thu Jun 20 13:22:00 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Sebastian Andrzej Siewior X-Patchwork-Id: 13705509 X-Patchwork-Delegate: kuba@kernel.org Received: from galois.linutronix.de (Galois.linutronix.de [193.142.43.55]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 258DC1B142C; Thu, 20 Jun 2024 13:27:35 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=193.142.43.55 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1718890058; cv=none; b=RCzpK486UDy2hbwB4lzDTKlFher0u31NkMQcYr4yFfh9bHju7lTy+C3VvJMGKR+r424oisoXr2vyG3MeH0yMOThWL0k2vdPQsfPTB64OgzyagJXGY856KVvOvP2hOufqaKm6HWvTaKbTVVSFyQdNJ/Ic3FJPxJm/QitT/6C0h/U= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1718890058; c=relaxed/simple; bh=ypmHT1v1HtNFinDv3RL6id683Q4z0/bNnAHhkIQxSsE=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=pCifHD/LAspnPwtIiOsTqcbuqUtW3UZxW6PkF0tEP4Jm6mH2jhGqVt57a87C9vqMgKTxAPINJlcMgDUbZFipd4yzDxhtkoutOeGt7q4kHD3TkF/qPCNJU2TQrQlgRdoGMVtil0XJAlFN/yX3A/ArZX6dGFm5YPuuZndVlQ1D0wE= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linutronix.de; spf=pass smtp.mailfrom=linutronix.de; dkim=pass (2048-bit key) header.d=linutronix.de header.i=@linutronix.de header.b=TYJJ6bB2; dkim=permerror (0-bit key) header.d=linutronix.de header.i=@linutronix.de header.b=AvkWNFlA; arc=none smtp.client-ip=193.142.43.55 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linutronix.de Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linutronix.de Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=linutronix.de header.i=@linutronix.de header.b="TYJJ6bB2"; dkim=permerror (0-bit key) header.d=linutronix.de header.i=@linutronix.de header.b="AvkWNFlA" From: Sebastian Andrzej Siewior DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linutronix.de; s=2020; t=1718890054; 
h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=oKb1Ii7fD2S+IizyTvyBdHKp+XOk+wkvfTYzAwWaVkc=; b=TYJJ6bB2yXQbXe/jy2zeqDd5zZDjhLviz/ssgWRnOs9YV6Ois0KBDYp2d4GCAN256wJgEH aCo1gbfSFhg+1++ks0hodQQqK+S+aee2lZgJ/HlUaqm711+eOAKomx3uJEsXVX687vvTmF wQe4hhdJgAz3DlBJJ8AMhIyndds2/2akZY4YQ5/XUiLEdANMzsPYp1ROczvWmFJySen8Nf uG+YRaGOqy+wqLnxy2pZXHIQZwNZwBKPlKlBzP1kmbOKO4YyBzC9FdG2ieXlWylCkEUWHL nnXqG8fUyXxDksOaTFJj2cqNuexJVdVNYNxz7BweQtjZOTnwRU0+jfBilL0XPA== DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=linutronix.de; s=2020e; t=1718890054; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=oKb1Ii7fD2S+IizyTvyBdHKp+XOk+wkvfTYzAwWaVkc=; b=AvkWNFlATQ2AhOFhAhighnAwbp6dO/kIqCsGqivS66tXnCo31175hKG+8yXpp1aYsQkGA1 T1mDG5te1mv2NgBQ== To: linux-kernel@vger.kernel.org, netdev@vger.kernel.org Cc: "David S. Miller" , Daniel Bristot de Oliveira , Boqun Feng , Daniel Borkmann , Eric Dumazet , Frederic Weisbecker , Ingo Molnar , Jakub Kicinski , Paolo Abeni , Peter Zijlstra , Thomas Gleixner , Waiman Long , Will Deacon , Sebastian Andrzej Siewior Subject: [PATCH v9 net-next 10/15] dev: Use nested-BH locking for softnet_data.process_queue. Date: Thu, 20 Jun 2024 15:22:00 +0200 Message-ID: <20240620132727.660738-11-bigeasy@linutronix.de> In-Reply-To: <20240620132727.660738-1-bigeasy@linutronix.de> References: <20240620132727.660738-1-bigeasy@linutronix.de> Precedence: bulk X-Mailing-List: netdev@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Patchwork-Delegate: kuba@kernel.org softnet_data::process_queue is a per-CPU variable and relies on disabled BH for its locking. Without per-CPU locking in local_bh_disable() on PREEMPT_RT, this data structure requires explicit locking. softnet_data::input_queue_head can be updated locklessly. This is fine because this value is only updated CPU-locally by the local backlog_napi thread. Add a local_lock_t to softnet_data and use local_lock_nested_bh() for locking of process_queue. This change adds only lockdep coverage and does not alter the functional behaviour for !PREEMPT_RT. Signed-off-by: Sebastian Andrzej Siewior --- include/linux/netdevice.h | 1 + net/core/dev.c | 12 +++++++++++- 2 files changed, 12 insertions(+), 1 deletion(-) diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index f6fc9066147d2..4e81660b4462d 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -3202,6 +3202,7 @@ static inline bool dev_has_header(const struct net_device *dev) struct softnet_data { struct list_head poll_list; struct sk_buff_head process_queue; + local_lock_t process_queue_bh_lock; /* stats */ unsigned int processed; diff --git a/net/core/dev.c b/net/core/dev.c index 73c4d14e4febc..8ef727c2ae2be 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -449,7 +449,9 @@ static RAW_NOTIFIER_HEAD(netdev_chain); * queue in the local softnet handler. */ -DEFINE_PER_CPU_ALIGNED(struct softnet_data, softnet_data); +DEFINE_PER_CPU_ALIGNED(struct softnet_data, softnet_data) = { + .process_queue_bh_lock = INIT_LOCAL_LOCK(process_queue_bh_lock), +}; EXPORT_PER_CPU_SYMBOL(softnet_data); /* Page_pool has a lockless array/stack to alloc/recycle pages.
@@ -5949,6 +5951,7 @@ static void flush_backlog(struct work_struct *work) } backlog_unlock_irq_enable(sd); + local_lock_nested_bh(&softnet_data.process_queue_bh_lock); skb_queue_walk_safe(&sd->process_queue, skb, tmp) { if (skb->dev->reg_state == NETREG_UNREGISTERING) { __skb_unlink(skb, &sd->process_queue); @@ -5956,6 +5959,7 @@ static void flush_backlog(struct work_struct *work) rps_input_queue_head_incr(sd); } } + local_unlock_nested_bh(&softnet_data.process_queue_bh_lock); local_bh_enable(); } @@ -6077,7 +6081,9 @@ static int process_backlog(struct napi_struct *napi, int quota) while (again) { struct sk_buff *skb; + local_lock_nested_bh(&softnet_data.process_queue_bh_lock); while ((skb = __skb_dequeue(&sd->process_queue))) { + local_unlock_nested_bh(&softnet_data.process_queue_bh_lock); rcu_read_lock(); __netif_receive_skb(skb); rcu_read_unlock(); @@ -6086,7 +6092,9 @@ static int process_backlog(struct napi_struct *napi, int quota) return work; } + local_lock_nested_bh(&softnet_data.process_queue_bh_lock); } + local_unlock_nested_bh(&softnet_data.process_queue_bh_lock); backlog_lock_irq_disable(sd); if (skb_queue_empty(&sd->input_pkt_queue)) { @@ -6101,8 +6109,10 @@ static int process_backlog(struct napi_struct *napi, int quota) napi->state &= NAPIF_STATE_THREADED; again = false; } else { + local_lock_nested_bh(&softnet_data.process_queue_bh_lock); skb_queue_splice_tail_init(&sd->input_pkt_queue, &sd->process_queue); + local_unlock_nested_bh(&softnet_data.process_queue_bh_lock); } backlog_unlock_irq_enable(sd); } From patchwork Thu Jun 20 13:22:01 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Sebastian Andrzej Siewior X-Patchwork-Id: 13705494 X-Patchwork-Delegate: kuba@kernel.org Received: from galois.linutronix.de (Galois.linutronix.de [193.142.43.55]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 786D21B14E6; Thu, 20 Jun 2024 13:27:36 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=193.142.43.55 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1718890058; cv=none; b=EEps0fMjjFq4pmMXkyOaaWEkc+op96EnoJJHFZiJ+P6Zq9FtaBGE2oApUAET3yJSyZnY1y4O5ESTZCWhIIZEXeRppUcnLYIAewpV22/P7VkFM3Maqb+gVbcQQiYOhGMPe+WioiajFQ/Z64eHE8r4hQIENkcgHaNJunI4rvml3Fg= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1718890058; c=relaxed/simple; bh=D7MV1ZUt0ja+VOQ5jBTqWT/rFam6BZA9NpCFJ3oOycU=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=gsPFeBJNTjyzPJmd6roJW1wAKw4rSOlsaVb9IoilTBnN8MAvIY9Ay5PdhxtR1x5gDjU/qwaiXPJtFu3gvto/11LyQ3diXv/MCwgjqE5L1vXdmRnbyEsuulbroOcCh5fEiZ0/LBche4DdWr4xLa8RV79HX09tL52uOyVW+uiHJSA= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linutronix.de; spf=pass smtp.mailfrom=linutronix.de; dkim=pass (2048-bit key) header.d=linutronix.de header.i=@linutronix.de header.b=FoLqBBdU; dkim=permerror (0-bit key) header.d=linutronix.de header.i=@linutronix.de header.b=EdsIP3ML; arc=none smtp.client-ip=193.142.43.55 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linutronix.de Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linutronix.de Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=linutronix.de header.i=@linutronix.de header.b="FoLqBBdU"; 
dkim=permerror (0-bit key) header.d=linutronix.de header.i=@linutronix.de header.b="EdsIP3ML" From: Sebastian Andrzej Siewior DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linutronix.de; s=2020; t=1718890054; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=Eu8VD8DpLJkf2Hl3sxqZhIhEWbSf2IpwfXlNNAn3MjU=; b=FoLqBBdU5f4B9uTYEDnz0oq8uIBcmyUo5YX+L9SONwEsoV9oWtzDpFwBlGJKBLnHWwldF8 YZGFyNCD48GXXP/CVNFsl/oGXTVncFOzR1vhIVX5luQkELHnSzt8BaJL9Yz30AIb+A6zzQ CCu0SI6/fXD2UaPVv+u8IlmyeZUwFFnFuLv6qam/Vs2uWCmgjec5IH526TQqBoZfeHMJw+ u61uLuwdmYhMCOagkNfeMPnj9HSuA8Ek5Mt5CA9QMnKWiuWpZ6Sokx0evONPNqBzmXNjB4 tKlrwy0PxPkVp7nIoea+3k2MLB+/ao11MpnRbUnjUei+rPGiT/rABjqQmNgCjQ== DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=linutronix.de; s=2020e; t=1718890054; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=Eu8VD8DpLJkf2Hl3sxqZhIhEWbSf2IpwfXlNNAn3MjU=; b=EdsIP3MLTmAVQ6a3AMqd3x9KvxqabvWih23pBfOMCuBdFHFunADFLSZKfaJ9VkxcSIVfFo Fkt+NK5mgEs36mCw== To: linux-kernel@vger.kernel.org, netdev@vger.kernel.org Cc: "David S. Miller" , Daniel Bristot de Oliveira , Boqun Feng , Daniel Borkmann , Eric Dumazet , Frederic Weisbecker , Ingo Molnar , Jakub Kicinski , Paolo Abeni , Peter Zijlstra , Thomas Gleixner , Waiman Long , Will Deacon , Sebastian Andrzej Siewior , bpf@vger.kernel.org Subject: [PATCH v9 net-next 11/15] lwt: Don't disable migration prio invoking BPF. Date: Thu, 20 Jun 2024 15:22:01 +0200 Message-ID: <20240620132727.660738-12-bigeasy@linutronix.de> In-Reply-To: <20240620132727.660738-1-bigeasy@linutronix.de> References: <20240620132727.660738-1-bigeasy@linutronix.de> Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Patchwork-Delegate: kuba@kernel.org There is no need to explicitly disable migration if bottom halves are also disabled. Disabling BH implies disabling migration. Remove migrate_disable() and rely solely on disabling BH to remain on the same CPU. Cc: bpf@vger.kernel.org Signed-off-by: Sebastian Andrzej Siewior --- net/core/lwt_bpf.c | 6 ++---- 1 file changed, 2 insertions(+), 4 deletions(-) diff --git a/net/core/lwt_bpf.c b/net/core/lwt_bpf.c index 4a0797f0a154b..a94943681e5aa 100644 --- a/net/core/lwt_bpf.c +++ b/net/core/lwt_bpf.c @@ -40,10 +40,9 @@ static int run_lwt_bpf(struct sk_buff *skb, struct bpf_lwt_prog *lwt, { int ret; - /* Migration disable and BH disable are needed to protect per-cpu - * redirect_info between BPF prog and skb_do_redirect(). + /* Disabling BH is needed to protect per-CPU bpf_redirect_info between + * BPF prog and skb_do_redirect(). 
*/ - migrate_disable(); local_bh_disable(); bpf_compute_data_pointers(skb); ret = bpf_prog_run_save_cb(lwt->prog, skb); @@ -78,7 +77,6 @@ static int run_lwt_bpf(struct sk_buff *skb, struct bpf_lwt_prog *lwt, } local_bh_enable(); - migrate_enable(); return ret; } From patchwork Thu Jun 20 13:22:02 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Sebastian Andrzej Siewior X-Patchwork-Id: 13705495 X-Patchwork-Delegate: kuba@kernel.org Received: from galois.linutronix.de (Galois.linutronix.de [193.142.43.55]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 348A51B150D; Thu, 20 Jun 2024 13:27:36 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=193.142.43.55 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1718890059; cv=none; b=UFTlPo+gVFm7dIhb800yIAuaJAFghaRcTT5QmKYa/MosMTQW0khhyvDkGH6eUkeqicgB9Nn7GiwWx6UZWSedk0PNS/Rl0gzpQUfMzDYejAa93FvUW9svsybkl59Ltx+zBAyqtVfNOoLAGOzfTwtRDcfB39TwqEgo8iQf2HqBRC4= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1718890059; c=relaxed/simple; bh=FbbAiov4OHSV7INKnGdeocZuk+klys2tkIDh/H6h8/A=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=mkiVbkDE7gdCuFWdGJgzaPTnM9FK39w0t/FjnlIAem56cQpO2Hhizo1C5U8YAhcWKxd5BZRfHlO25vQYaQjIo6Y8z8fiAV7gYo6KtUW/hitJ84xWxj36p8ZeLWc8MMkLiuKZ9rdZ+i9nm5MDteicmXueYVeVakPr02Givfc+hNc= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linutronix.de; spf=pass smtp.mailfrom=linutronix.de; dkim=pass (2048-bit key) header.d=linutronix.de header.i=@linutronix.de header.b=eGwCLRfe; dkim=permerror (0-bit key) header.d=linutronix.de header.i=@linutronix.de header.b=kYKulGVl; arc=none smtp.client-ip=193.142.43.55 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linutronix.de Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linutronix.de Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=linutronix.de header.i=@linutronix.de header.b="eGwCLRfe"; dkim=permerror (0-bit key) header.d=linutronix.de header.i=@linutronix.de header.b="kYKulGVl" From: Sebastian Andrzej Siewior DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linutronix.de; s=2020; t=1718890055; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=Xg1ApPCQGVJ8CFcoKgcEnIVTe/xamtVzkM/qsLKXD5I=; b=eGwCLRfexfwVKGxWDTzXRH8wAdFxuxQ/h1VrmZH5MhSlc6hOPvzDSbZuS+18vAD+YYrahJ QT/xZDUSXgRjoIBwtWY3WEO9Aa5ZaECenmltKt1P8e4f4R8pu2OL304c/hjEB1KmcQjIuk cQoBqXG37VFcuv5WQdbbw5ahg4dbmQDJoNUklLM1S4bnElxPcdLUSV3JsvgAKlNPcK8qOY 8Kp89qRW3AR4DfxlsMeZrBCSDIU3vPqAT0KUedrfcJlVieyYTWRROCGr20A6jlzSm6Ql68 651bbagmO3zSCiE+njLC32DaHhCStbHkviHi3CudY7J4/H6ApXWLTMv9dL4g0w== DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=linutronix.de; s=2020e; t=1718890055; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=Xg1ApPCQGVJ8CFcoKgcEnIVTe/xamtVzkM/qsLKXD5I=; b=kYKulGVlx4KNm/1UugWA0UyH9zJJ42qcEUSpUQ0rc4jDTnAjoiDyPqQziF44s66Dl9/boS 4hXgsYHWDElnkNCg== To: 
linux-kernel@vger.kernel.org, netdev@vger.kernel.org Cc: "David S. Miller" , Daniel Bristot de Oliveira , Boqun Feng , Daniel Borkmann , Eric Dumazet , Frederic Weisbecker , Ingo Molnar , Jakub Kicinski , Paolo Abeni , Peter Zijlstra , Thomas Gleixner , Waiman Long , Will Deacon , Sebastian Andrzej Siewior , Alexei Starovoitov , Andrii Nakryiko , David Ahern , Hao Luo , Jiri Olsa , John Fastabend , KP Singh , Martin KaFai Lau , Song Liu , Stanislav Fomichev , Yonghong Song , bpf@vger.kernel.org Subject: [PATCH v9 net-next 12/15] seg6: Use nested-BH locking for seg6_bpf_srh_states. Date: Thu, 20 Jun 2024 15:22:02 +0200 Message-ID: <20240620132727.660738-13-bigeasy@linutronix.de> In-Reply-To: <20240620132727.660738-1-bigeasy@linutronix.de> References: <20240620132727.660738-1-bigeasy@linutronix.de> Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Patchwork-Delegate: kuba@kernel.org The access to seg6_bpf_srh_states is protected by disabling preemption. Based on the code, the entry point is input_action_end_bpf(), and every other function that accesses seg6_bpf_srh_states (the BPF helper functions bpf_lwt_seg6_*()) should be called from within input_action_end_bpf(). input_action_end_bpf() accesses seg6_bpf_srh_states first at the top of the function and only then disables preemption. This looks wrong because if preemption needs to be disabled as part of the locking mechanism, then the variable shouldn't be accessed beforehand. Looking at how it is exercised via test_lwt_seg6local.sh, input_action_end_bpf() is always invoked from softirq context. If this is always the case, then the preempt_disable() statement is superfluous. If it is not always invoked from softirq, then disabling only preemption is not sufficient. Replace the preempt_disable() statement with nested-BH locking. This is not an equivalent replacement, as it assumes that the invocation of input_action_end_bpf() always occurs in softirq context and thus the preempt_disable() is superfluous. Add a local_lock_t to the data structure and use local_lock_nested_bh() for locking. Add lockdep_assert_held() to ensure the lock is held while the per-CPU variable is referenced in the helper functions.
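The intended pattern, as a sketch (illustration only; seg6_example_input() is a made-up name, the real entry point is input_action_end_bpf() in the diff below): the per-CPU state is touched only inside the nested-BH locked section, and helpers reached from there can assert that the lock is held:

static int seg6_example_input(struct sk_buff *skb)
{
	struct seg6_bpf_srh_state *srh_state;

	local_lock_nested_bh(&seg6_bpf_srh_states.bh_lock);
	srh_state = this_cpu_ptr(&seg6_bpf_srh_states);
	srh_state->srh = NULL;
	srh_state->valid = true;
	/* bpf_lwt_seg6_*() helpers invoked from here may rely on
	 * lockdep_assert_held(&srh_state->bh_lock).
	 */
	local_unlock_nested_bh(&seg6_bpf_srh_states.bh_lock);
	return 0;
}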
Cc: Alexei Starovoitov Cc: Andrii Nakryiko Cc: David Ahern Cc: Hao Luo Cc: Jiri Olsa Cc: John Fastabend Cc: KP Singh Cc: Martin KaFai Lau Cc: Song Liu Cc: Stanislav Fomichev Cc: Yonghong Song Cc: bpf@vger.kernel.org Signed-off-by: Sebastian Andrzej Siewior --- include/net/seg6_local.h | 1 + net/core/filter.c | 3 +++ net/ipv6/seg6_local.c | 22 ++++++++++++++-------- 3 files changed, 18 insertions(+), 8 deletions(-) diff --git a/include/net/seg6_local.h b/include/net/seg6_local.h index 3fab9dec2ec45..888c1ce6f5272 100644 --- a/include/net/seg6_local.h +++ b/include/net/seg6_local.h @@ -19,6 +19,7 @@ extern int seg6_lookup_nexthop(struct sk_buff *skb, struct in6_addr *nhaddr, extern bool seg6_bpf_has_valid_srh(struct sk_buff *skb); struct seg6_bpf_srh_state { + local_lock_t bh_lock; struct ipv6_sr_hdr *srh; u16 hdrlen; bool valid; diff --git a/net/core/filter.c b/net/core/filter.c index 7c46ecba3b01b..ba1a739a9bedc 100644 --- a/net/core/filter.c +++ b/net/core/filter.c @@ -6450,6 +6450,7 @@ BPF_CALL_4(bpf_lwt_seg6_store_bytes, struct sk_buff *, skb, u32, offset, void *srh_tlvs, *srh_end, *ptr; int srhoff = 0; + lockdep_assert_held(&srh_state->bh_lock); if (srh == NULL) return -EINVAL; @@ -6506,6 +6507,7 @@ BPF_CALL_4(bpf_lwt_seg6_action, struct sk_buff *, skb, int hdroff = 0; int err; + lockdep_assert_held(&srh_state->bh_lock); switch (action) { case SEG6_LOCAL_ACTION_END_X: if (!seg6_bpf_has_valid_srh(skb)) @@ -6582,6 +6584,7 @@ BPF_CALL_3(bpf_lwt_seg6_adjust_srh, struct sk_buff *, skb, u32, offset, int srhoff = 0; int ret; + lockdep_assert_held(&srh_state->bh_lock); if (unlikely(srh == NULL)) return -EINVAL; diff --git a/net/ipv6/seg6_local.c b/net/ipv6/seg6_local.c index 24e2b4b494cb0..c4828c6620f07 100644 --- a/net/ipv6/seg6_local.c +++ b/net/ipv6/seg6_local.c @@ -1380,7 +1380,9 @@ static int input_action_end_b6_encap(struct sk_buff *skb, return err; } -DEFINE_PER_CPU(struct seg6_bpf_srh_state, seg6_bpf_srh_states); +DEFINE_PER_CPU(struct seg6_bpf_srh_state, seg6_bpf_srh_states) = { + .bh_lock = INIT_LOCAL_LOCK(bh_lock), +}; bool seg6_bpf_has_valid_srh(struct sk_buff *skb) { @@ -1388,6 +1390,7 @@ bool seg6_bpf_has_valid_srh(struct sk_buff *skb) this_cpu_ptr(&seg6_bpf_srh_states); struct ipv6_sr_hdr *srh = srh_state->srh; + lockdep_assert_held(&srh_state->bh_lock); if (unlikely(srh == NULL)) return false; @@ -1408,8 +1411,7 @@ bool seg6_bpf_has_valid_srh(struct sk_buff *skb) static int input_action_end_bpf(struct sk_buff *skb, struct seg6_local_lwt *slwt) { - struct seg6_bpf_srh_state *srh_state = - this_cpu_ptr(&seg6_bpf_srh_states); + struct seg6_bpf_srh_state *srh_state; struct ipv6_sr_hdr *srh; int ret; @@ -1420,10 +1422,14 @@ static int input_action_end_bpf(struct sk_buff *skb, } advance_nextseg(srh, &ipv6_hdr(skb)->daddr); - /* preempt_disable is needed to protect the per-CPU buffer srh_state, - * which is also accessed by the bpf_lwt_seg6_* helpers + /* The access to the per-CPU buffer srh_state is protected by running + * always in softirq context (with disabled BH). On PREEMPT_RT the + * required locking is provided by the following local_lock_nested_bh() + * statement. It is also accessed by the bpf_lwt_seg6_* helpers via + * bpf_prog_run_save_cb(). 
*/ - preempt_disable(); + local_lock_nested_bh(&seg6_bpf_srh_states.bh_lock); + srh_state = this_cpu_ptr(&seg6_bpf_srh_states); srh_state->srh = srh; srh_state->hdrlen = srh->hdrlen << 3; srh_state->valid = true; @@ -1446,15 +1452,15 @@ static int input_action_end_bpf(struct sk_buff *skb, if (srh_state->srh && !seg6_bpf_has_valid_srh(skb)) goto drop; + local_unlock_nested_bh(&seg6_bpf_srh_states.bh_lock); - preempt_enable(); if (ret != BPF_REDIRECT) seg6_lookup_nexthop(skb, NULL, 0); return dst_input(skb); drop: - preempt_enable(); + local_unlock_nested_bh(&seg6_bpf_srh_states.bh_lock); kfree_skb(skb); return -EINVAL; } From patchwork Thu Jun 20 13:22:03 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Sebastian Andrzej Siewior X-Patchwork-Id: 13705496 X-Patchwork-Delegate: kuba@kernel.org Received: from galois.linutronix.de (Galois.linutronix.de [193.142.43.55]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 0A9521AE0A5; Thu, 20 Jun 2024 13:27:37 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=193.142.43.55 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1718890059; cv=none; b=Hr74wCoyS815PRVgou3FBJ2sBTei9DtZC8b0OKGkrqvYD9OYCzeM7qA1zB/OIgIkiqBKr1UBoyxQFmrHBM3zIs/gNSIalzIr8JJ/5XbYPmvtVX8yU23lPOEX6OerWXfAUBnv/aZHD4yaeuAFDhFzLSnz+w5/UJinEfNDH72jW4U= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1718890059; c=relaxed/simple; bh=mmb/n5Zpp12YZ/miCg0qq3L4AeVcYXceQP9aeoru8uM=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=XcaWk0KDztYUvbGz6qvRKGmxuRdXCjziQvD7VRtycPNMmn5hF5hK0s37ZblabKCeNFvlOXrHajrvmVQ9ll6d521H8CBYVksalg31ECSp7Q7ulmFCX0Q1gf8ZQP7VMmzL5xevqpr3zaE/xDVmz+EWbL54UHI0hER2CSGfFp/ynF0= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linutronix.de; spf=pass smtp.mailfrom=linutronix.de; dkim=pass (2048-bit key) header.d=linutronix.de header.i=@linutronix.de header.b=DNz4OG6a; dkim=permerror (0-bit key) header.d=linutronix.de header.i=@linutronix.de header.b=3ymlR7JR; arc=none smtp.client-ip=193.142.43.55 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linutronix.de Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linutronix.de Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=linutronix.de header.i=@linutronix.de header.b="DNz4OG6a"; dkim=permerror (0-bit key) header.d=linutronix.de header.i=@linutronix.de header.b="3ymlR7JR" From: Sebastian Andrzej Siewior DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linutronix.de; s=2020; t=1718890056; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=2sNhpYc6Zhv5uHBsgsP2eGCPDInHSPpYVSlYr9BEPkY=; b=DNz4OG6aRuZjCTCGrx+cbR/r+tImEPpe4wq84hqsT1Y5E7U4jyVNwQue8nqtXxR4m08PnC ySg3bCMgXym2zn8TuBrocZLyRUgjOKcUUri8bW4u4IvEWZrkeu2VxSNBNGluNc/rPAe2jj CwORC7Mos6CUGv90/s2sI+MJjQcwah/w6IZiYAlsA0lggj/4zBxHLjeIAgGcj3ezJ/sRUO X10AEGRH5LjRhDN2oYA1U68KrYJucBVDLlydyUBQwClxHPLcPuofe7Xe2zI8hALw0TIcUS fuGDTgaDRBP4zegocXKnPACGpTQdL/GCKQKx7NqQO7t8hXtApootCQoGvt22ig== DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=linutronix.de; s=2020e; 
t=1718890056; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=2sNhpYc6Zhv5uHBsgsP2eGCPDInHSPpYVSlYr9BEPkY=; b=3ymlR7JRXyXarzkCfFWMcRb+f4kvlRhEK3jJw8bwCWQ4TYQvMS5CcX3rtiz14Nht6YPfkj aaTCUgSedkJx0xDw== To: linux-kernel@vger.kernel.org, netdev@vger.kernel.org Cc: "David S. Miller" , Daniel Bristot de Oliveira , Boqun Feng , Daniel Borkmann , Eric Dumazet , Frederic Weisbecker , Ingo Molnar , Jakub Kicinski , Paolo Abeni , Peter Zijlstra , Thomas Gleixner , Waiman Long , Will Deacon , Sebastian Andrzej Siewior , Alexei Starovoitov , Andrii Nakryiko , Hao Luo , Jiri Olsa , John Fastabend , KP Singh , Martin KaFai Lau , Song Liu , Stanislav Fomichev , Yonghong Song , bpf@vger.kernel.org Subject: [PATCH v9 net-next 13/15] net: Use nested-BH locking for bpf_scratchpad. Date: Thu, 20 Jun 2024 15:22:03 +0200 Message-ID: <20240620132727.660738-14-bigeasy@linutronix.de> In-Reply-To: <20240620132727.660738-1-bigeasy@linutronix.de> References: <20240620132727.660738-1-bigeasy@linutronix.de> Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Patchwork-Delegate: kuba@kernel.org bpf_scratchpad is a per-CPU variable and relies on disabled BH for its locking. Without per-CPU locking in local_bh_disable() on PREEMPT_RT this data structure requires explicit locking. Add a local_lock_t to the data structure and use local_lock_nested_bh() for locking. This change adds only lockdep coverage and does not alter the functional behaviour for !PREEMPT_RT. Cc: Alexei Starovoitov Cc: Andrii Nakryiko Cc: Hao Luo Cc: Jiri Olsa Cc: John Fastabend Cc: KP Singh Cc: Martin KaFai Lau Cc: Song Liu Cc: Stanislav Fomichev Cc: Yonghong Song Cc: bpf@vger.kernel.org Signed-off-by: Sebastian Andrzej Siewior --- net/core/filter.c | 11 +++++++++-- 1 file changed, 9 insertions(+), 2 deletions(-) diff --git a/net/core/filter.c b/net/core/filter.c index ba1a739a9bedc..fbcfd563dccfd 100644 --- a/net/core/filter.c +++ b/net/core/filter.c @@ -1658,9 +1658,12 @@ struct bpf_scratchpad { __be32 diff[MAX_BPF_STACK / sizeof(__be32)]; u8 buff[MAX_BPF_STACK]; }; + local_lock_t bh_lock; }; -static DEFINE_PER_CPU(struct bpf_scratchpad, bpf_sp); +static DEFINE_PER_CPU(struct bpf_scratchpad, bpf_sp) = { + .bh_lock = INIT_LOCAL_LOCK(bh_lock), +}; static inline int __bpf_try_make_writable(struct sk_buff *skb, unsigned int write_len) @@ -2016,6 +2019,7 @@ BPF_CALL_5(bpf_csum_diff, __be32 *, from, u32, from_size, struct bpf_scratchpad *sp = this_cpu_ptr(&bpf_sp); u32 diff_size = from_size + to_size; int i, j = 0; + __wsum ret; /* This is quite flexible, some examples: * @@ -2029,12 +2033,15 @@ BPF_CALL_5(bpf_csum_diff, __be32 *, from, u32, from_size, diff_size > sizeof(sp->diff))) return -EINVAL; + local_lock_nested_bh(&bpf_sp.bh_lock); for (i = 0; i < from_size / sizeof(__be32); i++, j++) sp->diff[j] = ~from[i]; for (i = 0; i < to_size / sizeof(__be32); i++, j++) sp->diff[j] = to[i]; - return csum_partial(sp->diff, diff_size, seed); + ret = csum_partial(sp->diff, diff_size, seed); + local_unlock_nested_bh(&bpf_sp.bh_lock); + return ret; } static const struct bpf_func_proto bpf_csum_diff_proto = { From patchwork Thu Jun 20 13:22:04 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Sebastian Andrzej Siewior X-Patchwork-Id: 13705498 X-Patchwork-Delegate: 
kuba@kernel.org Received: from galois.linutronix.de (Galois.linutronix.de [193.142.43.55]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 3689B1B29DA; Thu, 20 Jun 2024 13:27:38 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=193.142.43.55 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1718890061; cv=none; b=mU3JlSoHQGRRz0HT9USzZ/y8fgEA+6S+OeK/OOghm2qU0RJqOZcAIFzY5v8DlCi+wPqQJUEItBz91zy2UrYCw5GKesQSqMRVeE2eMTJ3r8meRkUqNU12gqhIWVeWNfruq8oe4I+x8cZ35JHzf/69Z5LfF+Ypv/hRJ22O2BIOgd8= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1718890061; c=relaxed/simple; bh=ot686HczP0opgwipPMLbV/tVqan6lGBqdpSuM0wiDpM=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=VwosIzd50GucJQL6FIhuCZEV/WAf5XghCqY4bSyOF4imPrWR/if96EfOJogBpxGRNxjG5K18pirzmJ+IO+K492x+ekOHAJ+E2IL7X5dTBUOLhaCn3BRoW2thMHmoByWZ1dluPwn4B/oz+qw5+84DFfZuQJ4kWCy4RKaDIjghqJs= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linutronix.de; spf=pass smtp.mailfrom=linutronix.de; dkim=pass (2048-bit key) header.d=linutronix.de header.i=@linutronix.de header.b=iUQBlB7p; dkim=permerror (0-bit key) header.d=linutronix.de header.i=@linutronix.de header.b=Xn0AiSHO; arc=none smtp.client-ip=193.142.43.55 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linutronix.de Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linutronix.de Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=linutronix.de header.i=@linutronix.de header.b="iUQBlB7p"; dkim=permerror (0-bit key) header.d=linutronix.de header.i=@linutronix.de header.b="Xn0AiSHO" From: Sebastian Andrzej Siewior DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linutronix.de; s=2020; t=1718890056; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=CckJGNOZClrvzvHr2x/F29MTMf3nmK0u9Df4knF+nno=; b=iUQBlB7pvhvZpj5TCAPcxU0e6aq1mAPV4r8X4bl/bpOx7OM6EkuxUA39EssnNdpcehqhsM uDgICmfEZZcVuJuITIs6J5br4sr0WGXWnDfECiKhcLlQBZutuIZ44/l3zt+Vw/4yLv43zo YQQtYoySoTfWvjuOa0BCuXZSwP0ncoNrDnPtczHvPnzMXo4Z2WgV9BwkQl0QKmpuM1uRPx e5IdTmx96CfusIgs6RcF/VkASda9rTpOjuY1e3NVJHAd4LBg9muV+NM4tJp++ITms/zB7b ncalaB8DhsGZwdZ1LJU/qr3kx0YcQk5aN8WLSy2X4k3zxFGRJQRH8jCOLENKfA== DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=linutronix.de; s=2020e; t=1718890056; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=CckJGNOZClrvzvHr2x/F29MTMf3nmK0u9Df4knF+nno=; b=Xn0AiSHO2rjpLPI1GEEC+TXpoyaR5YJ4yrUYXPGkygitI5TKdFMgm846zZpO1xHPhydgSr Z92Cx2WR+J5qQjCw== To: linux-kernel@vger.kernel.org, netdev@vger.kernel.org Cc: "David S. 
Miller" , Daniel Bristot de Oliveira , Boqun Feng , Daniel Borkmann , Eric Dumazet , Frederic Weisbecker , Ingo Molnar , Jakub Kicinski , Paolo Abeni , Peter Zijlstra , Thomas Gleixner , Waiman Long , Will Deacon , Sebastian Andrzej Siewior , Alexei Starovoitov , Andrii Nakryiko , Eduard Zingerman , Hao Luo , Jesper Dangaard Brouer , Jiri Olsa , John Fastabend , KP Singh , Martin KaFai Lau , Song Liu , Stanislav Fomichev , =?utf-8?q?Toke_H=C3=B8iland-J=C3=B8rge?= =?utf-8?q?nsen?= , Yonghong Song , bpf@vger.kernel.org Subject: [PATCH v9 net-next 14/15] net: Reference bpf_redirect_info via task_struct on PREEMPT_RT. Date: Thu, 20 Jun 2024 15:22:04 +0200 Message-ID: <20240620132727.660738-15-bigeasy@linutronix.de> In-Reply-To: <20240620132727.660738-1-bigeasy@linutronix.de> References: <20240620132727.660738-1-bigeasy@linutronix.de> Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Patchwork-Delegate: kuba@kernel.org The XDP redirect process is two staged: - bpf_prog_run_xdp() is invoked to run a eBPF program which inspects the packet and makes decisions. While doing that, the per-CPU variable bpf_redirect_info is used. - Afterwards xdp_do_redirect() is invoked and accesses bpf_redirect_info and it may also access other per-CPU variables like xskmap_flush_list. At the very end of the NAPI callback, xdp_do_flush() is invoked which does not access bpf_redirect_info but will touch the individual per-CPU lists. The per-CPU variables are only used in the NAPI callback hence disabling bottom halves is the only protection mechanism. Users from preemptible context (like cpu_map_kthread_run()) explicitly disable bottom halves for protections reasons. Without locking in local_bh_disable() on PREEMPT_RT this data structure requires explicit locking. PREEMPT_RT has forced-threaded interrupts enabled and every NAPI-callback runs in a thread. If each thread has its own data structure then locking can be avoided. Create a struct bpf_net_context which contains struct bpf_redirect_info. Define the variable on stack, use bpf_net_ctx_set() to save a pointer to it, bpf_net_ctx_clear() removes it again. The bpf_net_ctx_set() may nest. For instance a function can be used from within NET_RX_SOFTIRQ/ net_rx_action which uses bpf_net_ctx_set() and NET_TX_SOFTIRQ which does not. Therefore only the first invocations updates the pointer. Use bpf_net_ctx_get_ri() as a wrapper to retrieve the current struct bpf_redirect_info. The returned data structure is zero initialized to ensure nothing is leaked from stack. This is done on first usage of the struct. bpf_net_ctx_set() sets bpf_redirect_info::kern_flags to 0 to note that initialisation is required. First invocation of bpf_net_ctx_get_ri() will memset() the data structure and update bpf_redirect_info::kern_flags. bpf_redirect_info::nh is excluded from memset because it is only used once BPF_F_NEIGH is set which also sets the nh member. The kern_flags is moved past nh to exclude it from memset. The pointer to bpf_net_context is saved task's task_struct. Using always the bpf_net_context approach has the advantage that there is almost zero differences between PREEMPT_RT and non-PREEMPT_RT builds. 
Cc: Alexei Starovoitov Cc: Andrii Nakryiko Cc: Eduard Zingerman Cc: Hao Luo Cc: Jesper Dangaard Brouer Cc: Jiri Olsa Cc: John Fastabend Cc: KP Singh Cc: Martin KaFai Lau Cc: Song Liu Cc: Stanislav Fomichev Cc: Toke Høiland-Jørgensen Cc: Yonghong Song Cc: bpf@vger.kernel.org Acked-by: Alexei Starovoitov Acked-by: Jesper Dangaard Brouer Reviewed-by: Toke Høiland-Jørgensen Signed-off-by: Sebastian Andrzej Siewior --- include/linux/filter.h | 56 ++++++++++++++++++++++++++++++++++-------- include/linux/sched.h | 3 +++ kernel/bpf/cpumap.c | 3 +++ kernel/bpf/devmap.c | 9 ++++++- kernel/fork.c | 1 + net/bpf/test_run.c | 11 ++++++++- net/core/dev.c | 29 +++++++++++++++++++++- net/core/filter.c | 44 +++++++++------------------------ net/core/lwt_bpf.c | 3 +++ 9 files changed, 114 insertions(+), 45 deletions(-) diff --git a/include/linux/filter.h b/include/linux/filter.h index b02aea291b7e8..0a7f6e4a00b60 100644 --- a/include/linux/filter.h +++ b/include/linux/filter.h @@ -733,21 +733,59 @@ struct bpf_nh_params { }; }; +/* flags for bpf_redirect_info kern_flags */ +#define BPF_RI_F_RF_NO_DIRECT BIT(0) /* no napi_direct on return_frame */ +#define BPF_RI_F_RI_INIT BIT(1) + struct bpf_redirect_info { u64 tgt_index; void *tgt_value; struct bpf_map *map; u32 flags; - u32 kern_flags; u32 map_id; enum bpf_map_type map_type; struct bpf_nh_params nh; + u32 kern_flags; }; -DECLARE_PER_CPU(struct bpf_redirect_info, bpf_redirect_info); +struct bpf_net_context { + struct bpf_redirect_info ri; +}; -/* flags for bpf_redirect_info kern_flags */ -#define BPF_RI_F_RF_NO_DIRECT BIT(0) /* no napi_direct on return_frame */ +static inline struct bpf_net_context *bpf_net_ctx_set(struct bpf_net_context *bpf_net_ctx) +{ + struct task_struct *tsk = current; + + if (tsk->bpf_net_context != NULL) + return NULL; + bpf_net_ctx->ri.kern_flags = 0; + + tsk->bpf_net_context = bpf_net_ctx; + return bpf_net_ctx; +} + +static inline void bpf_net_ctx_clear(struct bpf_net_context *bpf_net_ctx) +{ + if (bpf_net_ctx) + current->bpf_net_context = NULL; +} + +static inline struct bpf_net_context *bpf_net_ctx_get(void) +{ + return current->bpf_net_context; +} + +static inline struct bpf_redirect_info *bpf_net_ctx_get_ri(void) +{ + struct bpf_net_context *bpf_net_ctx = bpf_net_ctx_get(); + + if (!(bpf_net_ctx->ri.kern_flags & BPF_RI_F_RI_INIT)) { + memset(&bpf_net_ctx->ri, 0, offsetof(struct bpf_net_context, ri.nh)); + bpf_net_ctx->ri.kern_flags |= BPF_RI_F_RI_INIT; + } + + return &bpf_net_ctx->ri; +} /* Compute the linear packet data range [data, data_end) which * will be accessed by various program types (cls_bpf, act_bpf, @@ -1018,25 +1056,23 @@ struct bpf_prog *bpf_patch_insn_single(struct bpf_prog *prog, u32 off, const struct bpf_insn *patch, u32 len); int bpf_remove_insns(struct bpf_prog *prog, u32 off, u32 cnt); -void bpf_clear_redirect_map(struct bpf_map *map); - static inline bool xdp_return_frame_no_direct(void) { - struct bpf_redirect_info *ri = this_cpu_ptr(&bpf_redirect_info); + struct bpf_redirect_info *ri = bpf_net_ctx_get_ri(); return ri->kern_flags & BPF_RI_F_RF_NO_DIRECT; } static inline void xdp_set_return_frame_no_direct(void) { - struct bpf_redirect_info *ri = this_cpu_ptr(&bpf_redirect_info); + struct bpf_redirect_info *ri = bpf_net_ctx_get_ri(); ri->kern_flags |= BPF_RI_F_RF_NO_DIRECT; } static inline void xdp_clear_return_frame_no_direct(void) { - struct bpf_redirect_info *ri = this_cpu_ptr(&bpf_redirect_info); + struct bpf_redirect_info *ri = bpf_net_ctx_get_ri(); ri->kern_flags &= ~BPF_RI_F_RF_NO_DIRECT; } @@ -1592,7 
+1628,7 @@ static __always_inline long __bpf_xdp_redirect_map(struct bpf_map *map, u64 inde u64 flags, const u64 flag_mask, void *lookup_elem(struct bpf_map *map, u32 key)) { - struct bpf_redirect_info *ri = this_cpu_ptr(&bpf_redirect_info); + struct bpf_redirect_info *ri = bpf_net_ctx_get_ri(); const u64 action_mask = XDP_ABORTED | XDP_DROP | XDP_PASS | XDP_TX; /* Lower bits of the flags are used as return code on lookup failure */ diff --git a/include/linux/sched.h b/include/linux/sched.h index 5187486c25222..5ff5e65a46277 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -54,6 +54,7 @@ struct bio_list; struct blk_plug; struct bpf_local_storage; struct bpf_run_ctx; +struct bpf_net_context; struct capture_control; struct cfs_rq; struct fs_struct; @@ -1509,6 +1510,8 @@ struct task_struct { /* Used for BPF run context */ struct bpf_run_ctx *bpf_ctx; #endif + /* Used by BPF for per-TASK xdp storage */ + struct bpf_net_context *bpf_net_context; #ifdef CONFIG_GCC_PLUGIN_STACKLEAK unsigned long lowest_stack; diff --git a/kernel/bpf/cpumap.c b/kernel/bpf/cpumap.c index a8e34416e960f..66974bd027109 100644 --- a/kernel/bpf/cpumap.c +++ b/kernel/bpf/cpumap.c @@ -240,12 +240,14 @@ static int cpu_map_bpf_prog_run(struct bpf_cpu_map_entry *rcpu, void **frames, int xdp_n, struct xdp_cpumap_stats *stats, struct list_head *list) { + struct bpf_net_context __bpf_net_ctx, *bpf_net_ctx; int nframes; if (!rcpu->prog) return xdp_n; rcu_read_lock_bh(); + bpf_net_ctx = bpf_net_ctx_set(&__bpf_net_ctx); nframes = cpu_map_bpf_prog_run_xdp(rcpu, frames, xdp_n, stats); @@ -255,6 +257,7 @@ static int cpu_map_bpf_prog_run(struct bpf_cpu_map_entry *rcpu, void **frames, if (unlikely(!list_empty(list))) cpu_map_bpf_prog_run_skb(rcpu, list, stats); + bpf_net_ctx_clear(bpf_net_ctx); rcu_read_unlock_bh(); /* resched point, may call do_softirq() */ return nframes; diff --git a/kernel/bpf/devmap.c b/kernel/bpf/devmap.c index 7f3b34452243c..fbfdfb60db8d7 100644 --- a/kernel/bpf/devmap.c +++ b/kernel/bpf/devmap.c @@ -196,7 +196,14 @@ static void dev_map_free(struct bpf_map *map) list_del_rcu(&dtab->list); spin_unlock(&dev_map_lock); - bpf_clear_redirect_map(map); + /* bpf_redirect_info->map is assigned in __bpf_xdp_redirect_map() + * during NAPI callback and cleared after the XDP redirect. There is no + * explicit RCU read section which protects bpf_redirect_info->map but + * local_bh_disable() also marks the beginning an RCU section. This + * makes the complete softirq callback RCU protected. Thus after + * following synchronize_rcu() there no bpf_redirect_info->map == map + * assignment. + */ synchronize_rcu(); /* Make sure prior __dev_map_entry_free() have completed. */ diff --git a/kernel/fork.c b/kernel/fork.c index 99076dbe27d83..f314bdd7e6108 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -2355,6 +2355,7 @@ __latent_entropy struct task_struct *copy_process( RCU_INIT_POINTER(p->bpf_storage, NULL); p->bpf_ctx = NULL; #endif + p->bpf_net_context = NULL; /* Perform scheduler related setup. Assign this task to a CPU. 
*/ retval = sched_fork(clone_flags, p); diff --git a/net/bpf/test_run.c b/net/bpf/test_run.c index 36ae54f57bf57..a6d7f790cdda8 100644 --- a/net/bpf/test_run.c +++ b/net/bpf/test_run.c @@ -283,9 +283,10 @@ static int xdp_recv_frames(struct xdp_frame **frames, int nframes, static int xdp_test_run_batch(struct xdp_test_data *xdp, struct bpf_prog *prog, u32 repeat) { - struct bpf_redirect_info *ri = this_cpu_ptr(&bpf_redirect_info); + struct bpf_net_context __bpf_net_ctx, *bpf_net_ctx; int err = 0, act, ret, i, nframes = 0, batch_sz; struct xdp_frame **frames = xdp->frames; + struct bpf_redirect_info *ri; struct xdp_page_head *head; struct xdp_frame *frm; bool redirect = false; @@ -295,6 +296,8 @@ static int xdp_test_run_batch(struct xdp_test_data *xdp, struct bpf_prog *prog, batch_sz = min_t(u32, repeat, xdp->batch_size); local_bh_disable(); + bpf_net_ctx = bpf_net_ctx_set(&__bpf_net_ctx); + ri = bpf_net_ctx_get_ri(); xdp_set_return_frame_no_direct(); for (i = 0; i < batch_sz; i++) { @@ -359,6 +362,7 @@ static int xdp_test_run_batch(struct xdp_test_data *xdp, struct bpf_prog *prog, } xdp_clear_return_frame_no_direct(); + bpf_net_ctx_clear(bpf_net_ctx); local_bh_enable(); return err; } @@ -394,6 +398,7 @@ static int bpf_test_run_xdp_live(struct bpf_prog *prog, struct xdp_buff *ctx, static int bpf_test_run(struct bpf_prog *prog, void *ctx, u32 repeat, u32 *retval, u32 *time, bool xdp) { + struct bpf_net_context __bpf_net_ctx, *bpf_net_ctx; struct bpf_prog_array_item item = {.prog = prog}; struct bpf_run_ctx *old_ctx; struct bpf_cg_run_ctx run_ctx; @@ -419,10 +424,14 @@ static int bpf_test_run(struct bpf_prog *prog, void *ctx, u32 repeat, do { run_ctx.prog_item = &item; local_bh_disable(); + bpf_net_ctx = bpf_net_ctx_set(&__bpf_net_ctx); + if (xdp) *retval = bpf_prog_run_xdp(prog, ctx); else *retval = bpf_prog_run(prog, ctx); + + bpf_net_ctx_clear(bpf_net_ctx); local_bh_enable(); } while (bpf_test_timer_continue(&t, 1, repeat, &ret, time)); bpf_reset_run_ctx(old_ctx); diff --git a/net/core/dev.c b/net/core/dev.c index 8ef727c2ae2be..b94fb4e63a289 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -4045,10 +4045,13 @@ sch_handle_ingress(struct sk_buff *skb, struct packet_type **pt_prev, int *ret, { struct bpf_mprog_entry *entry = rcu_dereference_bh(skb->dev->tcx_ingress); enum skb_drop_reason drop_reason = SKB_DROP_REASON_TC_INGRESS; + struct bpf_net_context __bpf_net_ctx, *bpf_net_ctx; int sch_ret; if (!entry) return skb; + + bpf_net_ctx = bpf_net_ctx_set(&__bpf_net_ctx); if (*pt_prev) { *ret = deliver_skb(skb, *pt_prev, orig_dev); *pt_prev = NULL; @@ -4077,10 +4080,12 @@ sch_handle_ingress(struct sk_buff *skb, struct packet_type **pt_prev, int *ret, break; } *ret = NET_RX_SUCCESS; + bpf_net_ctx_clear(bpf_net_ctx); return NULL; case TC_ACT_SHOT: kfree_skb_reason(skb, drop_reason); *ret = NET_RX_DROP; + bpf_net_ctx_clear(bpf_net_ctx); return NULL; /* used by tc_run */ case TC_ACT_STOLEN: @@ -4090,8 +4095,10 @@ sch_handle_ingress(struct sk_buff *skb, struct packet_type **pt_prev, int *ret, fallthrough; case TC_ACT_CONSUMED: *ret = NET_RX_SUCCESS; + bpf_net_ctx_clear(bpf_net_ctx); return NULL; } + bpf_net_ctx_clear(bpf_net_ctx); return skb; } @@ -4101,11 +4108,14 @@ sch_handle_egress(struct sk_buff *skb, int *ret, struct net_device *dev) { struct bpf_mprog_entry *entry = rcu_dereference_bh(dev->tcx_egress); enum skb_drop_reason drop_reason = SKB_DROP_REASON_TC_EGRESS; + struct bpf_net_context __bpf_net_ctx, *bpf_net_ctx; int sch_ret; if (!entry) return skb; + bpf_net_ctx = 
bpf_net_ctx_set(&__bpf_net_ctx); + /* qdisc_skb_cb(skb)->pkt_len & tcx_set_ingress() was * already set by the caller. */ @@ -4121,10 +4131,12 @@ sch_handle_egress(struct sk_buff *skb, int *ret, struct net_device *dev) /* No need to push/pop skb's mac_header here on egress! */ skb_do_redirect(skb); *ret = NET_XMIT_SUCCESS; + bpf_net_ctx_clear(bpf_net_ctx); return NULL; case TC_ACT_SHOT: kfree_skb_reason(skb, drop_reason); *ret = NET_XMIT_DROP; + bpf_net_ctx_clear(bpf_net_ctx); return NULL; /* used by tc_run */ case TC_ACT_STOLEN: @@ -4134,8 +4146,10 @@ sch_handle_egress(struct sk_buff *skb, int *ret, struct net_device *dev) fallthrough; case TC_ACT_CONSUMED: *ret = NET_XMIT_SUCCESS; + bpf_net_ctx_clear(bpf_net_ctx); return NULL; } + bpf_net_ctx_clear(bpf_net_ctx); return skb; } @@ -6325,6 +6339,7 @@ enum { static void busy_poll_stop(struct napi_struct *napi, void *have_poll_lock, unsigned flags, u16 budget) { + struct bpf_net_context __bpf_net_ctx, *bpf_net_ctx; bool skip_schedule = false; unsigned long timeout; int rc; @@ -6342,6 +6357,7 @@ static void busy_poll_stop(struct napi_struct *napi, void *have_poll_lock, clear_bit(NAPI_STATE_IN_BUSY_POLL, &napi->state); local_bh_disable(); + bpf_net_ctx = bpf_net_ctx_set(&__bpf_net_ctx); if (flags & NAPI_F_PREFER_BUSY_POLL) { napi->defer_hard_irqs_count = READ_ONCE(napi->dev->napi_defer_hard_irqs); @@ -6364,6 +6380,7 @@ static void busy_poll_stop(struct napi_struct *napi, void *have_poll_lock, netpoll_poll_unlock(have_poll_lock); if (rc == budget) __busy_poll_stop(napi, skip_schedule); + bpf_net_ctx_clear(bpf_net_ctx); local_bh_enable(); } @@ -6373,6 +6390,7 @@ static void __napi_busy_loop(unsigned int napi_id, { unsigned long start_time = loop_end ? busy_loop_current_time() : 0; int (*napi_poll)(struct napi_struct *napi, int budget); + struct bpf_net_context __bpf_net_ctx, *bpf_net_ctx; void *have_poll_lock = NULL; struct napi_struct *napi; @@ -6391,6 +6409,7 @@ static void __napi_busy_loop(unsigned int napi_id, int work = 0; local_bh_disable(); + bpf_net_ctx = bpf_net_ctx_set(&__bpf_net_ctx); if (!napi_poll) { unsigned long val = READ_ONCE(napi->state); @@ -6421,6 +6440,7 @@ static void __napi_busy_loop(unsigned int napi_id, __NET_ADD_STATS(dev_net(napi->dev), LINUX_MIB_BUSYPOLLRXPACKETS, work); skb_defer_free_flush(this_cpu_ptr(&softnet_data)); + bpf_net_ctx_clear(bpf_net_ctx); local_bh_enable(); if (!loop_end || loop_end(loop_end_arg, start_time)) @@ -6848,6 +6868,7 @@ static int napi_thread_wait(struct napi_struct *napi) static void napi_threaded_poll_loop(struct napi_struct *napi) { + struct bpf_net_context __bpf_net_ctx, *bpf_net_ctx; struct softnet_data *sd; unsigned long last_qs = jiffies; @@ -6856,6 +6877,8 @@ static void napi_threaded_poll_loop(struct napi_struct *napi) void *have; local_bh_disable(); + bpf_net_ctx = bpf_net_ctx_set(&__bpf_net_ctx); + sd = this_cpu_ptr(&softnet_data); sd->in_napi_threaded_poll = true; @@ -6871,6 +6894,7 @@ static void napi_threaded_poll_loop(struct napi_struct *napi) net_rps_action_and_irq_enable(sd); } skb_defer_free_flush(sd); + bpf_net_ctx_clear(bpf_net_ctx); local_bh_enable(); if (!repoll) @@ -6896,10 +6920,12 @@ static __latent_entropy void net_rx_action(struct softirq_action *h) struct softnet_data *sd = this_cpu_ptr(&softnet_data); unsigned long time_limit = jiffies + usecs_to_jiffies(READ_ONCE(net_hotdata.netdev_budget_usecs)); + struct bpf_net_context __bpf_net_ctx, *bpf_net_ctx; int budget = READ_ONCE(net_hotdata.netdev_budget); LIST_HEAD(list); LIST_HEAD(repoll); + bpf_net_ctx = 
bpf_net_ctx_set(&__bpf_net_ctx); start: sd->in_net_rx_action = true; local_irq_disable(); @@ -6952,7 +6978,8 @@ static __latent_entropy void net_rx_action(struct softirq_action *h) sd->in_net_rx_action = false; net_rps_action_and_irq_enable(sd); -end:; +end: + bpf_net_ctx_clear(bpf_net_ctx); } struct netdev_adjacent { diff --git a/net/core/filter.c b/net/core/filter.c index fbcfd563dccfd..f40b8393dd58f 100644 --- a/net/core/filter.c +++ b/net/core/filter.c @@ -2478,9 +2478,6 @@ static const struct bpf_func_proto bpf_clone_redirect_proto = { .arg3_type = ARG_ANYTHING, }; -DEFINE_PER_CPU(struct bpf_redirect_info, bpf_redirect_info); -EXPORT_PER_CPU_SYMBOL_GPL(bpf_redirect_info); - static struct net_device *skb_get_peer_dev(struct net_device *dev) { const struct net_device_ops *ops = dev->netdev_ops; @@ -2493,7 +2490,7 @@ static struct net_device *skb_get_peer_dev(struct net_device *dev) int skb_do_redirect(struct sk_buff *skb) { - struct bpf_redirect_info *ri = this_cpu_ptr(&bpf_redirect_info); + struct bpf_redirect_info *ri = bpf_net_ctx_get_ri(); struct net *net = dev_net(skb->dev); struct net_device *dev; u32 flags = ri->flags; @@ -2526,7 +2523,7 @@ int skb_do_redirect(struct sk_buff *skb) BPF_CALL_2(bpf_redirect, u32, ifindex, u64, flags) { - struct bpf_redirect_info *ri = this_cpu_ptr(&bpf_redirect_info); + struct bpf_redirect_info *ri = bpf_net_ctx_get_ri(); if (unlikely(flags & (~(BPF_F_INGRESS) | BPF_F_REDIRECT_INTERNAL))) return TC_ACT_SHOT; @@ -2547,7 +2544,7 @@ static const struct bpf_func_proto bpf_redirect_proto = { BPF_CALL_2(bpf_redirect_peer, u32, ifindex, u64, flags) { - struct bpf_redirect_info *ri = this_cpu_ptr(&bpf_redirect_info); + struct bpf_redirect_info *ri = bpf_net_ctx_get_ri(); if (unlikely(flags)) return TC_ACT_SHOT; @@ -2569,7 +2566,7 @@ static const struct bpf_func_proto bpf_redirect_peer_proto = { BPF_CALL_4(bpf_redirect_neigh, u32, ifindex, struct bpf_redir_neigh *, params, int, plen, u64, flags) { - struct bpf_redirect_info *ri = this_cpu_ptr(&bpf_redirect_info); + struct bpf_redirect_info *ri = bpf_net_ctx_get_ri(); if (unlikely((plen && plen < sizeof(*params)) || flags)) return TC_ACT_SHOT; @@ -4295,30 +4292,13 @@ void xdp_do_check_flushed(struct napi_struct *napi) } #endif -void bpf_clear_redirect_map(struct bpf_map *map) -{ - struct bpf_redirect_info *ri; - int cpu; - - for_each_possible_cpu(cpu) { - ri = per_cpu_ptr(&bpf_redirect_info, cpu); - /* Avoid polluting remote cacheline due to writes if - * not needed. Once we pass this test, we need the - * cmpxchg() to make sure it hasn't been changed in - * the meantime by remote CPU. 
- */ - if (unlikely(READ_ONCE(ri->map) == map)) - cmpxchg(&ri->map, map, NULL); - } -} - DEFINE_STATIC_KEY_FALSE(bpf_master_redirect_enabled_key); EXPORT_SYMBOL_GPL(bpf_master_redirect_enabled_key); u32 xdp_master_redirect(struct xdp_buff *xdp) { + struct bpf_redirect_info *ri = bpf_net_ctx_get_ri(); struct net_device *master, *slave; - struct bpf_redirect_info *ri = this_cpu_ptr(&bpf_redirect_info); master = netdev_master_upper_dev_get_rcu(xdp->rxq->dev); slave = master->netdev_ops->ndo_xdp_get_xmit_slave(master, xdp); @@ -4390,7 +4370,7 @@ static __always_inline int __xdp_do_redirect_frame(struct bpf_redirect_info *ri, map = READ_ONCE(ri->map); /* The map pointer is cleared when the map is being torn - * down by bpf_clear_redirect_map() + * down by dev_map_free() */ if (unlikely(!map)) { err = -ENOENT; @@ -4435,7 +4415,7 @@ static __always_inline int __xdp_do_redirect_frame(struct bpf_redirect_info *ri, int xdp_do_redirect(struct net_device *dev, struct xdp_buff *xdp, struct bpf_prog *xdp_prog) { - struct bpf_redirect_info *ri = this_cpu_ptr(&bpf_redirect_info); + struct bpf_redirect_info *ri = bpf_net_ctx_get_ri(); enum bpf_map_type map_type = ri->map_type; if (map_type == BPF_MAP_TYPE_XSKMAP) @@ -4449,7 +4429,7 @@ EXPORT_SYMBOL_GPL(xdp_do_redirect); int xdp_do_redirect_frame(struct net_device *dev, struct xdp_buff *xdp, struct xdp_frame *xdpf, struct bpf_prog *xdp_prog) { - struct bpf_redirect_info *ri = this_cpu_ptr(&bpf_redirect_info); + struct bpf_redirect_info *ri = bpf_net_ctx_get_ri(); enum bpf_map_type map_type = ri->map_type; if (map_type == BPF_MAP_TYPE_XSKMAP) @@ -4466,7 +4446,7 @@ static int xdp_do_generic_redirect_map(struct net_device *dev, enum bpf_map_type map_type, u32 map_id, u32 flags) { - struct bpf_redirect_info *ri = this_cpu_ptr(&bpf_redirect_info); + struct bpf_redirect_info *ri = bpf_net_ctx_get_ri(); struct bpf_map *map; int err; @@ -4478,7 +4458,7 @@ static int xdp_do_generic_redirect_map(struct net_device *dev, map = READ_ONCE(ri->map); /* The map pointer is cleared when the map is being torn - * down by bpf_clear_redirect_map() + * down by dev_map_free() */ if (unlikely(!map)) { err = -ENOENT; @@ -4520,7 +4500,7 @@ static int xdp_do_generic_redirect_map(struct net_device *dev, int xdp_do_generic_redirect(struct net_device *dev, struct sk_buff *skb, struct xdp_buff *xdp, struct bpf_prog *xdp_prog) { - struct bpf_redirect_info *ri = this_cpu_ptr(&bpf_redirect_info); + struct bpf_redirect_info *ri = bpf_net_ctx_get_ri(); enum bpf_map_type map_type = ri->map_type; void *fwd = ri->tgt_value; u32 map_id = ri->map_id; @@ -4556,7 +4536,7 @@ int xdp_do_generic_redirect(struct net_device *dev, struct sk_buff *skb, BPF_CALL_2(bpf_xdp_redirect, u32, ifindex, u64, flags) { - struct bpf_redirect_info *ri = this_cpu_ptr(&bpf_redirect_info); + struct bpf_redirect_info *ri = bpf_net_ctx_get_ri(); if (unlikely(flags)) return XDP_ABORTED; diff --git a/net/core/lwt_bpf.c b/net/core/lwt_bpf.c index a94943681e5aa..afb05f58b64c5 100644 --- a/net/core/lwt_bpf.c +++ b/net/core/lwt_bpf.c @@ -38,12 +38,14 @@ static inline struct bpf_lwt *bpf_lwt_lwtunnel(struct lwtunnel_state *lwt) static int run_lwt_bpf(struct sk_buff *skb, struct bpf_lwt_prog *lwt, struct dst_entry *dst, bool can_redirect) { + struct bpf_net_context __bpf_net_ctx, *bpf_net_ctx; int ret; /* Disabling BH is needed to protect per-CPU bpf_redirect_info between * BPF prog and skb_do_redirect(). 
*/ local_bh_disable(); + bpf_net_ctx = bpf_net_ctx_set(&__bpf_net_ctx); bpf_compute_data_pointers(skb); ret = bpf_prog_run_save_cb(lwt->prog, skb); @@ -76,6 +78,7 @@ static int run_lwt_bpf(struct sk_buff *skb, struct bpf_lwt_prog *lwt, break; } + bpf_net_ctx_clear(bpf_net_ctx); local_bh_enable(); return ret; From patchwork Thu Jun 20 13:22:05 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Sebastian Andrzej Siewior X-Patchwork-Id: 13705497 X-Patchwork-Delegate: kuba@kernel.org Received: from galois.linutronix.de (Galois.linutronix.de [193.142.43.55]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id CED031B3757; Thu, 20 Jun 2024 13:27:38 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=193.142.43.55 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1718890061; cv=none; b=PZCRpkrH9zlRhkjv2ubYmW0bxzxJCZWrbDQg0Z5PkR/8bKuN5+/UzjFrgzoWjIjhBUFkTAbaok8SIPKJYj1klO85eqiErBVKLy0KWs6ym91FXgu31iJvanneKbRL5XbDJqG+fUxogaAZmfMu82CxMbIz/ZySlw20IBAwD2mwQ2A= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1718890061; c=relaxed/simple; bh=pSOR2STlgJM22IKUYZ8cuI/l5Bq/SR7B2KwI5Yc4+tk=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=QqX8p6iB3UK6PYXXDGPp6gCgHI61fffOvAKw180D4fyCdZPMkg6cHiZsI/MV+xVQwKOjGVLwUN02a4vj6AKymbJJVCOmQ517citYC/ab4LdU8PaxHTDbFgEw7YASvuDtsrtEM5ljDWNs4mvPK3HNysT2Mm4Hv/JXKti2ebfaTMY= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linutronix.de; spf=pass smtp.mailfrom=linutronix.de; dkim=pass (2048-bit key) header.d=linutronix.de header.i=@linutronix.de header.b=m1IeG8yj; dkim=permerror (0-bit key) header.d=linutronix.de header.i=@linutronix.de header.b=7MsoJ390; arc=none smtp.client-ip=193.142.43.55 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linutronix.de Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linutronix.de Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=linutronix.de header.i=@linutronix.de header.b="m1IeG8yj"; dkim=permerror (0-bit key) header.d=linutronix.de header.i=@linutronix.de header.b="7MsoJ390" From: Sebastian Andrzej Siewior DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linutronix.de; s=2020; t=1718890057; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=G9SOuvMTp3HS5/ebmbO+IetpLXVdSsF8sZVQlF3sm84=; b=m1IeG8yjZGm5N6xF5rKE+abS4ZiHjkIy+YUpenD87y0whlKN2i/Obhv/9FEV4Bp8ENYUrW yD/18+AWVtawidB4UUnWpiPNGgTyrN4BbUTbAvCrvFbS6G2Apb+oYGXlMczLXAv/1sPsmZ C7fGGWRTowCu/P+Mv+UdN8WdI9HnZty3M3zZXU8sd2FhZL3mJdM61t2Y2sE0mlRT1h2Roe mi/CbUZPHYtFf48kCi4VIzR0Xbgd+eUAqZMs9631OIsVxCCHNXX7ivEPsGBpqEBiIHUs/E dLELBR20+RSg1wXfJR3r/TJJkptt4tyVapHCTqaBy/x6ThOTYhyXjAE0tpxkrQ== DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=linutronix.de; s=2020e; t=1718890057; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; 
bh=G9SOuvMTp3HS5/ebmbO+IetpLXVdSsF8sZVQlF3sm84=; b=7MsoJ3904ydunDrq/sPx+9zWobIZIW7s+c0kw/SVrFd0slT/4BWMn1wuhSEL6NnDfNVxGQ vXWBsu22KGIM0DAQ== To: linux-kernel@vger.kernel.org, netdev@vger.kernel.org Cc: "David S. Miller" , Daniel Bristot de Oliveira , Boqun Feng , Daniel Borkmann , Eric Dumazet , Frederic Weisbecker , Ingo Molnar , Jakub Kicinski , Paolo Abeni , Peter Zijlstra , Thomas Gleixner , Waiman Long , Will Deacon , Sebastian Andrzej Siewior , =?utf-8?b?QmrDtnJuIFQ=?= =?utf-8?b?w7ZwZWw=?= , Alexei Starovoitov , Andrii Nakryiko , Eduard Zingerman , Hao Luo , Jesper Dangaard Brouer , Jiri Olsa , John Fastabend , Jonathan Lemon , KP Singh , Maciej Fijalkowski , Magnus Karlsson , Martin KaFai Lau , Song Liu , Stanislav Fomichev , =?utf-8?q?Toke_H=C3=B8iland-J=C3=B8rge?= =?utf-8?q?nsen?= , Yonghong Song , bpf@vger.kernel.org Subject: [PATCH v9 net-next 15/15] net: Move per-CPU flush-lists to bpf_net_context on PREEMPT_RT. Date: Thu, 20 Jun 2024 15:22:05 +0200 Message-ID: <20240620132727.660738-16-bigeasy@linutronix.de> In-Reply-To: <20240620132727.660738-1-bigeasy@linutronix.de> References: <20240620132727.660738-1-bigeasy@linutronix.de> Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Patchwork-Delegate: kuba@kernel.org The per-CPU flush lists, which are accessed from within the NAPI callback (xdp_do_flush() for instance), are subject to the same problem as struct bpf_redirect_info. Add the per-CPU lists cpu_map_flush_list, dev_map_flush_list and xskmap_map_flush_list to struct bpf_net_context. Add wrappers for accessing them. The lists are initialized on first use (similar to bpf_net_ctx_get_ri()). Cc: "Björn Töpel" Cc: Alexei Starovoitov Cc: Andrii Nakryiko Cc: Eduard Zingerman Cc: Hao Luo Cc: Jesper Dangaard Brouer Cc: Jiri Olsa Cc: John Fastabend Cc: Jonathan Lemon Cc: KP Singh Cc: Maciej Fijalkowski Cc: Magnus Karlsson Cc: Martin KaFai Lau Cc: Song Liu Cc: Stanislav Fomichev Cc: Toke Høiland-Jørgensen Cc: Yonghong Song Cc: bpf@vger.kernel.org Acked-by: Jesper Dangaard Brouer Reviewed-by: Toke Høiland-Jørgensen Signed-off-by: Sebastian Andrzej Siewior --- include/linux/filter.h | 42 ++++++++++++++++++++++++++++++++++++++++++ kernel/bpf/cpumap.c | 19 +++---------------- kernel/bpf/devmap.c | 11 +++-------- net/xdp/xsk.c | 12 ++++-------- 4 files changed, 52 insertions(+), 32 deletions(-) diff --git a/include/linux/filter.h b/include/linux/filter.h index 0a7f6e4a00b60..c0349522de8fb 100644 --- a/include/linux/filter.h +++ b/include/linux/filter.h @@ -736,6 +736,9 @@ struct bpf_nh_params { /* flags for bpf_redirect_info kern_flags */ #define BPF_RI_F_RF_NO_DIRECT BIT(0) /* no napi_direct on return_frame */ #define BPF_RI_F_RI_INIT BIT(1) +#define BPF_RI_F_CPU_MAP_INIT BIT(2) +#define BPF_RI_F_DEV_MAP_INIT BIT(3) +#define BPF_RI_F_XSK_MAP_INIT BIT(4) struct bpf_redirect_info { u64 tgt_index; @@ -750,6 +753,9 @@ struct bpf_redirect_info { struct bpf_net_context { struct bpf_redirect_info ri; + struct list_head cpu_map_flush_list; + struct list_head dev_map_flush_list; + struct list_head xskmap_map_flush_list; }; static inline struct bpf_net_context *bpf_net_ctx_set(struct bpf_net_context *bpf_net_ctx) @@ -787,6 +793,42 @@ static inline struct bpf_redirect_info *bpf_net_ctx_get_ri(void) return &bpf_net_ctx->ri; } +static inline struct list_head *bpf_net_ctx_get_cpu_map_flush_list(void) +{ + struct bpf_net_context *bpf_net_ctx = bpf_net_ctx_get(); + + if (!(bpf_net_ctx->ri.kern_flags & BPF_RI_F_CPU_MAP_INIT)) {
+ INIT_LIST_HEAD(&bpf_net_ctx->cpu_map_flush_list); + bpf_net_ctx->ri.kern_flags |= BPF_RI_F_CPU_MAP_INIT; + } + + return &bpf_net_ctx->cpu_map_flush_list; +} + +static inline struct list_head *bpf_net_ctx_get_dev_flush_list(void) +{ + struct bpf_net_context *bpf_net_ctx = bpf_net_ctx_get(); + + if (!(bpf_net_ctx->ri.kern_flags & BPF_RI_F_DEV_MAP_INIT)) { + INIT_LIST_HEAD(&bpf_net_ctx->dev_map_flush_list); + bpf_net_ctx->ri.kern_flags |= BPF_RI_F_DEV_MAP_INIT; + } + + return &bpf_net_ctx->dev_map_flush_list; +} + +static inline struct list_head *bpf_net_ctx_get_xskmap_flush_list(void) +{ + struct bpf_net_context *bpf_net_ctx = bpf_net_ctx_get(); + + if (!(bpf_net_ctx->ri.kern_flags & BPF_RI_F_XSK_MAP_INIT)) { + INIT_LIST_HEAD(&bpf_net_ctx->xskmap_map_flush_list); + bpf_net_ctx->ri.kern_flags |= BPF_RI_F_XSK_MAP_INIT; + } + + return &bpf_net_ctx->xskmap_map_flush_list; +} + /* Compute the linear packet data range [data, data_end) which * will be accessed by various program types (cls_bpf, act_bpf, * lwt, ...). Subsystems allowing direct data access must (!) diff --git a/kernel/bpf/cpumap.c b/kernel/bpf/cpumap.c index 66974bd027109..068e994ed781a 100644 --- a/kernel/bpf/cpumap.c +++ b/kernel/bpf/cpumap.c @@ -79,8 +79,6 @@ struct bpf_cpu_map { struct bpf_cpu_map_entry __rcu **cpu_map; }; -static DEFINE_PER_CPU(struct list_head, cpu_map_flush_list); - static struct bpf_map *cpu_map_alloc(union bpf_attr *attr) { u32 value_size = attr->value_size; @@ -709,7 +707,7 @@ static void bq_flush_to_queue(struct xdp_bulk_queue *bq) */ static void bq_enqueue(struct bpf_cpu_map_entry *rcpu, struct xdp_frame *xdpf) { - struct list_head *flush_list = this_cpu_ptr(&cpu_map_flush_list); + struct list_head *flush_list = bpf_net_ctx_get_cpu_map_flush_list(); struct xdp_bulk_queue *bq = this_cpu_ptr(rcpu->bulkq); if (unlikely(bq->count == CPU_MAP_BULK_SIZE)) @@ -761,7 +759,7 @@ int cpu_map_generic_redirect(struct bpf_cpu_map_entry *rcpu, void __cpu_map_flush(void) { - struct list_head *flush_list = this_cpu_ptr(&cpu_map_flush_list); + struct list_head *flush_list = bpf_net_ctx_get_cpu_map_flush_list(); struct xdp_bulk_queue *bq, *tmp; list_for_each_entry_safe(bq, tmp, flush_list, flush_node) { @@ -775,20 +773,9 @@ void __cpu_map_flush(void) #ifdef CONFIG_DEBUG_NET bool cpu_map_check_flush(void) { - if (list_empty(this_cpu_ptr(&cpu_map_flush_list))) + if (list_empty(bpf_net_ctx_get_cpu_map_flush_list())) return false; __cpu_map_flush(); return true; } #endif - -static int __init cpu_map_init(void) -{ - int cpu; - - for_each_possible_cpu(cpu) - INIT_LIST_HEAD(&per_cpu(cpu_map_flush_list, cpu)); - return 0; -} - -subsys_initcall(cpu_map_init); diff --git a/kernel/bpf/devmap.c b/kernel/bpf/devmap.c index fbfdfb60db8d7..317ac2d66ebd1 100644 --- a/kernel/bpf/devmap.c +++ b/kernel/bpf/devmap.c @@ -83,7 +83,6 @@ struct bpf_dtab { u32 n_buckets; }; -static DEFINE_PER_CPU(struct list_head, dev_flush_list); static DEFINE_SPINLOCK(dev_map_lock); static LIST_HEAD(dev_map_list); @@ -415,7 +414,7 @@ static void bq_xmit_all(struct xdp_dev_bulk_queue *bq, u32 flags) */ void __dev_flush(void) { - struct list_head *flush_list = this_cpu_ptr(&dev_flush_list); + struct list_head *flush_list = bpf_net_ctx_get_dev_flush_list(); struct xdp_dev_bulk_queue *bq, *tmp; list_for_each_entry_safe(bq, tmp, flush_list, flush_node) { @@ -429,7 +428,7 @@ void __dev_flush(void) #ifdef CONFIG_DEBUG_NET bool dev_check_flush(void) { - if (list_empty(this_cpu_ptr(&dev_flush_list))) + if (list_empty(bpf_net_ctx_get_dev_flush_list())) return false; 
__dev_flush(); return true; @@ -460,7 +459,7 @@ static void *__dev_map_lookup_elem(struct bpf_map *map, u32 key) static void bq_enqueue(struct net_device *dev, struct xdp_frame *xdpf, struct net_device *dev_rx, struct bpf_prog *xdp_prog) { - struct list_head *flush_list = this_cpu_ptr(&dev_flush_list); + struct list_head *flush_list = bpf_net_ctx_get_dev_flush_list(); struct xdp_dev_bulk_queue *bq = this_cpu_ptr(dev->xdp_bulkq); if (unlikely(bq->count == DEV_MAP_BULK_SIZE)) @@ -1160,15 +1159,11 @@ static struct notifier_block dev_map_notifier = { static int __init dev_map_init(void) { - int cpu; - /* Assure tracepoint shadow struct _bpf_dtab_netdev is in sync */ BUILD_BUG_ON(offsetof(struct bpf_dtab_netdev, dev) != offsetof(struct _bpf_dtab_netdev, dev)); register_netdevice_notifier(&dev_map_notifier); - for_each_possible_cpu(cpu) - INIT_LIST_HEAD(&per_cpu(dev_flush_list, cpu)); return 0; } diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c index 7d1c0986f9bb3..ed062e0383896 100644 --- a/net/xdp/xsk.c +++ b/net/xdp/xsk.c @@ -35,8 +35,6 @@ #define TX_BATCH_SIZE 32 #define MAX_PER_SOCKET_BUDGET (TX_BATCH_SIZE) -static DEFINE_PER_CPU(struct list_head, xskmap_flush_list); - void xsk_set_rx_need_wakeup(struct xsk_buff_pool *pool) { if (pool->cached_need_wakeup & XDP_WAKEUP_RX) @@ -372,7 +370,7 @@ static int xsk_rcv(struct xdp_sock *xs, struct xdp_buff *xdp) int __xsk_map_redirect(struct xdp_sock *xs, struct xdp_buff *xdp) { - struct list_head *flush_list = this_cpu_ptr(&xskmap_flush_list); + struct list_head *flush_list = bpf_net_ctx_get_xskmap_flush_list(); int err; err = xsk_rcv(xs, xdp); @@ -387,7 +385,7 @@ int __xsk_map_redirect(struct xdp_sock *xs, struct xdp_buff *xdp) void __xsk_map_flush(void) { - struct list_head *flush_list = this_cpu_ptr(&xskmap_flush_list); + struct list_head *flush_list = bpf_net_ctx_get_xskmap_flush_list(); struct xdp_sock *xs, *tmp; list_for_each_entry_safe(xs, tmp, flush_list, flush_node) { @@ -399,7 +397,7 @@ void __xsk_map_flush(void) #ifdef CONFIG_DEBUG_NET bool xsk_map_check_flush(void) { - if (list_empty(this_cpu_ptr(&xskmap_flush_list))) + if (list_empty(bpf_net_ctx_get_xskmap_flush_list())) return false; __xsk_map_flush(); return true; @@ -1772,7 +1770,7 @@ static struct pernet_operations xsk_net_ops = { static int __init xsk_init(void) { - int err, cpu; + int err; err = proto_register(&xsk_proto, 0 /* no slab */); if (err) @@ -1790,8 +1788,6 @@ static int __init xsk_init(void) if (err) goto out_pernet; - for_each_possible_cpu(cpu) - INIT_LIST_HEAD(&per_cpu(xskmap_flush_list, cpu)); return 0; out_pernet: