From patchwork Tue Jan 7 13:59:43 2025
X-Patchwork-Submitter: Kumar Kartikeya Dwivedi
X-Patchwork-Id: 13928937
X-Patchwork-Delegate: bpf@iogearbox.net
From: Kumar Kartikeya Dwivedi
To: bpf@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: Barret Rhoden, Linus Torvalds, Peter Zijlstra, Waiman Long, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann, Martin KaFai Lau, Eduard Zingerman, "Paul E. McKenney", Tejun Heo, Josh Don, Dohyun Kim, kernel-team@meta.com
Subject: [PATCH bpf-next v1 01/22] locking: Move MCS struct definition to public header
Date: Tue, 7 Jan 2025 05:59:43 -0800
Message-ID: <20250107140004.2732830-2-memxor@gmail.com>

Move the definition of the struct mcs_spinlock from the private
mcs_spinlock.h header in kernel/locking to the mcs_spinlock.h
asm-generic header, since we will need to reference it from the
qspinlock.h header in subsequent commits.
Reviewed-by: Barret Rhoden
Signed-off-by: Kumar Kartikeya Dwivedi
---
 include/asm-generic/mcs_spinlock.h | 6 ++++++
 kernel/locking/mcs_spinlock.h      | 6 ------
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/include/asm-generic/mcs_spinlock.h b/include/asm-generic/mcs_spinlock.h
index 10cd4ffc6ba2..39c94012b88a 100644
--- a/include/asm-generic/mcs_spinlock.h
+++ b/include/asm-generic/mcs_spinlock.h
@@ -1,6 +1,12 @@
 #ifndef __ASM_MCS_SPINLOCK_H
 #define __ASM_MCS_SPINLOCK_H
 
+struct mcs_spinlock {
+        struct mcs_spinlock *next;
+        int locked; /* 1 if lock acquired */
+        int count;  /* nesting count, see qspinlock.c */
+};
+
 /*
  * Architectures can define their own:
  *
diff --git a/kernel/locking/mcs_spinlock.h b/kernel/locking/mcs_spinlock.h
index 85251d8771d9..16160ca8907f 100644
--- a/kernel/locking/mcs_spinlock.h
+++ b/kernel/locking/mcs_spinlock.h
@@ -15,12 +15,6 @@
 
 #include
 
-struct mcs_spinlock {
-        struct mcs_spinlock *next;
-        int locked; /* 1 if lock acquired */
-        int count;  /* nesting count, see qspinlock.c */
-};
-
 #ifndef arch_mcs_spin_lock_contended
 /*
  * Using smp_cond_load_acquire() provides the acquire semantics
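As an aside on the structure being made public: struct mcs_spinlock is the classic MCS queue node, with next linking the waiter queue, locked signalling the hand-off, and count tracking how many per-CPU nodes are in use across nesting levels (task/softirq/hardirq/NMI). The minimal user-space sketch below, which uses C11 atomics in place of the kernel's primitives and made-up names (mcs_node, mcs_lock), only illustrates how the next and locked fields get used; it is not the kernel algorithm verbatim.

#include <stdatomic.h>
#include <stddef.h>

struct mcs_node {
        _Atomic(struct mcs_node *) next;
        atomic_int locked;              /* becomes 1 when the lock is handed to us */
};

struct mcs_lock {
        _Atomic(struct mcs_node *) tail;
};

static void mcs_lock_acquire(struct mcs_lock *lock, struct mcs_node *node)
{
        struct mcs_node *prev;

        atomic_store_explicit(&node->next, NULL, memory_order_relaxed);
        atomic_store_explicit(&node->locked, 0, memory_order_relaxed);

        /* Enqueue at the tail; the previous tail is our predecessor. */
        prev = atomic_exchange_explicit(&lock->tail, node, memory_order_acq_rel);
        if (!prev)
                return;                 /* queue was empty: lock acquired */

        /* Link behind the predecessor and spin on our own node only. */
        atomic_store_explicit(&prev->next, node, memory_order_release);
        while (!atomic_load_explicit(&node->locked, memory_order_acquire))
                ;                       /* cpu_relax() in kernel code */
}

static void mcs_lock_release(struct mcs_lock *lock, struct mcs_node *node)
{
        struct mcs_node *next = atomic_load_explicit(&node->next, memory_order_acquire);

        if (!next) {
                /* No visible successor: try to swing the tail back to empty. */
                struct mcs_node *expected = node;

                if (atomic_compare_exchange_strong_explicit(&lock->tail, &expected, NULL,
                                                            memory_order_release,
                                                            memory_order_relaxed))
                        return;
                /* A successor is enqueueing; wait for it to link itself. */
                while (!(next = atomic_load_explicit(&node->next, memory_order_acquire)))
                        ;
        }
        /* Hand the lock over by setting the successor's locked flag. */
        atomic_store_explicit(&next->locked, 1, memory_order_release);
}

int main(void)
{
        struct mcs_lock lock = { .tail = NULL };
        struct mcs_node node;

        mcs_lock_acquire(&lock, &node);
        /* critical section */
        mcs_lock_release(&lock, &node);
        return 0;
}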
From patchwork Tue Jan 7 13:59:44 2025
X-Patchwork-Submitter: Kumar Kartikeya Dwivedi
X-Patchwork-Id: 13928939
X-Patchwork-Delegate: bpf@iogearbox.net
From: Kumar Kartikeya Dwivedi
To: bpf@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: Barret Rhoden, Linus Torvalds, Peter Zijlstra, Waiman Long, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann, Martin KaFai Lau, Eduard Zingerman, "Paul E. McKenney", Tejun Heo, Josh Don, Dohyun Kim, kernel-team@meta.com
McKenney" , Tejun Heo , Josh Don , Dohyun Kim , kernel-team@meta.com Subject: [PATCH bpf-next v1 02/22] locking: Move common qspinlock helpers to a private header Date: Tue, 7 Jan 2025 05:59:44 -0800 Message-ID: <20250107140004.2732830-3-memxor@gmail.com> X-Mailer: git-send-email 2.43.5 In-Reply-To: <20250107140004.2732830-1-memxor@gmail.com> References: <20250107140004.2732830-1-memxor@gmail.com> Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Developer-Signature: v=1; a=openpgp-sha256; l=13562; h=from:subject; bh=eKJ3qxGBtRJg8l1rvSHjaQtIHqaeLQcTbgGFSWnocak=; b=owEBbQKS/ZANAwAIAUzgyIZIvxHKAcsmYgBnfTCckVfZqT1tCitfFNFTby5Hz/Q0Ls5KtoFEDTCL cZHH7UOJAjMEAAEIAB0WIQRLvip+Buz51YI8YRFM4MiGSL8RygUCZ30wnAAKCRBM4MiGSL8RysRWEA Cz8cIh7uVLqZ3AlXogR1Z+KrIUQB/B53bS/1rsv12DBPln6QaX8ZjDWch35/NC7cUhaWLe5bqH6Xws 9u9GKFsj/+me6SWK/q8grP4Wsd8E+HOvJ7i29JqBf2pckR7Lwd4vE5h1sHx7HNC4cSAsztpE16SsUO TMWDnLV2d7ct2LiuS6vQn28H3JHNKwidwyqpHKwycMM1smfcJWtoMdI5qKbo0cRWND2/AZiAZtml/9 jSPSfy0vaWODAqMEpmBXNKy2ox0ODs/PQRkRTxlQKiWSS/niGNEfMuJophzILBskxH/9Y18CoeXfaf kgu2YbYIacw849JteraehaR6Db8/d1C9zXk+4ttnpUDyVT8wnVGGw8C6AsOxs9E2aETkJMTO8bu7wi rYwvSIEMZLoJenciI/p3OHVHeDZ7y+YbqkNR03t16z8ZE1l+rXre4ZHI941QRc189ZSgs2zSPbsybI oi7HLBK4fctvDRn4YMtUhBZb8f0QT+ybFRZOD8ecAqUejxxIN+WVJfLBgUAamf3gtC4EJnI1asG4a1 c8IkI62aZCqO2oOAbsBcTeVhkTvYUmNfhKpWqbHfA/dNSRwdaeJncSFJaPghf31CzmrznfxAF23gEA vDqIGFSc9sJ4MccubFDKpfaNRzG4/jUFqdNCly9Qnnij+tWAj1WKWXQb5zMg== X-Developer-Key: i=memxor@gmail.com; a=openpgp; fpr=4BBE2A7E06ECF9D5823C61114CE0C88648BF11CA X-Patchwork-Delegate: bpf@iogearbox.net Move qspinlock helper functions that encode, decode tail word, set and clear the pending and locked bits, and other miscellaneous definitions and macros to a private header. To this end, create a qspinlock.h header file in kernel/locking. Subsequent commits will introduce a modified qspinlock slow path function, thus moving shared code to a private header will help minimize unnecessary code duplication. Reviewed-by: Barret Rhoden Signed-off-by: Kumar Kartikeya Dwivedi --- kernel/locking/qspinlock.c | 193 +---------------------------------- kernel/locking/qspinlock.h | 200 +++++++++++++++++++++++++++++++++++++ 2 files changed, 205 insertions(+), 188 deletions(-) create mode 100644 kernel/locking/qspinlock.h diff --git a/kernel/locking/qspinlock.c b/kernel/locking/qspinlock.c index 7d96bed718e4..af8d122bb649 100644 --- a/kernel/locking/qspinlock.c +++ b/kernel/locking/qspinlock.c @@ -25,8 +25,9 @@ #include /* - * Include queued spinlock statistics code + * Include queued spinlock definitions and statistics code */ +#include "qspinlock.h" #include "qspinlock_stat.h" /* @@ -67,36 +68,6 @@ */ #include "mcs_spinlock.h" -#define MAX_NODES 4 - -/* - * On 64-bit architectures, the mcs_spinlock structure will be 16 bytes in - * size and four of them will fit nicely in one 64-byte cacheline. For - * pvqspinlock, however, we need more space for extra data. To accommodate - * that, we insert two more long words to pad it up to 32 bytes. IOW, only - * two of them can fit in a cacheline in this case. That is OK as it is rare - * to have more than 2 levels of slowpath nesting in actual use. We don't - * want to penalize pvqspinlocks to optimize for a rare case in native - * qspinlocks. - */ -struct qnode { - struct mcs_spinlock mcs; -#ifdef CONFIG_PARAVIRT_SPINLOCKS - long reserved[2]; -#endif -}; - -/* - * The pending bit spinning loop count. 
---
 kernel/locking/qspinlock.c | 193 +----------------------------------
 kernel/locking/qspinlock.h | 200 +++++++++++++++++++++++++++++++++++++
 2 files changed, 205 insertions(+), 188 deletions(-)
 create mode 100644 kernel/locking/qspinlock.h

diff --git a/kernel/locking/qspinlock.c b/kernel/locking/qspinlock.c
index 7d96bed718e4..af8d122bb649 100644
--- a/kernel/locking/qspinlock.c
+++ b/kernel/locking/qspinlock.c
@@ -25,8 +25,9 @@
 #include
 
 /*
- * Include queued spinlock statistics code
+ * Include queued spinlock definitions and statistics code
 */
+#include "qspinlock.h"
 #include "qspinlock_stat.h"
 
 /*
@@ -67,36 +68,6 @@
  */
 #include "mcs_spinlock.h"
 
-#define MAX_NODES	4
-
-/*
- * On 64-bit architectures, the mcs_spinlock structure will be 16 bytes in
- * size and four of them will fit nicely in one 64-byte cacheline. For
- * pvqspinlock, however, we need more space for extra data. To accommodate
- * that, we insert two more long words to pad it up to 32 bytes. IOW, only
- * two of them can fit in a cacheline in this case. That is OK as it is rare
- * to have more than 2 levels of slowpath nesting in actual use. We don't
- * want to penalize pvqspinlocks to optimize for a rare case in native
- * qspinlocks.
- */
-struct qnode {
-        struct mcs_spinlock mcs;
-#ifdef CONFIG_PARAVIRT_SPINLOCKS
-        long reserved[2];
-#endif
-};
-
-/*
- * The pending bit spinning loop count.
- * This heuristic is used to limit the number of lockword accesses
- * made by atomic_cond_read_relaxed when waiting for the lock to
- * transition out of the "== _Q_PENDING_VAL" state. We don't spin
- * indefinitely because there's no guarantee that we'll make forward
- * progress.
- */
-#ifndef _Q_PENDING_LOOPS
-#define _Q_PENDING_LOOPS	1
-#endif
 
 /*
  * Per-CPU queue node structures; we can never have more than 4 nested
@@ -106,161 +77,7 @@ struct qnode {
  *
  * PV doubles the storage and uses the second cacheline for PV state.
  */
-static DEFINE_PER_CPU_ALIGNED(struct qnode, qnodes[MAX_NODES]);
-
-/*
- * We must be able to distinguish between no-tail and the tail at 0:0,
- * therefore increment the cpu number by one.
- */
-
-static inline __pure u32 encode_tail(int cpu, int idx)
-{
-        u32 tail;
-
-        tail  = (cpu + 1) << _Q_TAIL_CPU_OFFSET;
-        tail |= idx << _Q_TAIL_IDX_OFFSET; /* assume < 4 */
-
-        return tail;
-}
-
-static inline __pure struct mcs_spinlock *decode_tail(u32 tail)
-{
-        int cpu = (tail >> _Q_TAIL_CPU_OFFSET) - 1;
-        int idx = (tail &  _Q_TAIL_IDX_MASK) >> _Q_TAIL_IDX_OFFSET;
-
-        return per_cpu_ptr(&qnodes[idx].mcs, cpu);
-}
-
-static inline __pure
-struct mcs_spinlock *grab_mcs_node(struct mcs_spinlock *base, int idx)
-{
-        return &((struct qnode *)base + idx)->mcs;
-}
-
-#define _Q_LOCKED_PENDING_MASK (_Q_LOCKED_MASK | _Q_PENDING_MASK)
-
-#if _Q_PENDING_BITS == 8
-/**
- * clear_pending - clear the pending bit.
- * @lock: Pointer to queued spinlock structure
- *
- * *,1,* -> *,0,*
- */
-static __always_inline void clear_pending(struct qspinlock *lock)
-{
-        WRITE_ONCE(lock->pending, 0);
-}
-
-/**
- * clear_pending_set_locked - take ownership and clear the pending bit.
- * @lock: Pointer to queued spinlock structure
- *
- * *,1,0 -> *,0,1
- *
- * Lock stealing is not allowed if this function is used.
- */
-static __always_inline void clear_pending_set_locked(struct qspinlock *lock)
-{
-        WRITE_ONCE(lock->locked_pending, _Q_LOCKED_VAL);
-}
-
-/*
- * xchg_tail - Put in the new queue tail code word & retrieve previous one
- * @lock : Pointer to queued spinlock structure
- * @tail : The new queue tail code word
- * Return: The previous queue tail code word
- *
- * xchg(lock, tail), which heads an address dependency
- *
- * p,*,* -> n,*,* ; prev = xchg(lock, node)
- */
-static __always_inline u32 xchg_tail(struct qspinlock *lock, u32 tail)
-{
-        /*
-         * We can use relaxed semantics since the caller ensures that the
-         * MCS node is properly initialized before updating the tail.
-         */
-        return (u32)xchg_relaxed(&lock->tail,
-                                 tail >> _Q_TAIL_OFFSET) << _Q_TAIL_OFFSET;
-}
-
-#else /* _Q_PENDING_BITS == 8 */
-
-/**
- * clear_pending - clear the pending bit.
- * @lock: Pointer to queued spinlock structure
- *
- * *,1,* -> *,0,*
- */
-static __always_inline void clear_pending(struct qspinlock *lock)
-{
-        atomic_andnot(_Q_PENDING_VAL, &lock->val);
-}
-
-/**
- * clear_pending_set_locked - take ownership and clear the pending bit.
- * @lock: Pointer to queued spinlock structure
- *
- * *,1,0 -> *,0,1
- */
-static __always_inline void clear_pending_set_locked(struct qspinlock *lock)
-{
-        atomic_add(-_Q_PENDING_VAL + _Q_LOCKED_VAL, &lock->val);
-}
-
-/**
- * xchg_tail - Put in the new queue tail code word & retrieve previous one
- * @lock : Pointer to queued spinlock structure
- * @tail : The new queue tail code word
- * Return: The previous queue tail code word
- *
- * xchg(lock, tail)
- *
- * p,*,* -> n,*,* ; prev = xchg(lock, node)
- */
-static __always_inline u32 xchg_tail(struct qspinlock *lock, u32 tail)
-{
-        u32 old, new;
-
-        old = atomic_read(&lock->val);
-        do {
-                new = (old & _Q_LOCKED_PENDING_MASK) | tail;
-                /*
-                 * We can use relaxed semantics since the caller ensures that
-                 * the MCS node is properly initialized before updating the
-                 * tail.
-                 */
-        } while (!atomic_try_cmpxchg_relaxed(&lock->val, &old, new));
-
-        return old;
-}
-#endif /* _Q_PENDING_BITS == 8 */
-
-/**
- * queued_fetch_set_pending_acquire - fetch the whole lock value and set pending
- * @lock : Pointer to queued spinlock structure
- * Return: The previous lock value
- *
- * *,*,* -> *,1,*
- */
-#ifndef queued_fetch_set_pending_acquire
-static __always_inline u32 queued_fetch_set_pending_acquire(struct qspinlock *lock)
-{
-        return atomic_fetch_or_acquire(_Q_PENDING_VAL, &lock->val);
-}
-#endif
-
-/**
- * set_locked - Set the lock bit and own the lock
- * @lock: Pointer to queued spinlock structure
- *
- * *,*,0 -> *,0,1
- */
-static __always_inline void set_locked(struct qspinlock *lock)
-{
-        WRITE_ONCE(lock->locked, _Q_LOCKED_VAL);
-}
-
+static DEFINE_PER_CPU_ALIGNED(struct qnode, qnodes[_Q_MAX_NODES]);
 
 /*
  * Generate the native code for queued_spin_unlock_slowpath(); provide NOPs for
@@ -410,7 +227,7 @@ void __lockfunc queued_spin_lock_slowpath(struct qspinlock *lock, u32 val)
 	 * any MCS node. This is not the most elegant solution, but is
 	 * simple enough.
 	 */
-	if (unlikely(idx >= MAX_NODES)) {
+	if (unlikely(idx >= _Q_MAX_NODES)) {
 		lockevent_inc(lock_no_node);
 		while (!queued_spin_trylock(lock))
 			cpu_relax();
@@ -465,7 +282,7 @@ void __lockfunc queued_spin_lock_slowpath(struct qspinlock *lock, u32 val)
 	 * head of the waitqueue.
 	 */
 	if (old & _Q_TAIL_MASK) {
-		prev = decode_tail(old);
+		prev = decode_tail(old, qnodes);
 
 		/* Link @node into the waitqueue. */
 		WRITE_ONCE(prev->next, node);
diff --git a/kernel/locking/qspinlock.h b/kernel/locking/qspinlock.h
new file mode 100644
index 000000000000..d4ceb9490365
--- /dev/null
+++ b/kernel/locking/qspinlock.h
@@ -0,0 +1,200 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/*
+ * Queued spinlock defines
+ *
+ * This file contains macro definitions and functions shared between different
+ * qspinlock slow path implementations.
+ */
+#ifndef __LINUX_QSPINLOCK_H
+#define __LINUX_QSPINLOCK_H
+
+#include
+#include
+#include
+#include
+
+#define _Q_MAX_NODES	4
+
+/*
+ * The pending bit spinning loop count.
+ * This heuristic is used to limit the number of lockword accesses
+ * made by atomic_cond_read_relaxed when waiting for the lock to
+ * transition out of the "== _Q_PENDING_VAL" state. We don't spin
+ * indefinitely because there's no guarantee that we'll make forward
+ * progress.
+ */
+#ifndef _Q_PENDING_LOOPS
+#define _Q_PENDING_LOOPS	1
+#endif
+
+/*
+ * On 64-bit architectures, the mcs_spinlock structure will be 16 bytes in
+ * size and four of them will fit nicely in one 64-byte cacheline. For
+ * pvqspinlock, however, we need more space for extra data. To accommodate
+ * that, we insert two more long words to pad it up to 32 bytes. IOW, only
+ * two of them can fit in a cacheline in this case. That is OK as it is rare
+ * to have more than 2 levels of slowpath nesting in actual use. We don't
+ * want to penalize pvqspinlocks to optimize for a rare case in native
+ * qspinlocks.
+ */
+struct qnode {
+        struct mcs_spinlock mcs;
+#ifdef CONFIG_PARAVIRT_SPINLOCKS
+        long reserved[2];
+#endif
+};
+
+/*
+ * We must be able to distinguish between no-tail and the tail at 0:0,
+ * therefore increment the cpu number by one.
+ */
+
+static inline __pure u32 encode_tail(int cpu, int idx)
+{
+        u32 tail;
+
+        tail  = (cpu + 1) << _Q_TAIL_CPU_OFFSET;
+        tail |= idx << _Q_TAIL_IDX_OFFSET; /* assume < 4 */
+
+        return tail;
+}
+
+static inline __pure struct mcs_spinlock *decode_tail(u32 tail, struct qnode *qnodes)
+{
+        int cpu = (tail >> _Q_TAIL_CPU_OFFSET) - 1;
+        int idx = (tail &  _Q_TAIL_IDX_MASK) >> _Q_TAIL_IDX_OFFSET;
+
+        return per_cpu_ptr(&qnodes[idx].mcs, cpu);
+}
+
+static inline __pure
+struct mcs_spinlock *grab_mcs_node(struct mcs_spinlock *base, int idx)
+{
+        return &((struct qnode *)base + idx)->mcs;
+}
+
+#define _Q_LOCKED_PENDING_MASK (_Q_LOCKED_MASK | _Q_PENDING_MASK)
+
+#if _Q_PENDING_BITS == 8
+/**
+ * clear_pending - clear the pending bit.
+ * @lock: Pointer to queued spinlock structure
+ *
+ * *,1,* -> *,0,*
+ */
+static __always_inline void clear_pending(struct qspinlock *lock)
+{
+        WRITE_ONCE(lock->pending, 0);
+}
+
+/**
+ * clear_pending_set_locked - take ownership and clear the pending bit.
+ * @lock: Pointer to queued spinlock structure
+ *
+ * *,1,0 -> *,0,1
+ *
+ * Lock stealing is not allowed if this function is used.
+ */
+static __always_inline void clear_pending_set_locked(struct qspinlock *lock)
+{
+        WRITE_ONCE(lock->locked_pending, _Q_LOCKED_VAL);
+}
+
+/*
+ * xchg_tail - Put in the new queue tail code word & retrieve previous one
+ * @lock : Pointer to queued spinlock structure
+ * @tail : The new queue tail code word
+ * Return: The previous queue tail code word
+ *
+ * xchg(lock, tail), which heads an address dependency
+ *
+ * p,*,* -> n,*,* ; prev = xchg(lock, node)
+ */
+static __always_inline u32 xchg_tail(struct qspinlock *lock, u32 tail)
+{
+        /*
+         * We can use relaxed semantics since the caller ensures that the
+         * MCS node is properly initialized before updating the tail.
+         */
+        return (u32)xchg_relaxed(&lock->tail,
+                                 tail >> _Q_TAIL_OFFSET) << _Q_TAIL_OFFSET;
+}
+
+#else /* _Q_PENDING_BITS == 8 */
+
+/**
+ * clear_pending - clear the pending bit.
+ * @lock: Pointer to queued spinlock structure
+ *
+ * *,1,* -> *,0,*
+ */
+static __always_inline void clear_pending(struct qspinlock *lock)
+{
+        atomic_andnot(_Q_PENDING_VAL, &lock->val);
+}
+
+/**
+ * clear_pending_set_locked - take ownership and clear the pending bit.
+ * @lock: Pointer to queued spinlock structure
+ *
+ * *,1,0 -> *,0,1
+ */
+static __always_inline void clear_pending_set_locked(struct qspinlock *lock)
+{
+        atomic_add(-_Q_PENDING_VAL + _Q_LOCKED_VAL, &lock->val);
+}
+
+/**
+ * xchg_tail - Put in the new queue tail code word & retrieve previous one
+ * @lock : Pointer to queued spinlock structure
+ * @tail : The new queue tail code word
+ * Return: The previous queue tail code word
+ *
+ * xchg(lock, tail)
+ *
+ * p,*,* -> n,*,* ; prev = xchg(lock, node)
+ */
+static __always_inline u32 xchg_tail(struct qspinlock *lock, u32 tail)
+{
+        u32 old, new;
+
+        old = atomic_read(&lock->val);
+        do {
+                new = (old & _Q_LOCKED_PENDING_MASK) | tail;
+                /*
+                 * We can use relaxed semantics since the caller ensures that
+                 * the MCS node is properly initialized before updating the
+                 * tail.
+                 */
+        } while (!atomic_try_cmpxchg_relaxed(&lock->val, &old, new));
+
+        return old;
+}
+#endif /* _Q_PENDING_BITS == 8 */
+
+/**
+ * queued_fetch_set_pending_acquire - fetch the whole lock value and set pending
+ * @lock : Pointer to queued spinlock structure
+ * Return: The previous lock value
+ *
+ * *,*,* -> *,1,*
+ */
+#ifndef queued_fetch_set_pending_acquire
+static __always_inline u32 queued_fetch_set_pending_acquire(struct qspinlock *lock)
+{
+        return atomic_fetch_or_acquire(_Q_PENDING_VAL, &lock->val);
+}
+#endif
+
+/**
+ * set_locked - Set the lock bit and own the lock
+ * @lock: Pointer to queued spinlock structure
+ *
+ * *,*,0 -> *,0,1
+ */
+static __always_inline void set_locked(struct qspinlock *lock)
+{
+        WRITE_ONCE(lock->locked, _Q_LOCKED_VAL);
+}
+
+#endif /* __LINUX_QSPINLOCK_H */
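For readers unfamiliar with the tail word handled by encode_tail()/decode_tail() above, the stand-alone program below mirrors the encoding in plain C. The offsets (tail index at bits 16-17, tail CPU at bits 18-31) are the common-configuration values and are an assumption of this sketch; note that the stored CPU number is cpu + 1 so an all-zero tail means "no waiter", and that the kernel's decode_tail() goes one step further and maps the decoded (cpu, idx) pair to a per-CPU qnode rather than returning the raw numbers.

#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define TAIL_IDX_OFFSET 16                      /* assumed: _Q_TAIL_IDX_OFFSET */
#define TAIL_IDX_MASK   (0x3U << TAIL_IDX_OFFSET)
#define TAIL_CPU_OFFSET 18                      /* assumed: _Q_TAIL_CPU_OFFSET */

static uint32_t encode_tail_demo(int cpu, int idx)
{
        /* cpu + 1 so that a zero tail means "no waiter queued" */
        return ((uint32_t)(cpu + 1) << TAIL_CPU_OFFSET) |
               ((uint32_t)idx << TAIL_IDX_OFFSET);
}

static void decode_tail_demo(uint32_t tail, int *cpu, int *idx)
{
        *cpu = (int)(tail >> TAIL_CPU_OFFSET) - 1;
        *idx = (int)((tail & TAIL_IDX_MASK) >> TAIL_IDX_OFFSET);
}

int main(void)
{
        int cpu, idx;
        uint32_t tail = encode_tail_demo(7, 2); /* CPU 7, third nesting level */

        decode_tail_demo(tail, &cpu, &idx);
        assert(cpu == 7 && idx == 2);
        printf("tail=0x%08x -> cpu=%d idx=%d\n", tail, cpu, idx);
        return 0;
}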
From patchwork Tue Jan 7 13:59:45 2025
X-Patchwork-Submitter: Kumar Kartikeya Dwivedi
X-Patchwork-Id: 13928940
X-Patchwork-Delegate: bpf@iogearbox.net
From: Kumar Kartikeya Dwivedi
To: bpf@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: Barret Rhoden, Linus Torvalds, Peter Zijlstra, Waiman Long, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann, Martin KaFai Lau, Eduard Zingerman, "Paul E. McKenney", Tejun Heo, Josh Don, Dohyun Kim, kernel-team@meta.com
McKenney" , Tejun Heo , Josh Don , Dohyun Kim , kernel-team@meta.com Subject: [PATCH bpf-next v1 03/22] locking: Allow obtaining result of arch_mcs_spin_lock_contended Date: Tue, 7 Jan 2025 05:59:45 -0800 Message-ID: <20250107140004.2732830-4-memxor@gmail.com> X-Mailer: git-send-email 2.43.5 In-Reply-To: <20250107140004.2732830-1-memxor@gmail.com> References: <20250107140004.2732830-1-memxor@gmail.com> Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Developer-Signature: v=1; a=openpgp-sha256; l=1052; h=from:subject; bh=WHhXMqIdalfkSexlY5e1BRqspbIYdrDmDQT3rX3AKP8=; b=owEBbQKS/ZANAwAIAUzgyIZIvxHKAcsmYgBnfTCcvzlsIk8Mh+hFnelZKKgCgtqU9iOBLKbPXk+b Hr7ixKqJAjMEAAEIAB0WIQRLvip+Buz51YI8YRFM4MiGSL8RygUCZ30wnAAKCRBM4MiGSL8RyolEEA Cy9kJaG9NGyOsMOD1p4wmSnWMoa/VZ40oGbhBytzdYgJCRB2HL5vkDh8XXGQM15xQhcjHIre3XKFge MYxgtOeyRBxjlJ7I59jN0Abntd4t4JHzVb29xfXYU8mJ6ffYkcpeGfmF2m96iKFc0GF0IX1l3Zl+tj i9ByrgJ29NGd07AQC7LjB9Iu8AJUjtqOmWCj24rjwvHHSQHUJCx4Pm51Js3KSBijQHwSVkYKEeiTq6 z7qhjfaKS7scU2xicVnOoWomAW8nJjNaXJ/PjoyHRvzUr0xiw5zzYpElTFBan3s2Q1+NNF4llJ1LdM U+UWp2jrbwUAWFbJJK2dvmzZNuI6Lbp8ZQK2O5B2mRhsfDcdNwO/tje684Wbt1/DiSRKDoKzgZXNuk nKe8bA2w7YxQPsqOW8wrb6JRwt0mHs/71A2OykQrbDKhoGASCICezYQF+b9ri3D+MYXaaV6jJX9g4H EQrhRum92e4C2xxmWJuwFpCUs+WB5Q4rUDuSi6F9aC+MTVqIZiTpH1S7ZV0Co84Ck29cCyaEGwt4ed PWDaRhfOssUVISSpUCuPYn7j5zAmvb1EsJcHR9jMW0oiOxCSqKR8citOaPcAC147N3Enhj+jGRgbFK PrDzH4chobZEuLUX0q+qgdIxncgiEIi6OOrRGEhx/sxRMgjoUDUV63MOwvng== X-Developer-Key: i=memxor@gmail.com; a=openpgp; fpr=4BBE2A7E06ECF9D5823C61114CE0C88648BF11CA X-Patchwork-Delegate: bpf@iogearbox.net To support upcoming changes that require inspecting the return value once the conditional waiting loop in arch_mcs_spin_lock_contended terminates, modify the macro to preserve the result of smp_cond_load_acquire. This enables checking the return value as needed, which will help disambiguate the MCS node’s locked state in future patches. Reviewed-by: Barret Rhoden Signed-off-by: Kumar Kartikeya Dwivedi --- kernel/locking/mcs_spinlock.h | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/kernel/locking/mcs_spinlock.h b/kernel/locking/mcs_spinlock.h index 16160ca8907f..5c92ba199b90 100644 --- a/kernel/locking/mcs_spinlock.h +++ b/kernel/locking/mcs_spinlock.h @@ -24,9 +24,7 @@ * spinning, and smp_cond_load_acquire() provides that behavior. 
  */
 #define arch_mcs_spin_lock_contended(l)					\
-do {									\
-	smp_cond_load_acquire(l, VAL);					\
-} while (0)
+	smp_cond_load_acquire(l, VAL)
 #endif
 
 #ifndef arch_mcs_spin_unlock_contended
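A stand-alone illustration (not kernel code) of what the change above enables: once the macro expands to an expression rather than a do/while statement, the waiter can capture the value that terminated the wait loop. The helper name and the value 2 below are made up for the example; how rqspinlock actually uses the result is defined by later patches in the series.

#include <stdatomic.h>
#include <stdio.h>

/* Analogue of smp_cond_load_acquire(): spin until *ptr != 0, return the value seen. */
static int cond_load_acquire(atomic_int *ptr)
{
        int val;

        while (!(val = atomic_load_explicit(ptr, memory_order_acquire)))
                ;
        return val;
}

/* Expression-style macro, like the reworked arch_mcs_spin_lock_contended(). */
#define mcs_spin_lock_contended(l)      cond_load_acquire(l)

int main(void)
{
        atomic_int locked = 2;          /* pretend the previous waiter stored 2, not 1 */
        int val = mcs_spin_lock_contended(&locked);

        printf("wait loop terminated with locked=%d\n", val);
        return 0;
}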
From patchwork Tue Jan 7 13:59:46 2025
X-Patchwork-Submitter: Kumar Kartikeya Dwivedi
X-Patchwork-Id: 13928942
X-Patchwork-Delegate: bpf@iogearbox.net
From: Kumar Kartikeya Dwivedi
To: bpf@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: Barret Rhoden, Linus Torvalds, Peter Zijlstra, Waiman Long, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann, Martin KaFai Lau, Eduard Zingerman, "Paul E. McKenney", Tejun Heo, Josh Don, Dohyun Kim, kernel-team@meta.com
Subject: [PATCH bpf-next v1 04/22] locking: Copy out qspinlock.c to rqspinlock.c
Date: Tue, 7 Jan 2025 05:59:46 -0800
Message-ID: <20250107140004.2732830-5-memxor@gmail.com>

In preparation for introducing a new lock implementation, Resilient
Queued Spin Lock, or rqspinlock, begin by using the existing
qspinlock.c code as the base. Simply copy the code to a new file and
rename functions and variables from 'queued' to 'resilient_queued'.
This helps each subsequent commit clearly show how and where the code
is being changed. The only change after the literal copy in this
commit is renaming the functions where necessary.
Reviewed-by: Barret Rhoden
Signed-off-by: Kumar Kartikeya Dwivedi
---
 kernel/locking/rqspinlock.c | 410 ++++++++++++++++++++++++++++++++++++
 1 file changed, 410 insertions(+)
 create mode 100644 kernel/locking/rqspinlock.c

diff --git a/kernel/locking/rqspinlock.c b/kernel/locking/rqspinlock.c
new file mode 100644
index 000000000000..caaa7c9bbc79
--- /dev/null
+++ b/kernel/locking/rqspinlock.c
@@ -0,0 +1,410 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * Resilient Queued Spin Lock
+ *
+ * (C) Copyright 2013-2015 Hewlett-Packard Development Company, L.P.
+ * (C) Copyright 2013-2014,2018 Red Hat, Inc.
+ * (C) Copyright 2015 Intel Corp.
+ * (C) Copyright 2015 Hewlett-Packard Enterprise Development LP
+ *
+ * Authors: Waiman Long
+ *          Peter Zijlstra
+ */
+
+#ifndef _GEN_PV_LOCK_SLOWPATH
+
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+
+/*
+ * Include queued spinlock definitions and statistics code
+ */
+#include "qspinlock.h"
+#include "qspinlock_stat.h"
+
+/*
+ * The basic principle of a queue-based spinlock can best be understood
+ * by studying a classic queue-based spinlock implementation called the
+ * MCS lock. A copy of the original MCS lock paper ("Algorithms for Scalable
+ * Synchronization on Shared-Memory Multiprocessors by Mellor-Crummey and
+ * Scott") is available at
+ *
+ * https://bugzilla.kernel.org/show_bug.cgi?id=206115
+ *
+ * This queued spinlock implementation is based on the MCS lock, however to
+ * make it fit the 4 bytes we assume spinlock_t to be, and preserve its
+ * existing API, we must modify it somehow.
+ *
+ * In particular; where the traditional MCS lock consists of a tail pointer
+ * (8 bytes) and needs the next pointer (another 8 bytes) of its own node to
+ * unlock the next pending (next->locked), we compress both these: {tail,
+ * next->locked} into a single u32 value.
+ *
+ * Since a spinlock disables recursion of its own context and there is a limit
+ * to the contexts that can nest; namely: task, softirq, hardirq, nmi. As there
+ * are at most 4 nesting levels, it can be encoded by a 2-bit number. Now
+ * we can encode the tail by combining the 2-bit nesting level with the cpu
+ * number. With one byte for the lock value and 3 bytes for the tail, only a
+ * 32-bit word is now needed. Even though we only need 1 bit for the lock,
+ * we extend it to a full byte to achieve better performance for architectures
+ * that support atomic byte write.
+ *
+ * We also change the first spinner to spin on the lock bit instead of its
+ * node; whereby avoiding the need to carry a node from lock to unlock, and
+ * preserving existing lock API. This also makes the unlock code simpler and
+ * faster.
+ *
+ * N.B. The current implementation only supports architectures that allow
+ * atomic operations on smaller 8-bit and 16-bit data types.
+ *
+ */
+
+#include "mcs_spinlock.h"
+
+/*
+ * Per-CPU queue node structures; we can never have more than 4 nested
+ * contexts: task, softirq, hardirq, nmi.
+ *
+ * Exactly fits one 64-byte cacheline on a 64-bit architecture.
+ *
+ * PV doubles the storage and uses the second cacheline for PV state.
+ */
+static DEFINE_PER_CPU_ALIGNED(struct qnode, qnodes[_Q_MAX_NODES]);
+
+/*
+ * Generate the native code for resilient_queued_spin_unlock_slowpath(); provide NOPs
+ * for all the PV callbacks.
+ */
+
+static __always_inline void __pv_init_node(struct mcs_spinlock *node) { }
+static __always_inline void __pv_wait_node(struct mcs_spinlock *node,
+                                           struct mcs_spinlock *prev) { }
+static __always_inline void __pv_kick_node(struct qspinlock *lock,
+                                           struct mcs_spinlock *node) { }
+static __always_inline u32  __pv_wait_head_or_lock(struct qspinlock *lock,
+                                                   struct mcs_spinlock *node)
+                                                   { return 0; }
+
+#define pv_enabled()		false
+
+#define pv_init_node		__pv_init_node
+#define pv_wait_node		__pv_wait_node
+#define pv_kick_node		__pv_kick_node
+#define pv_wait_head_or_lock	__pv_wait_head_or_lock
+
+#ifdef CONFIG_PARAVIRT_SPINLOCKS
+#define resilient_queued_spin_lock_slowpath	native_resilient_queued_spin_lock_slowpath
+#endif
+
+#endif /* _GEN_PV_LOCK_SLOWPATH */
+
+/**
+ * resilient_queued_spin_lock_slowpath - acquire the queued spinlock
+ * @lock: Pointer to queued spinlock structure
+ * @val: Current value of the queued spinlock 32-bit word
+ *
+ * (queue tail, pending bit, lock value)
+ *
+ *              fast     :    slow                                  :    unlock
+ *                       :                                          :
+ * uncontended  (0,0,0) -:--> (0,0,1) ------------------------------:--> (*,*,0)
+ *                       :       | ^--------.------.             /  :
+ *                       :       v           \      \            |  :
+ * pending               :    (0,1,1) +--> (0,1,0)   \           |  :
+ *                       :       | ^--'              |           |  :
+ *                       :       v                   |           |  :
+ * uncontended           :    (n,x,y) +--> (n,0,0) --'           |  :
+ *   queue               :       | ^--'                          |  :
+ *                       :       v                               |  :
+ * contended             :    (*,x,y) +--> (*,0,0) ---> (*,0,1) -'  :
+ *   queue               :         ^--'                             :
+ */
+void __lockfunc resilient_queued_spin_lock_slowpath(struct qspinlock *lock, u32 val)
+{
+	struct mcs_spinlock *prev, *next, *node;
+	u32 old, tail;
+	int idx;
+
+	BUILD_BUG_ON(CONFIG_NR_CPUS >= (1U << _Q_TAIL_CPU_BITS));
+
+	if (pv_enabled())
+		goto pv_queue;
+
+	if (virt_spin_lock(lock))
+		return;
+
+	/*
+	 * Wait for in-progress pending->locked hand-overs with a bounded
+	 * number of spins so that we guarantee forward progress.
+	 *
+	 * 0,1,0 -> 0,0,1
+	 */
+	if (val == _Q_PENDING_VAL) {
+		int cnt = _Q_PENDING_LOOPS;
+		val = atomic_cond_read_relaxed(&lock->val,
+					       (VAL != _Q_PENDING_VAL) || !cnt--);
+	}
+
+	/*
+	 * If we observe any contention; queue.
+	 */
+	if (val & ~_Q_LOCKED_MASK)
+		goto queue;
+
+	/*
+	 * trylock || pending
+	 *
+	 * 0,0,* -> 0,1,* -> 0,0,1 pending, trylock
+	 */
+	val = queued_fetch_set_pending_acquire(lock);
+
+	/*
+	 * If we observe contention, there is a concurrent locker.
+	 *
+	 * Undo and queue; our setting of PENDING might have made the
+	 * n,0,0 -> 0,0,0 transition fail and it will now be waiting
+	 * on @next to become !NULL.
+	 */
+	if (unlikely(val & ~_Q_LOCKED_MASK)) {
+
+		/* Undo PENDING if we set it. */
+		if (!(val & _Q_PENDING_MASK))
+			clear_pending(lock);
+
+		goto queue;
+	}
+
+	/*
+	 * We're pending, wait for the owner to go away.
+	 *
+	 * 0,1,1 -> *,1,0
+	 *
+	 * this wait loop must be a load-acquire such that we match the
+	 * store-release that clears the locked bit and create lock
+	 * sequentiality; this is because not all
+	 * clear_pending_set_locked() implementations imply full
+	 * barriers.
+	 */
+	if (val & _Q_LOCKED_MASK)
+		smp_cond_load_acquire(&lock->locked, !VAL);
+
+	/*
+	 * take ownership and clear the pending bit.
+	 *
+	 * 0,1,0 -> 0,0,1
+	 */
+	clear_pending_set_locked(lock);
+	lockevent_inc(lock_pending);
+	return;
+
+	/*
+	 * End of pending bit optimistic spinning and beginning of MCS
+	 * queuing.
+	 */
+queue:
+	lockevent_inc(lock_slowpath);
+pv_queue:
+	node = this_cpu_ptr(&qnodes[0].mcs);
+	idx = node->count++;
+	tail = encode_tail(smp_processor_id(), idx);
+
+	trace_contention_begin(lock, LCB_F_SPIN);
+
+	/*
+	 * 4 nodes are allocated based on the assumption that there will
+	 * not be nested NMIs taking spinlocks. That may not be true in
+	 * some architectures even though the chance of needing more than
+	 * 4 nodes will still be extremely unlikely. When that happens,
+	 * we fall back to spinning on the lock directly without using
+	 * any MCS node. This is not the most elegant solution, but is
+	 * simple enough.
+	 */
+	if (unlikely(idx >= _Q_MAX_NODES)) {
+		lockevent_inc(lock_no_node);
+		while (!queued_spin_trylock(lock))
+			cpu_relax();
+		goto release;
+	}
+
+	node = grab_mcs_node(node, idx);
+
+	/*
+	 * Keep counts of non-zero index values:
+	 */
+	lockevent_cond_inc(lock_use_node2 + idx - 1, idx);
+
+	/*
+	 * Ensure that we increment the head node->count before initialising
+	 * the actual node. If the compiler is kind enough to reorder these
+	 * stores, then an IRQ could overwrite our assignments.
+	 */
+	barrier();
+
+	node->locked = 0;
+	node->next = NULL;
+	pv_init_node(node);
+
+	/*
+	 * We touched a (possibly) cold cacheline in the per-cpu queue node;
+	 * attempt the trylock once more in the hope someone let go while we
+	 * weren't watching.
+	 */
+	if (queued_spin_trylock(lock))
+		goto release;
+
+	/*
+	 * Ensure that the initialisation of @node is complete before we
+	 * publish the updated tail via xchg_tail() and potentially link
+	 * @node into the waitqueue via WRITE_ONCE(prev->next, node) below.
+	 */
+	smp_wmb();
+
+	/*
+	 * Publish the updated tail.
+	 * We have already touched the queueing cacheline; don't bother with
+	 * pending stuff.
+	 *
+	 * p,*,* -> n,*,*
+	 */
+	old = xchg_tail(lock, tail);
+	next = NULL;
+
+	/*
+	 * if there was a previous node; link it and wait until reaching the
+	 * head of the waitqueue.
+	 */
+	if (old & _Q_TAIL_MASK) {
+		prev = decode_tail(old, qnodes);
+
+		/* Link @node into the waitqueue. */
+		WRITE_ONCE(prev->next, node);
+
+		pv_wait_node(node, prev);
+		arch_mcs_spin_lock_contended(&node->locked);
+
+		/*
+		 * While waiting for the MCS lock, the next pointer may have
+		 * been set by another lock waiter. We optimistically load
+		 * the next pointer & prefetch the cacheline for writing
+		 * to reduce latency in the upcoming MCS unlock operation.
+		 */
+		next = READ_ONCE(node->next);
+		if (next)
+			prefetchw(next);
+	}
+
+	/*
+	 * we're at the head of the waitqueue, wait for the owner & pending to
+	 * go away.
+	 *
+	 * *,x,y -> *,0,0
+	 *
+	 * this wait loop must use a load-acquire such that we match the
+	 * store-release that clears the locked bit and create lock
+	 * sequentiality; this is because the set_locked() function below
+	 * does not imply a full barrier.
+	 *
+	 * The PV pv_wait_head_or_lock function, if active, will acquire
+	 * the lock and return a non-zero value. So we have to skip the
+	 * atomic_cond_read_acquire() call. As the next PV queue head hasn't
+	 * been designated yet, there is no way for the locked value to become
+	 * _Q_SLOW_VAL. So both the set_locked() and the
+	 * atomic_cmpxchg_relaxed() calls will be safe.
+	 *
+	 * If PV isn't active, 0 will be returned instead.
+	 *
+	 */
+	if ((val = pv_wait_head_or_lock(lock, node)))
+		goto locked;
+
+	val = atomic_cond_read_acquire(&lock->val, !(VAL & _Q_LOCKED_PENDING_MASK));
+
+locked:
+	/*
+	 * claim the lock:
+	 *
+	 * n,0,0 -> 0,0,1 : lock, uncontended
+	 * *,*,0 -> *,*,1 : lock, contended
+	 *
+	 * If the queue head is the only one in the queue (lock value == tail)
+	 * and nobody is pending, clear the tail code and grab the lock.
+	 * Otherwise, we only need to grab the lock.
+	 */
+
+	/*
+	 * In the PV case we might already have _Q_LOCKED_VAL set, because
+	 * of lock stealing; therefore we must also allow:
+	 *
+	 * n,0,1 -> 0,0,1
+	 *
+	 * Note: at this point: (val & _Q_PENDING_MASK) == 0, because of the
+	 *       above wait condition, therefore any concurrent setting of
+	 *       PENDING will make the uncontended transition fail.
+	 */
+	if ((val & _Q_TAIL_MASK) == tail) {
+		if (atomic_try_cmpxchg_relaxed(&lock->val, &val, _Q_LOCKED_VAL))
+			goto release; /* No contention */
+	}
+
+	/*
+	 * Either somebody is queued behind us or _Q_PENDING_VAL got set
+	 * which will then detect the remaining tail and queue behind us
+	 * ensuring we'll see a @next.
+	 */
+	set_locked(lock);
+
+	/*
+	 * contended path; wait for next if not observed yet, release.
+	 */
+	if (!next)
+		next = smp_cond_load_relaxed(&node->next, (VAL));
+
+	arch_mcs_spin_unlock_contended(&next->locked);
+	pv_kick_node(lock, next);
+
+release:
+	trace_contention_end(lock, 0);
+
+	/*
+	 * release the node
+	 */
+	__this_cpu_dec(qnodes[0].mcs.count);
+}
+EXPORT_SYMBOL(resilient_queued_spin_lock_slowpath);
+
+/*
+ * Generate the paravirt code for resilient_queued_spin_unlock_slowpath().
+ */
+#if !defined(_GEN_PV_LOCK_SLOWPATH) && defined(CONFIG_PARAVIRT_SPINLOCKS)
+#define _GEN_PV_LOCK_SLOWPATH
+
+#undef  pv_enabled
+#define pv_enabled()	true
+
+#undef pv_init_node
+#undef pv_wait_node
+#undef pv_kick_node
+#undef pv_wait_head_or_lock
+
+#undef  resilient_queued_spin_lock_slowpath
+#define resilient_queued_spin_lock_slowpath	__pv_resilient_queued_spin_lock_slowpath
+
+#include "qspinlock_paravirt.h"
+#include "rqspinlock.c"
+
+bool nopvspin;
+static __init int parse_nopvspin(char *arg)
+{
+	nopvspin = true;
+	return 0;
+}
+early_param("nopvspin", parse_nopvspin);
+#endif
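One part of the copied slowpath worth calling out is the per-CPU node selection: qnodes[0].mcs.count records how many of the four per-CPU nodes are in use, and nested acquisitions (task -> softirq -> hardirq -> NMI) simply take the next index, falling back to plain trylock spinning if the nesting ever exceeds _Q_MAX_NODES. The stand-alone sketch below mimics that bookkeeping with ordinary arrays in place of this_cpu_ptr()/grab_mcs_node(); all names in it are made up for illustration.

#include <assert.h>
#include <stdio.h>

#define MAX_NODES	4	/* mirrors _Q_MAX_NODES */
#define NR_CPUS		2

struct node {
        int locked;
        int count;		/* only meaningful in slot 0, like qnodes[0].mcs.count */
};

static struct node qnodes[NR_CPUS][MAX_NODES];

static struct node *grab_node(int cpu)
{
        int idx = qnodes[cpu][0].count++;

        assert(idx < MAX_NODES);	/* the kernel falls back to trylock spinning here */
        return &qnodes[cpu][idx];
}

static void release_node(int cpu)
{
        qnodes[cpu][0].count--;
}

int main(void)
{
        /* Task context queues on CPU 0, then an interrupt nests on the same CPU. */
        struct node *task_node = grab_node(0);
        struct node *irq_node  = grab_node(0);

        printf("task got slot 0: %d, irq got slot 1: %d\n",
               task_node == &qnodes[0][0], irq_node == &qnodes[0][1]);

        release_node(0);	/* interrupt done */
        release_node(0);	/* task done */
        return 0;
}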
From patchwork Tue Jan 7 13:59:47 2025
X-Patchwork-Submitter: Kumar Kartikeya Dwivedi
X-Patchwork-Id: 13928941
X-Patchwork-Delegate: bpf@iogearbox.net
From: Kumar Kartikeya Dwivedi
To: bpf@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: Barret Rhoden, Linus Torvalds, Peter Zijlstra, Waiman Long, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann, Martin KaFai Lau, Eduard Zingerman, "Paul E. McKenney", Tejun Heo, Josh Don, Dohyun Kim, kernel-team@meta.com
McKenney" , Tejun Heo , Josh Don , Dohyun Kim , kernel-team@meta.com Subject: [PATCH bpf-next v1 05/22] rqspinlock: Add rqspinlock.h header Date: Tue, 7 Jan 2025 05:59:47 -0800 Message-ID: <20250107140004.2732830-6-memxor@gmail.com> X-Mailer: git-send-email 2.43.5 In-Reply-To: <20250107140004.2732830-1-memxor@gmail.com> References: <20250107140004.2732830-1-memxor@gmail.com> Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Developer-Signature: v=1; a=openpgp-sha256; l=1497; h=from:subject; bh=TiZoNzl0RgiYrKoxOttwDwQ6phR397SKF8QOHX/CtQw=; b=owEBbQKS/ZANAwAIAUzgyIZIvxHKAcsmYgBnfTCclgCk8LJ6bdDVdLpisCaetAgrsza+fIA9DgUX va6iEkqJAjMEAAEIAB0WIQRLvip+Buz51YI8YRFM4MiGSL8RygUCZ30wnAAKCRBM4MiGSL8RyiaMD/ 9ktRXkwuo8lWq4wSfhAG1PV3gPfAHOX3OizjgrBoi+8Vo8P9HNaNfztbPPX/Tqof1Ei4u5qawSJcqQ lTR2RBAYnZKTi0EINmKXk0k5DkOC4YzTM8lb/SCMrJB848/dvsiuo+wdSqEyZZK7VSKQCPlVS3bWar 3UeFKLNYdafVrdKYs5HGlhaeIRSEGaE/A1Le0uy3C3SjKWy5w+CeqifsHGQg8k4WJtqbsliG8Bl8VU dyIzoIxxjelbaz+Vlo7znAHiedXe00d2zigWKSyou5GQnj2/Fgw8XWrha3RADZVM7Ex1/d15sj/Mo2 R0Eoz+7IBGxETeji/UNu2O6cL404DcHYu2qFP67k+l2jX70KHsNyGv/Oz6S89MR/1jmmlHoIDMx8gE vQYA+rKqZ+8RcIOn/c1jeWlEJfrD2Fr5P1DNUGGyIGbFWA6+ECMihmGVcl4ZfWeWcTS8inuTxp+Bar rzfMCPtTp/g8qMiHWUR1bCm9usIIvaf1tN5bVmN+4BjpSPcuM4hiFTrcv7ttkR07EjRAmfnl2TUFDG KwY3DXibGWoTOBPeFdcJQsQFEGXf8uJqX1IXre4C5Dl2M8iY5Apw3/QjlkwINjOUFY+kaTlFOUVhRd WYEMjMGZFjFhUIC2Qxi4Uca4Yz2vK2KsbCJsGkepmWpfjlX+sGeM+9+2k1TQ== X-Developer-Key: i=memxor@gmail.com; a=openpgp; fpr=4BBE2A7E06ECF9D5823C61114CE0C88648BF11CA X-Patchwork-Delegate: bpf@iogearbox.net This header contains the public declarations usable in the rest of the kernel for rqspinlock. Reviewed-by: Barret Rhoden Signed-off-by: Kumar Kartikeya Dwivedi --- include/asm-generic/rqspinlock.h | 18 ++++++++++++++++++ kernel/locking/rqspinlock.c | 1 + 2 files changed, 19 insertions(+) create mode 100644 include/asm-generic/rqspinlock.h diff --git a/include/asm-generic/rqspinlock.h b/include/asm-generic/rqspinlock.h new file mode 100644 index 000000000000..5c2cd3097fb2 --- /dev/null +++ b/include/asm-generic/rqspinlock.h @@ -0,0 +1,18 @@ +/* SPDX-License-Identifier: GPL-2.0-or-later */ +/* + * Resilient Queued Spin Lock + * + * (C) Copyright 2024 Meta Platforms, Inc. and affiliates. 
+ * + * Authors: Kumar Kartikeya Dwivedi + */ +#ifndef __ASM_GENERIC_RQSPINLOCK_H +#define __ASM_GENERIC_RQSPINLOCK_H + +#include + +struct qspinlock; + +extern void resilient_queued_spin_lock_slowpath(struct qspinlock *lock, u32 val); + +#endif /* __ASM_GENERIC_RQSPINLOCK_H */ diff --git a/kernel/locking/rqspinlock.c b/kernel/locking/rqspinlock.c index caaa7c9bbc79..b7920ae79410 100644 --- a/kernel/locking/rqspinlock.c +++ b/kernel/locking/rqspinlock.c @@ -23,6 +23,7 @@ #include #include #include +#include /* * Include queued spinlock definitions and statistics code
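For orientation, a plausible caller-side fast path for the declaration added above would look like the sketch below. It mirrors the shape of the existing queued_spin_lock() wrapper and is not part of this patch; the wrapper name is illustrative, and it assumes the declarations from the new header are visible.

static __always_inline void example_res_spin_lock(struct qspinlock *lock)
{
	int val = 0;

	/* Uncontended fast path: 0,0,0 -> 0,0,1 (_Q_LOCKED_VAL == 1). */
	if (likely(atomic_try_cmpxchg_acquire(&lock->val, &val, 1)))
		return;

	/* Contended: fall back to the slow path declared above. */
	resilient_queued_spin_lock_slowpath(lock, val);
}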
From patchwork Tue Jan 7 13:59:48 2025
From: Kumar Kartikeya Dwivedi
To: bpf@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: Barret Rhoden , Linus Torvalds , Peter Zijlstra , Waiman Long , Alexei Starovoitov , Andrii Nakryiko , Daniel Borkmann , Martin KaFai Lau , Eduard Zingerman , "Paul E. McKenney" , Tejun Heo , Josh Don , Dohyun Kim , kernel-team@meta.com
Subject: [PATCH bpf-next v1 06/22] rqspinlock: Drop PV and virtualization support
Date: Tue, 7 Jan 2025 05:59:48 -0800
Message-ID: <20250107140004.2732830-7-memxor@gmail.com>
In-Reply-To: <20250107140004.2732830-1-memxor@gmail.com>
References: <20250107140004.2732830-1-memxor@gmail.com>
Changes to rqspinlock in subsequent commits will be algorithmic modifications, which won't remain in agreement with the implementations of paravirt spin lock and virt_spin_lock support.
These future changes include measures for terminating waiting loops in slow path after a certain point. While using a fair lock like qspinlock directly inside virtual machines leads to suboptimal performance under certain conditions, we cannot use the existing virtualization support before we make it resilient as well. Therefore, drop it for now. Reviewed-by: Barret Rhoden Signed-off-by: Kumar Kartikeya Dwivedi --- kernel/locking/rqspinlock.c | 89 ------------------------------------- 1 file changed, 89 deletions(-) diff --git a/kernel/locking/rqspinlock.c b/kernel/locking/rqspinlock.c index b7920ae79410..fada0dca6f3b 100644 --- a/kernel/locking/rqspinlock.c +++ b/kernel/locking/rqspinlock.c @@ -11,8 +11,6 @@ * Peter Zijlstra */ -#ifndef _GEN_PV_LOCK_SLOWPATH - #include #include #include @@ -75,38 +73,9 @@ * contexts: task, softirq, hardirq, nmi. * * Exactly fits one 64-byte cacheline on a 64-bit architecture. - * - * PV doubles the storage and uses the second cacheline for PV state. */ static DEFINE_PER_CPU_ALIGNED(struct qnode, qnodes[_Q_MAX_NODES]); -/* - * Generate the native code for resilient_queued_spin_unlock_slowpath(); provide NOPs - * for all the PV callbacks. - */ - -static __always_inline void __pv_init_node(struct mcs_spinlock *node) { } -static __always_inline void __pv_wait_node(struct mcs_spinlock *node, - struct mcs_spinlock *prev) { } -static __always_inline void __pv_kick_node(struct qspinlock *lock, - struct mcs_spinlock *node) { } -static __always_inline u32 __pv_wait_head_or_lock(struct qspinlock *lock, - struct mcs_spinlock *node) - { return 0; } - -#define pv_enabled() false - -#define pv_init_node __pv_init_node -#define pv_wait_node __pv_wait_node -#define pv_kick_node __pv_kick_node -#define pv_wait_head_or_lock __pv_wait_head_or_lock - -#ifdef CONFIG_PARAVIRT_SPINLOCKS -#define resilient_queued_spin_lock_slowpath native_resilient_queued_spin_lock_slowpath -#endif - -#endif /* _GEN_PV_LOCK_SLOWPATH */ - /** * resilient_queued_spin_lock_slowpath - acquire the queued spinlock * @lock: Pointer to queued spinlock structure @@ -136,12 +105,6 @@ void __lockfunc resilient_queued_spin_lock_slowpath(struct qspinlock *lock, u32 BUILD_BUG_ON(CONFIG_NR_CPUS >= (1U << _Q_TAIL_CPU_BITS)); - if (pv_enabled()) - goto pv_queue; - - if (virt_spin_lock(lock)) - return; - /* * Wait for in-progress pending->locked hand-overs with a bounded * number of spins so that we guarantee forward progress. @@ -212,7 +175,6 @@ void __lockfunc resilient_queued_spin_lock_slowpath(struct qspinlock *lock, u32 */ queue: lockevent_inc(lock_slowpath); -pv_queue: node = this_cpu_ptr(&qnodes[0].mcs); idx = node->count++; tail = encode_tail(smp_processor_id(), idx); @@ -251,7 +213,6 @@ void __lockfunc resilient_queued_spin_lock_slowpath(struct qspinlock *lock, u32 node->locked = 0; node->next = NULL; - pv_init_node(node); /* * We touched a (possibly) cold cacheline in the per-cpu queue node; @@ -288,7 +249,6 @@ void __lockfunc resilient_queued_spin_lock_slowpath(struct qspinlock *lock, u32 /* Link @node into the waitqueue. */ WRITE_ONCE(prev->next, node); - pv_wait_node(node, prev); arch_mcs_spin_lock_contended(&node->locked); /* @@ -312,23 +272,9 @@ void __lockfunc resilient_queued_spin_lock_slowpath(struct qspinlock *lock, u32 * store-release that clears the locked bit and create lock * sequentiality; this is because the set_locked() function below * does not imply a full barrier. - * - * The PV pv_wait_head_or_lock function, if active, will acquire - * the lock and return a non-zero value. 
So we have to skip the - * atomic_cond_read_acquire() call. As the next PV queue head hasn't - * been designated yet, there is no way for the locked value to become - * _Q_SLOW_VAL. So both the set_locked() and the - * atomic_cmpxchg_relaxed() calls will be safe. - * - * If PV isn't active, 0 will be returned instead. - * - */ - if ((val = pv_wait_head_or_lock(lock, node))) - goto locked; - val = atomic_cond_read_acquire(&lock->val, !(VAL & _Q_LOCKED_PENDING_MASK)); -locked: /* * claim the lock: * @@ -341,11 +287,6 @@ void __lockfunc resilient_queued_spin_lock_slowpath(struct qspinlock *lock, u32 */ /* - * In the PV case we might already have _Q_LOCKED_VAL set, because - * of lock stealing; therefore we must also allow: - * - * n,0,1 -> 0,0,1 - * * Note: at this point: (val & _Q_PENDING_MASK) == 0, because of the * above wait condition, therefore any concurrent setting of * PENDING will make the uncontended transition fail. @@ -369,7 +310,6 @@ void __lockfunc resilient_queued_spin_lock_slowpath(struct qspinlock *lock, u32 next = smp_cond_load_relaxed(&node->next, (VAL)); arch_mcs_spin_unlock_contended(&next->locked); - pv_kick_node(lock, next); release: trace_contention_end(lock, 0); @@ -380,32 +320,3 @@ void __lockfunc resilient_queued_spin_lock_slowpath(struct qspinlock *lock, u32 __this_cpu_dec(qnodes[0].mcs.count); } EXPORT_SYMBOL(resilient_queued_spin_lock_slowpath); - -/* - * Generate the paravirt code for resilient_queued_spin_unlock_slowpath(). - */ -#if !defined(_GEN_PV_LOCK_SLOWPATH) && defined(CONFIG_PARAVIRT_SPINLOCKS) -#define _GEN_PV_LOCK_SLOWPATH - -#undef pv_enabled -#define pv_enabled() true - -#undef pv_init_node -#undef pv_wait_node -#undef pv_kick_node -#undef pv_wait_head_or_lock - -#undef resilient_queued_spin_lock_slowpath -#define resilient_queued_spin_lock_slowpath __pv_resilient_queued_spin_lock_slowpath - -#include "qspinlock_paravirt.h" -#include "rqspinlock.c" - -bool nopvspin; -static __init int parse_nopvspin(char *arg) -{ - nopvspin = true; - return 0; -} -early_param("nopvspin", parse_nopvspin); -#endif
From patchwork Tue Jan 7 13:59:49 2025
From: Kumar Kartikeya Dwivedi
To: bpf@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: Barret Rhoden , Linus Torvalds , Peter Zijlstra , Waiman Long , Alexei Starovoitov , Andrii Nakryiko , Daniel Borkmann , Martin KaFai Lau , Eduard Zingerman , "Paul E. McKenney" , Tejun Heo , Josh Don , Dohyun Kim , kernel-team@meta.com
Subject: [PATCH bpf-next v1 07/22] rqspinlock: Add support for timeouts
Date: Tue, 7 Jan 2025 05:59:49 -0800
Message-ID: <20250107140004.2732830-8-memxor@gmail.com>
In-Reply-To: <20250107140004.2732830-1-memxor@gmail.com>
References: <20250107140004.2732830-1-memxor@gmail.com>
Introduce policy macro RES_CHECK_TIMEOUT which can be used to detect when the timeout has expired for the slow path to return an error. It depends on being passed two variables initialized to 0: ts, ret. The 'ts' parameter is of type rqspinlock_timeout. This macro resolves to the (ret) expression so that it can be used in statements like smp_cond_load_acquire to break the waiting loop condition. The 'spin' member is used to amortize the cost of checking time by dispatching to the implementation every 64k iterations. The 'timeout_end' member is used to keep track of the timestamp that denotes the end of the waiting period. The 'ret' parameter denotes the status of the timeout, and can be checked in the slow path to detect timeouts after waiting loops. The 'duration' member is used to store the timeout duration for each waiting loop, that is passed down from the caller of the slow path function. Use the RES_INIT_TIMEOUT macro to initialize it. The default timeout value defined in the header (RES_DEF_TIMEOUT) is 0.5 seconds. This macro will be used as a condition for waiting loops in the slow path. Since each waiting loop applies a fresh timeout using the same rqspinlock_timeout, we add a new RES_RESET_TIMEOUT as well to ensure the values can be easily reinitialized to the default state.
Reviewed-by: Barret Rhoden Signed-off-by: Kumar Kartikeya Dwivedi --- include/asm-generic/rqspinlock.h | 8 +++++- kernel/locking/rqspinlock.c | 46 +++++++++++++++++++++++++++++++- 2 files changed, 52 insertions(+), 2 deletions(-) diff --git a/include/asm-generic/rqspinlock.h b/include/asm-generic/rqspinlock.h index 5c2cd3097fb2..8ed266f4e70b 100644 --- a/include/asm-generic/rqspinlock.h +++ b/include/asm-generic/rqspinlock.h @@ -10,9 +10,15 @@ #define __ASM_GENERIC_RQSPINLOCK_H #include +#include struct qspinlock; -extern void resilient_queued_spin_lock_slowpath(struct qspinlock *lock, u32 val); +/* + * Default timeout for waiting loops is 0.5 seconds + */ +#define RES_DEF_TIMEOUT (NSEC_PER_SEC / 2) + +extern void resilient_queued_spin_lock_slowpath(struct qspinlock *lock, u32 val, u64 timeout); #endif /* __ASM_GENERIC_RQSPINLOCK_H */ diff --git a/kernel/locking/rqspinlock.c b/kernel/locking/rqspinlock.c index fada0dca6f3b..815feb24d512 100644 --- a/kernel/locking/rqspinlock.c +++ b/kernel/locking/rqspinlock.c @@ -6,9 +6,11 @@ * (C) Copyright 2013-2014,2018 Red Hat, Inc. * (C) Copyright 2015 Intel Corp. * (C) Copyright 2015 Hewlett-Packard Enterprise Development LP + * (C) Copyright 2024 Meta Platforms, Inc. and affiliates. * * Authors: Waiman Long * Peter Zijlstra + * Kumar Kartikeya Dwivedi */ #include @@ -22,6 +24,7 @@ #include #include #include +#include /* * Include queued spinlock definitions and statistics code @@ -68,6 +71,44 @@ #include "mcs_spinlock.h" +struct rqspinlock_timeout { + u64 timeout_end; + u64 duration; + u16 spin; +}; + +static noinline int check_timeout(struct rqspinlock_timeout *ts) +{ + u64 time = ktime_get_mono_fast_ns(); + + if (!ts->timeout_end) { + ts->timeout_end = time + ts->duration; + return 0; + } + + if (time > ts->timeout_end) + return -ETIMEDOUT; + + return 0; +} + +#define RES_CHECK_TIMEOUT(ts, ret) \ + ({ \ + if (!((ts).spin++ & 0xffff)) \ + (ret) = check_timeout(&(ts)); \ + (ret); \ + }) + +/* + * Initialize the 'duration' member with the chosen timeout. + */ +#define RES_INIT_TIMEOUT(ts, _timeout) ({ (ts).spin = 1; (ts).duration = _timeout; }) + +/* + * We only need to reset 'timeout_end', 'spin' will just wrap around as necessary. + */ +#define RES_RESET_TIMEOUT(ts) ({ (ts).timeout_end = 0; }) + /* * Per-CPU queue node structures; we can never have more than 4 nested * contexts: task, softirq, hardirq, nmi. @@ -97,14 +138,17 @@ static DEFINE_PER_CPU_ALIGNED(struct qnode, qnodes[_Q_MAX_NODES]); * contended : (*,x,y) +--> (*,0,0) ---> (*,0,1) -' : * queue : ^--' : */ -void __lockfunc resilient_queued_spin_lock_slowpath(struct qspinlock *lock, u32 val) +void __lockfunc resilient_queued_spin_lock_slowpath(struct qspinlock *lock, u32 val, u64 timeout) { struct mcs_spinlock *prev, *next, *node; + struct rqspinlock_timeout ts; u32 old, tail; int idx; BUILD_BUG_ON(CONFIG_NR_CPUS >= (1U << _Q_TAIL_CPU_BITS)); + RES_INIT_TIMEOUT(ts, timeout); + /* * Wait for in-progress pending->locked hand-overs with a bounded * number of spins so that we guarantee forward progress. 
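To make the amortized time-check pattern above easier to follow, here is a minimal, self-contained userspace sketch of the same idea in C11. The names are illustrative and this is not the kernel code: the kernel uses ktime_get_mono_fast_ns() and the RES_* macros shown in the diff, while the sketch reads CLOCK_MONOTONIC and keeps the state in a plain struct.

#include <stdint.h>
#include <time.h>
#include <errno.h>

struct timeout_state {
	uint64_t deadline_ns;	/* 0 until armed by the first check */
	uint64_t duration_ns;	/* per-waiting-loop budget */
	uint16_t spin;		/* amortization counter */
};

static uint64_t now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000000000ull + ts.tv_nsec;
}

static int check_timeout(struct timeout_state *ts)
{
	uint64_t t = now_ns();

	if (!ts->deadline_ns) {		/* first call: arm the deadline lazily */
		ts->deadline_ns = t + ts->duration_ns;
		return 0;
	}
	return t > ts->deadline_ns ? -ETIMEDOUT : 0;
}

/* Amortized check: only consult the clock once every 64k spins. */
static int check_timeout_amortized(struct timeout_state *ts, int *ret)
{
	if (!(ts->spin++ & 0xffff))
		*ret = check_timeout(ts);
	return *ret;
}

/* Example waiting loop: spin on *flag until it clears or the budget expires. */
static int wait_for_clear(volatile int *flag, uint64_t budget_ns)
{
	struct timeout_state ts = { .duration_ns = budget_ns, .spin = 1 };
	int ret = 0;

	while (*flag && !check_timeout_amortized(&ts, &ret))
		;
	return ret;	/* 0 on success, -ETIMEDOUT on timeout */
}

The deadline is armed on the first check rather than up front, matching the lazy initialization of 'timeout_end' described in the commit message.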
From patchwork Tue Jan 7 13:59:50 2025
From: Kumar Kartikeya Dwivedi
To: bpf@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: Barret Rhoden , Linus Torvalds , Peter Zijlstra , Waiman Long , Alexei Starovoitov , Andrii Nakryiko , Daniel Borkmann , Martin KaFai Lau , Eduard Zingerman , "Paul E. McKenney" , Tejun Heo , Josh Don , Dohyun Kim , kernel-team@meta.com
Subject: [PATCH bpf-next v1 08/22] rqspinlock: Protect pending bit owners from stalls
Date: Tue, 7 Jan 2025 05:59:50 -0800
Message-ID: <20250107140004.2732830-9-memxor@gmail.com>
In-Reply-To: <20250107140004.2732830-1-memxor@gmail.com>
References: <20250107140004.2732830-1-memxor@gmail.com>
The pending bit is used to avoid queueing in case the lock is uncontended, and has demonstrated benefits for the 2 contender scenario, esp. on x86. In case the pending bit is acquired and we wait for the locked bit to disappear, we may get stuck due to the lock owner not making progress. Hence, this waiting loop must be protected with a timeout check. To perform a graceful recovery once we decide to abort our lock acquisition attempt in this case, we must unset the pending bit since we own it. All waiters undoing their changes and exiting gracefully allows the lock word to be restored to the unlocked state once all participants (owner, waiters) have been recovered, and the lock remains usable. Hence, set the pending bit back to zero before returning to the caller.
Introduce a lockevent (rqspinlock_lock_timeout) to capture timeout event statistics. Reviewed-by: Barret Rhoden Signed-off-by: Kumar Kartikeya Dwivedi --- include/asm-generic/rqspinlock.h | 2 +- kernel/locking/lock_events_list.h | 5 +++++ kernel/locking/rqspinlock.c | 28 +++++++++++++++++++++++----- 3 files changed, 29 insertions(+), 6 deletions(-) diff --git a/include/asm-generic/rqspinlock.h b/include/asm-generic/rqspinlock.h index 8ed266f4e70b..5c996a82e75f 100644 --- a/include/asm-generic/rqspinlock.h +++ b/include/asm-generic/rqspinlock.h @@ -19,6 +19,6 @@ struct qspinlock; */ #define RES_DEF_TIMEOUT (NSEC_PER_SEC / 2) -extern void resilient_queued_spin_lock_slowpath(struct qspinlock *lock, u32 val, u64 timeout); +extern int resilient_queued_spin_lock_slowpath(struct qspinlock *lock, u32 val, u64 timeout); #endif /* __ASM_GENERIC_RQSPINLOCK_H */ diff --git a/kernel/locking/lock_events_list.h b/kernel/locking/lock_events_list.h index 97fb6f3f840a..c5286249994d 100644 --- a/kernel/locking/lock_events_list.h +++ b/kernel/locking/lock_events_list.h @@ -49,6 +49,11 @@ LOCK_EVENT(lock_use_node4) /* # of locking ops that use 4th percpu node */ LOCK_EVENT(lock_no_node) /* # of locking ops w/o using percpu node */ #endif /* CONFIG_QUEUED_SPINLOCKS */ +/* + * Locking events for Resilient Queued Spin Lock + */ +LOCK_EVENT(rqspinlock_lock_timeout) /* # of locking ops that timeout */ + /* * Locking events for rwsem */ diff --git a/kernel/locking/rqspinlock.c b/kernel/locking/rqspinlock.c index 815feb24d512..dd305573db13 100644 --- a/kernel/locking/rqspinlock.c +++ b/kernel/locking/rqspinlock.c @@ -138,12 +138,12 @@ static DEFINE_PER_CPU_ALIGNED(struct qnode, qnodes[_Q_MAX_NODES]); * contended : (*,x,y) +--> (*,0,0) ---> (*,0,1) -' : * queue : ^--' : */ -void __lockfunc resilient_queued_spin_lock_slowpath(struct qspinlock *lock, u32 val, u64 timeout) +int __lockfunc resilient_queued_spin_lock_slowpath(struct qspinlock *lock, u32 val, u64 timeout) { struct mcs_spinlock *prev, *next, *node; struct rqspinlock_timeout ts; + int idx, ret = 0; u32 old, tail; - int idx; BUILD_BUG_ON(CONFIG_NR_CPUS >= (1U << _Q_TAIL_CPU_BITS)); @@ -201,8 +201,25 @@ void __lockfunc resilient_queued_spin_lock_slowpath(struct qspinlock *lock, u32 * clear_pending_set_locked() implementations imply full * barriers. */ - if (val & _Q_LOCKED_MASK) - smp_cond_load_acquire(&lock->locked, !VAL); + if (val & _Q_LOCKED_MASK) { + RES_RESET_TIMEOUT(ts); + smp_cond_load_acquire(&lock->locked, !VAL || RES_CHECK_TIMEOUT(ts, ret)); + } + + if (ret) { + /* + * We waited for the locked bit to go back to 0, as the pending + * waiter, but timed out. We need to clear the pending bit since + * we own it. Once a stuck owner has been recovered, the lock + * must be restored to a valid state, hence removing the pending + * bit is necessary. + * + * *,1,* -> *,0,* + */ + clear_pending(lock); + lockevent_inc(rqspinlock_lock_timeout); + return ret; + } /* * take ownership and clear the pending bit. 
@@ -211,7 +228,7 @@ void __lockfunc resilient_queued_spin_lock_slowpath(struct qspinlock *lock, u32 */ clear_pending_set_locked(lock); lockevent_inc(lock_pending); - return; + return 0; /* * End of pending bit optimistic spinning and beginning of MCS @@ -362,5 +379,6 @@ void __lockfunc resilient_queued_spin_lock_slowpath(struct qspinlock *lock, u32 * release the node */ __this_cpu_dec(qnodes[0].mcs.count); + return 0; } EXPORT_SYMBOL(resilient_queued_spin_lock_slowpath);
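The invariant in the patch above, that the waiter which set the pending bit is the one that must clear it again on the timeout path, can be modeled in isolation. Below is a self-contained userspace approximation using C11 atomics and a simplified lock-word layout; it is not the kernel implementation (which uses clear_pending()/clear_pending_set_locked() and the time-based RES_CHECK_TIMEOUT), and the spin budget stands in for the real timeout.

#include <stdatomic.h>
#include <stdint.h>
#include <errno.h>

#define LOCKED_MASK   0x000000ffu   /* locked byte */
#define PENDING_MASK  0x0000ff00u   /* pending byte */
#define LOCKED_VAL    0x00000001u

/*
 * Called with the pending bit already owned by this waiter (*,1,*).
 * Either take the lock once the owner releases it, or give up after a
 * bounded number of spins and clear the pending bit we own.
 */
static int pending_wait_or_abort(_Atomic uint32_t *lock, uint64_t budget_spins)
{
	uint32_t val, newval;

	for (;;) {
		val = atomic_load_explicit(lock, memory_order_acquire);
		if (!(val & LOCKED_MASK)) {
			/* Owner released the lock: *,1,0 -> *,0,1 */
			newval = (val & ~PENDING_MASK) | LOCKED_VAL;
			if (atomic_compare_exchange_weak_explicit(lock, &val, newval,
					memory_order_acquire, memory_order_relaxed))
				return 0;
			continue;
		}
		if (!budget_spins--)
			break;
	}

	/*
	 * Timed out while owning the pending bit: undo our change so the
	 * lock word can settle back to unlocked once all participants
	 * (owner, waiters) have recovered.  *,1,* -> *,0,*
	 */
	atomic_fetch_and_explicit(lock, ~PENDING_MASK, memory_order_release);
	return -ETIMEDOUT;
}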
From patchwork Tue Jan 7 13:59:51 2025
From: Kumar Kartikeya Dwivedi
To: bpf@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: Barret Rhoden , Linus Torvalds , Peter Zijlstra , Waiman Long , Alexei Starovoitov , Andrii Nakryiko , Daniel Borkmann , Martin KaFai Lau , Eduard Zingerman , "Paul E. McKenney" , Tejun Heo , Josh Don , Dohyun Kim , kernel-team@meta.com
Subject: [PATCH bpf-next v1 09/22] rqspinlock: Protect waiters in queue from stalls
Date: Tue, 7 Jan 2025 05:59:51 -0800
Message-ID: <20250107140004.2732830-10-memxor@gmail.com>
In-Reply-To: <20250107140004.2732830-1-memxor@gmail.com>
References: <20250107140004.2732830-1-memxor@gmail.com>
Implement the wait queue cleanup algorithm for rqspinlock. There are three forms of waiters in the original queued spin lock algorithm. The first is the waiter which acquires the pending bit and spins on the lock word without forming a wait queue.
The second is the head waiter that is the first waiter heading the wait queue. The third form is of all the non-head waiters queued behind the head, waiting to be signalled through their MCS node to overtake the responsibility of the head. In this commit, we are concerned with the second and third kind. First, we augment the waiting loop of the head of the wait queue with a timeout. When this timeout happens, all waiters part of the wait queue will abort their lock acquisition attempts. This happens in three steps. First, the head breaks out of its loop waiting for pending and locked bits to turn to 0, and non-head waiters break out of their MCS node spin (more on that later). Next, every waiter (head or non-head) attempts to check whether they are also the tail waiter, in such a case they attempt to zero out the tail word and allow a new queue to be built up for this lock. If they succeed, they have no one to signal next in the queue to stop spinning. Otherwise, they signal the MCS node of the next waiter to break out of its spin and try resetting the tail word back to 0. This goes on until the tail waiter is found. In case of races, the new tail will be responsible for performing the same task, as the old tail will then fail to reset the tail word and wait for its next pointer to be updated before it signals the new tail to do the same. Lastly, all of these waiters release the rqnode and return to the caller. This patch underscores the point that rqspinlock's timeout does not apply to each waiter individually, and cannot be relied upon as an upper bound. It is possible for the rqspinlock waiters to return early from a failed lock acquisition attempt as soon as stalls are detected. The head waiter cannot directly WRITE_ONCE the tail to zero, as it may race with a concurrent xchg and a non-head waiter linking its MCS node to the head's MCS node through 'prev->next' assignment. Reviewed-by: Barret Rhoden Signed-off-by: Kumar Kartikeya Dwivedi --- kernel/locking/rqspinlock.c | 42 +++++++++++++++++++++++++++++--- kernel/locking/rqspinlock.h | 48 +++++++++++++++++++++++++++++++++++++ 2 files changed, 87 insertions(+), 3 deletions(-) create mode 100644 kernel/locking/rqspinlock.h diff --git a/kernel/locking/rqspinlock.c b/kernel/locking/rqspinlock.c index dd305573db13..f712fe4b1f38 100644 --- a/kernel/locking/rqspinlock.c +++ b/kernel/locking/rqspinlock.c @@ -77,6 +77,8 @@ struct rqspinlock_timeout { u16 spin; }; +#define RES_TIMEOUT_VAL 2 + static noinline int check_timeout(struct rqspinlock_timeout *ts) { u64 time = ktime_get_mono_fast_ns(); @@ -305,12 +307,18 @@ int __lockfunc resilient_queued_spin_lock_slowpath(struct qspinlock *lock, u32 v * head of the waitqueue. */ if (old & _Q_TAIL_MASK) { + int val; + prev = decode_tail(old, qnodes); /* Link @node into the waitqueue. */ WRITE_ONCE(prev->next, node); - arch_mcs_spin_lock_contended(&node->locked); + val = arch_mcs_spin_lock_contended(&node->locked); + if (val == RES_TIMEOUT_VAL) { + ret = -EDEADLK; + goto waitq_timeout; + } /* * While waiting for the MCS lock, the next pointer may have @@ -334,7 +342,35 @@ int __lockfunc resilient_queued_spin_lock_slowpath(struct qspinlock *lock, u32 v * sequentiality; this is because the set_locked() function below * does not imply a full barrier. 
*/ - val = atomic_cond_read_acquire(&lock->val, !(VAL & _Q_LOCKED_PENDING_MASK)); + RES_RESET_TIMEOUT(ts); + val = atomic_cond_read_acquire(&lock->val, !(VAL & _Q_LOCKED_PENDING_MASK) || + RES_CHECK_TIMEOUT(ts, ret)); + +waitq_timeout: + if (ret) { + /* + * If the tail is still pointing to us, then we are the final waiter, + * and are responsible for resetting the tail back to 0. Otherwise, if + * the cmpxchg operation fails, we signal the next waiter to take exit + * and try the same. For a waiter with tail node 'n': + * + * n,*,* -> 0,*,* + * + * When performing cmpxchg for the whole word (NR_CPUS > 16k), it is + * possible locked/pending bits keep changing and we see failures even + * when we remain the head of wait queue. However, eventually, for the + * case without corruption, pending bit owner will unset the pending + * bit, and new waiters will queue behind us. This will leave the lock + * owner in charge, and it will eventually either set locked bit to 0, + * or leave it as 1, allowing us to make progress. + */ + if (!try_cmpxchg_tail(lock, tail, 0)) { + next = smp_cond_load_relaxed(&node->next, VAL); + WRITE_ONCE(next->locked, RES_TIMEOUT_VAL); + } + lockevent_inc(rqspinlock_lock_timeout); + goto release; + } /* * claim the lock: @@ -379,6 +415,6 @@ int __lockfunc resilient_queued_spin_lock_slowpath(struct qspinlock *lock, u32 v * release the node */ __this_cpu_dec(qnodes[0].mcs.count); - return 0; + return ret; } EXPORT_SYMBOL(resilient_queued_spin_lock_slowpath); diff --git a/kernel/locking/rqspinlock.h b/kernel/locking/rqspinlock.h new file mode 100644 index 000000000000..3cec3a0f2d7e --- /dev/null +++ b/kernel/locking/rqspinlock.h @@ -0,0 +1,48 @@ +/* SPDX-License-Identifier: GPL-2.0-or-later */ +/* + * Resilient Queued Spin Lock defines + * + * (C) Copyright 2024 Meta Platforms, Inc. and affiliates. + * + * Authors: Kumar Kartikeya Dwivedi + */ +#ifndef __LINUX_RQSPINLOCK_H +#define __LINUX_RQSPINLOCK_H + +#include "qspinlock.h" + +/* + * try_cmpxchg_tail - Return result of cmpxchg of tail word with a new value + * @lock: Pointer to queued spinlock structure + * @tail: The tail to compare against + * @new_tail: The new queue tail code word + * Return: Bool to indicate whether the cmpxchg operation succeeded + * + * This is used by the head of the wait queue to clean up the queue. + * Provides relaxed ordering, since observers only rely on initialized + * state of the node which was made visible through the xchg_tail operation, + * i.e. through the smp_wmb preceding xchg_tail. + * + * We avoid using 16-bit cmpxchg, which is not available on all architectures. + */ +static __always_inline bool try_cmpxchg_tail(struct qspinlock *lock, u32 tail, u32 new_tail) +{ + u32 old, new; + + old = atomic_read(&lock->val); + do { + /* + * Is the tail part we compare to already stale? Fail. + */ + if ((old & _Q_TAIL_MASK) != tail) + return false; + /* + * Encode latest locked/pending state for new tail. 
+ */ + new = (old & _Q_LOCKED_PENDING_MASK) | new_tail; + } while (!atomic_try_cmpxchg_relaxed(&lock->val, &old, new)); + + return true; +} + +#endif /* __LINUX_RQSPINLOCK_H */
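The unwind protocol this patch adds, where an aborting waiter either resets the tail or hands the abort on to its successor, can be sketched as a small standalone model. The structure below uses raw node pointers instead of the qspinlock tail encoding, and the names are illustrative (TIMEOUT_VAL plays the role of RES_TIMEOUT_VAL); it models the idea rather than the kernel code.

#include <stdatomic.h>
#include <stddef.h>

#define TIMEOUT_VAL 2	/* poison written into next->locked on abort */

struct qnode {
	_Atomic int locked;		/* 0: spin, 1: proceed, 2: abort */
	struct qnode *_Atomic next;
};

struct queue {
	struct qnode *_Atomic tail;
};

/* Called by a waiter that has decided to give up while queued. */
static void abort_queued_waiter(struct queue *q, struct qnode *node)
{
	struct qnode *expected = node;
	struct qnode *next;

	/* If we are still the tail, reset it so a fresh queue can form. */
	if (atomic_compare_exchange_strong(&q->tail, &expected, NULL))
		return;

	/*
	 * Someone queued behind us: wait for the link to appear, then pass
	 * the abort along so the successor runs this same unwind step.
	 */
	while (!(next = atomic_load(&node->next)))
		;
	atomic_store(&next->locked, TIMEOUT_VAL);
}

A compare-and-swap is used for the tail reset, mirroring the commit message's point that the head cannot simply store zero because a new waiter may be concurrently swapping itself in as the tail.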
From patchwork Tue Jan 7 13:59:52 2025
From: Kumar Kartikeya Dwivedi
To: bpf@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: Barret Rhoden , Linus Torvalds , Peter Zijlstra , Waiman Long , Alexei Starovoitov , Andrii Nakryiko , Daniel Borkmann , Martin KaFai Lau , Eduard Zingerman , "Paul E. McKenney" , Tejun Heo , Josh Don , Dohyun Kim , kernel-team@meta.com
Subject: [PATCH bpf-next v1 10/22] rqspinlock: Protect waiters in trylock fallback from stalls
Date: Tue, 7 Jan 2025 05:59:52 -0800
Message-ID: <20250107140004.2732830-11-memxor@gmail.com>
In-Reply-To: <20250107140004.2732830-1-memxor@gmail.com>
References: <20250107140004.2732830-1-memxor@gmail.com>
When we run out of maximum rqnodes, the original queued spin lock slow path falls back to a try lock. In such a case, we are again susceptible to stalls if the lock owner fails to make progress. We use the timeout as a fallback to break out of this loop and return to the caller. This is a fallback for an extreme edge case, when on the same CPU we run out of all 4 qnodes. When could this happen? We are in the slow path in task context, we get interrupted by an IRQ, which while in the slow path gets interrupted by an NMI, which in the slow path gets another nested NMI, which enters the slow path.
All of the interruptions happen after node->count++. Reviewed-by: Barret Rhoden Signed-off-by: Kumar Kartikeya Dwivedi --- kernel/locking/rqspinlock.c | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/kernel/locking/rqspinlock.c b/kernel/locking/rqspinlock.c index f712fe4b1f38..b63f92bd43b1 100644 --- a/kernel/locking/rqspinlock.c +++ b/kernel/locking/rqspinlock.c @@ -255,8 +255,14 @@ int __lockfunc resilient_queued_spin_lock_slowpath(struct qspinlock *lock, u32 v */ if (unlikely(idx >= _Q_MAX_NODES)) { lockevent_inc(lock_no_node); - while (!queued_spin_trylock(lock)) + RES_RESET_TIMEOUT(ts); + while (!queued_spin_trylock(lock)) { + if (RES_CHECK_TIMEOUT(ts, ret)) { + lockevent_inc(rqspinlock_lock_timeout); + break; + } cpu_relax(); + } goto release; }
s=20230601; t=1736258427; x=1736863227; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=Jb4OSdpYuP2S/Nvn1Sz/n8DxUFgPznMtuFJtw7K3PyU=; b=IyIb19EG8L6BoU+9RsRucHJyU1Eg3I7tCL09HOhWz90rGVEsOAkljviLPIi/M+duqw ueoTyPTH6s+BNfs5UBi62rGwS8MCQTSOGV7JLTfQ6uRZxuHIuUhD5gCuIBW5BjTqOwPu dvhwA/J11l8aqsdlHCJIpq2YW678TPB8l1GMAuyJHP7OsL/tfjBr8LE3pmsuUqt/EEKd DlynzZO85WO/5XlhDhZU37G93ekYAs3+PgLqQtL1bsDrMSkp5PW8+vsaW9re0Ha1IkUN x9jUH1Uz0acykIesy05fcWgYPY9DHQE1BPmrqTNrXbjIcaVIyPNkJxI9EcEk6U65dkzF 53XA== X-Forwarded-Encrypted: i=1; AJvYcCUnEvxtFUBi8Whgw2V51S3SO8V/huTc9yvgmpTCPJqcAx5fTAi0tcFV+T/qJ9eyG/LABdSAB/Krb8xeBfM=@vger.kernel.org X-Gm-Message-State: AOJu0Yxgpl0YAseBFw6QclTTy0RpJvDQXze8dPcsMnb+bbAGk31DmPR0 L44UeegdVapRr2wqdP1LiG7mOa6tokipixjYu3Co16bbkvheheSZt8xjJONjQpaP0g== X-Gm-Gg: ASbGncuXiI39UdUNwRsTJu0EExH5MzbxKSoyGqRaMJjzC4CgLCuYDf0zk9UGgxiMGtt 8HgHuqCnYSykpMpI1GoD9KgwC3owoJBah0PmgwGPKYpXVMAuMUkXzVh+E3mogb6xqAfpMdTELhk LALBEN2P4YOplPwVOASYBLPuQd08cHwEtuJWVnr9/pnZXO67lIPW2GiepvAPRvTBbwIkVk6IgFV fiQ8T2XlyoeQd2L4dvsgRti8uwweyQI7V56M8ppvxoqWvU= X-Google-Smtp-Source: AGHT+IHQu9PuFslO/cTslv03rwOEAoeYf2hfBbjnFKF/M9BtmQ/uEo66sXGtTo7qATbKLKNQ/EIB1g== X-Received: by 2002:a05:6000:18a8:b0:385:f7d2:7e29 with SMTP id ffacd0b85a97d-38a221ea539mr51808481f8f.15.1736258426389; Tue, 07 Jan 2025 06:00:26 -0800 (PST) Received: from localhost ([2a03:2880:31ff:74::]) by smtp.gmail.com with ESMTPSA id ffacd0b85a97d-38a1c828f5fsm51087383f8f.8.2025.01.07.06.00.25 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 07 Jan 2025 06:00:25 -0800 (PST) From: Kumar Kartikeya Dwivedi To: bpf@vger.kernel.org, linux-kernel@vger.kernel.org Cc: Linus Torvalds , Peter Zijlstra , Waiman Long , Alexei Starovoitov , Andrii Nakryiko , Daniel Borkmann , Martin KaFai Lau , Eduard Zingerman , "Paul E. 
McKenney" , Tejun Heo , Barret Rhoden , Josh Don , Dohyun Kim , kernel-team@meta.com Subject: [PATCH bpf-next v1 11/22] rqspinlock: Add deadlock detection and recovery Date: Tue, 7 Jan 2025 05:59:53 -0800 Message-ID: <20250107140004.2732830-12-memxor@gmail.com> X-Mailer: git-send-email 2.43.5 In-Reply-To: <20250107140004.2732830-1-memxor@gmail.com> References: <20250107140004.2732830-1-memxor@gmail.com> Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Developer-Signature: v=1; a=openpgp-sha256; l=13520; h=from:subject; bh=bNf0S7LL2XvS5mVW9zMOR/LJpbrZez7d2KypRIUumaM=; b=owEBbQKS/ZANAwAIAUzgyIZIvxHKAcsmYgBnfTCdYUutWpw5wTtPT1XjjEZ4TIaA+HscecTSb1/+ heo1rcaJAjMEAAEIAB0WIQRLvip+Buz51YI8YRFM4MiGSL8RygUCZ30wnQAKCRBM4MiGSL8Ryqt+D/ 9Sx5zdp/3fFEv3erhlKTALKqzWyKk52pS0yH2p6V+XLc156S4Gza0LrKVO6BKynNXB8lGCtIlbEjt3 a6aHZJUPBPHIwiD5l8DhpW4IAVkkzINgI9bmCjVRq42gI4S0rE2rVddZrWf6RKRYo6fat6VQIGrODU eKABqaRe0BEK83I/6ZZYB1OrzRUKcDlSsw6zqyXuBxmbJocgWJ8pmfr9FxmJ0iusiJh14rVLsmcucs +CETEAMzfNfxeUZIZOJ7GccpQJuQRMCM7C23IE+pjU0hKCJU3s6AcxlYHu0dsFpikMyK91h4SpJk0B 6zm98nyftxRgYyGA7yJ1bwyDEDwve/JdE9XW8198PYvgbjvYWOaxVUSL2ZgFPL8lnTfCJ7XyK4QlFa /fXllaoCi0dvmnU7e68b7ZBarDYSdPiG14bKHrE0mk4CN36mskJZE2ep3F5sS47mXS0rS5nY5r9rz9 mzgEptIJiT2JXuG8BiOZNp3D7s1fSVBFQMDJg6T3bDpNpdrG0TOyBf/3L8PwfeWOxBE7+kEK28/KjF UMidbo/15Z2e5aqy1KzdLphE+rgqacFIDffDXi+kTe60DMWNYLDJ4GmTC+T+0cK3GOpkgEr5t/bjDO QH5H9/IaHPV1avIpmrFbsM4B5OQ6ytEXFyN6XX/Wrnz1mgZrrinmRERTLs9g== X-Developer-Key: i=memxor@gmail.com; a=openpgp; fpr=4BBE2A7E06ECF9D5823C61114CE0C88648BF11CA X-Patchwork-Delegate: bpf@iogearbox.net While the timeout logic provides guarantees for the waiter's forward progress, the time until a stalling waiter unblocks can still be long. The default timeout of 1/2 sec can be excessively long for some use cases. Additionally, custom timeouts may exacerbate recovery time. Introduce logic to detect common cases of deadlocks and perform quicker recovery. This is done by dividing the time from entry into the locking slow path until the timeout into intervals of 1 ms. Then, after each interval elapses, deadlock detection is performed, while also polling the lock word to ensure we can quickly break out of the detection logic and proceed with lock acquisition. A 'held_locks' table is maintained per-CPU where the entry at the bottom denotes a lock being waited for or already taken. Entries coming before it denote locks that are already held. The current CPU's table can thus be looked at to detect AA deadlocks. The tables from other CPUs can be looked at to discover ABBA situations. Finally, when a matching entry for the lock being taken on the current CPU is found on some other CPU, a deadlock situation is detected. This function can take a long time, therefore the lock word is constantly polled in each loop iteration to ensure we can preempt detection and proceed with lock acquisition, using the is_lock_released check. We set 'spin' member of rqspinlock_timeout struct to 0 to trigger deadlock checks immediately to perform faster recovery. Note: Extending lock word size by 4 bytes to record owner CPU can allow faster detection for ABBA. It is typically the owner which participates in a ABBA situation. However, to keep compatibility with existing lock words in the kernel (struct qspinlock), and given deadlocks are a rare event triggered by bugs, we choose to favor compatibility over faster detection. 
Signed-off-by: Kumar Kartikeya Dwivedi --- include/asm-generic/rqspinlock.h | 56 +++++++++- kernel/locking/rqspinlock.c | 178 ++++++++++++++++++++++++++++--- 2 files changed, 220 insertions(+), 14 deletions(-) diff --git a/include/asm-generic/rqspinlock.h b/include/asm-generic/rqspinlock.h index 5c996a82e75f..c7e33ccc57a6 100644 --- a/include/asm-generic/rqspinlock.h +++ b/include/asm-generic/rqspinlock.h @@ -11,14 +11,68 @@ #include #include +#include struct qspinlock; +extern int resilient_queued_spin_lock_slowpath(struct qspinlock *lock, u32 val, u64 timeout); + /* * Default timeout for waiting loops is 0.5 seconds */ #define RES_DEF_TIMEOUT (NSEC_PER_SEC / 2) -extern int resilient_queued_spin_lock_slowpath(struct qspinlock *lock, u32 val, u64 timeout); +#define RES_NR_HELD 32 + +struct rqspinlock_held { + int cnt; + void *locks[RES_NR_HELD]; +}; + +DECLARE_PER_CPU_ALIGNED(struct rqspinlock_held, rqspinlock_held_locks); + +static __always_inline void grab_held_lock_entry(void *lock) +{ + int cnt = this_cpu_inc_return(rqspinlock_held_locks.cnt); + + if (unlikely(cnt > RES_NR_HELD)) { + /* Still keep the inc so we decrement later. */ + return; + } + + /* + * Implied compiler barrier in per-CPU operations; otherwise we can have + * the compiler reorder inc with write to table, allowing interrupts to + * overwrite and erase our write to the table (as on interrupt exit it + * will be reset to NULL). + */ + this_cpu_write(rqspinlock_held_locks.locks[cnt - 1], lock); +} + +/* + * It is possible to run into misdetection scenarios of AA deadlocks on the same + * CPU, and missed ABBA deadlocks on remote CPUs when this function pops entries + * out of order (due to lock A, lock B, unlock A, unlock B) pattern. The correct + * logic to preserve right entries in the table would be to walk the array of + * held locks and swap and clear out-of-order entries, but that's too + * complicated and we don't have a compelling use case for out of order unlocking. + * + * Therefore, we simply don't support such cases and keep the logic simple here. + */ +static __always_inline void release_held_lock_entry(void) +{ + struct rqspinlock_held *rqh = this_cpu_ptr(&rqspinlock_held_locks); + + if (unlikely(rqh->cnt > RES_NR_HELD)) + goto dec; + smp_store_release(&rqh->locks[rqh->cnt - 1], NULL); + /* + * Overwrite of NULL should appear before our decrement of the count to + * other CPUs, otherwise we have the issue of a stale non-NULL entry being + * visible in the array, leading to misdetection during deadlock detection. 
+ */ +dec: + this_cpu_dec(rqspinlock_held_locks.cnt); +} #endif /* __ASM_GENERIC_RQSPINLOCK_H */ diff --git a/kernel/locking/rqspinlock.c b/kernel/locking/rqspinlock.c index b63f92bd43b1..b7c86127d288 100644 --- a/kernel/locking/rqspinlock.c +++ b/kernel/locking/rqspinlock.c @@ -30,6 +30,7 @@ * Include queued spinlock definitions and statistics code */ #include "qspinlock.h" +#include "rqspinlock.h" #include "qspinlock_stat.h" /* @@ -74,16 +75,141 @@ struct rqspinlock_timeout { u64 timeout_end; u64 duration; + u64 cur; u16 spin; }; #define RES_TIMEOUT_VAL 2 -static noinline int check_timeout(struct rqspinlock_timeout *ts) +DEFINE_PER_CPU_ALIGNED(struct rqspinlock_held, rqspinlock_held_locks); + +static bool is_lock_released(struct qspinlock *lock, u32 mask, struct rqspinlock_timeout *ts) +{ + if (!(atomic_read_acquire(&lock->val) & (mask))) + return true; + return false; +} + +static noinline int check_deadlock_AA(struct qspinlock *lock, u32 mask, + struct rqspinlock_timeout *ts) +{ + struct rqspinlock_held *rqh = this_cpu_ptr(&rqspinlock_held_locks); + int cnt = min(RES_NR_HELD, rqh->cnt); + + /* + * Return an error if we hold the lock we are attempting to acquire. + * We'll iterate over max 32 locks; no need to do is_lock_released. + */ + for (int i = 0; i < cnt - 1; i++) { + if (rqh->locks[i] == lock) + return -EDEADLK; + } + return 0; +} + +static noinline int check_deadlock_ABBA(struct qspinlock *lock, u32 mask, + struct rqspinlock_timeout *ts) +{ + struct rqspinlock_held *rqh = this_cpu_ptr(&rqspinlock_held_locks); + int rqh_cnt = min(RES_NR_HELD, rqh->cnt); + void *remote_lock; + int cpu; + + /* + * Find the CPU holding the lock that we want to acquire. If there is a + * deadlock scenario, we will read a stable set on the remote CPU and + * find the target. This would be a constant time operation instead of + * O(NR_CPUS) if we could determine the owning CPU from a lock value, but + * that requires increasing the size of the lock word. + */ + for_each_possible_cpu(cpu) { + struct rqspinlock_held *rqh_cpu = per_cpu_ptr(&rqspinlock_held_locks, cpu); + int real_cnt = READ_ONCE(rqh_cpu->cnt); + int cnt = min(RES_NR_HELD, real_cnt); + + /* + * Let's ensure to break out of this loop if the lock is available for + * us to potentially acquire. + */ + if (is_lock_released(lock, mask, ts)) + return 0; + + /* + * Skip ourselves, and CPUs whose count is less than 2, as they need at + * least one held lock and one acquisition attempt (reflected as top + * most entry) to participate in an ABBA deadlock. + * + * If cnt is more than RES_NR_HELD, it means the current lock being + * acquired won't appear in the table, and other locks in the table are + * already held, so we can't determine ABBA. + */ + if (cpu == smp_processor_id() || real_cnt < 2 || real_cnt > RES_NR_HELD) + continue; + + /* + * Obtain the entry at the top, this corresponds to the lock the + * remote CPU is attempting to acquire in a deadlock situation, + * and would be one of the locks we hold on the current CPU. + */ + remote_lock = READ_ONCE(rqh_cpu->locks[cnt - 1]); + /* + * If it is NULL, we've raced and cannot determine a deadlock + * conclusively, skip this CPU. + */ + if (!remote_lock) + continue; + /* + * Find if the lock we're attempting to acquire is held by this CPU. + * Don't consider the topmost entry, as that must be the latest lock + * being held or acquired. For a deadlock, the target CPU must also + * attempt to acquire a lock we hold, so for this search only 'cnt - 1' + * entries are important. 
+ */ + for (int i = 0; i < cnt - 1; i++) { + if (READ_ONCE(rqh_cpu->locks[i]) != lock) + continue; + /* + * We found our lock as held on the remote CPU. Is the + * acquisition attempt on the remote CPU for a lock held + * by us? If so, we have a deadlock situation, and need + * to recover. + */ + for (int i = 0; i < rqh_cnt - 1; i++) { + if (rqh->locks[i] == remote_lock) + return -EDEADLK; + } + /* + * Inconclusive; retry again later. + */ + return 0; + } + } + return 0; +} + +static noinline int check_deadlock(struct qspinlock *lock, u32 mask, + struct rqspinlock_timeout *ts) +{ + int ret; + + ret = check_deadlock_AA(lock, mask, ts); + if (ret) + return ret; + ret = check_deadlock_ABBA(lock, mask, ts); + if (ret) + return ret; + + return 0; +} + +static noinline int check_timeout(struct qspinlock *lock, u32 mask, + struct rqspinlock_timeout *ts) { u64 time = ktime_get_mono_fast_ns(); + u64 prev = ts->cur; if (!ts->timeout_end) { + ts->cur = time; ts->timeout_end = time + ts->duration; return 0; } @@ -91,20 +217,30 @@ static noinline int check_timeout(struct rqspinlock_timeout *ts) if (time > ts->timeout_end) return -ETIMEDOUT; + /* + * A millisecond interval passed from last time? Trigger deadlock + * checks. + */ + if (prev + NSEC_PER_MSEC < time) { + ts->cur = time; + return check_deadlock(lock, mask, ts); + } + return 0; } -#define RES_CHECK_TIMEOUT(ts, ret) \ - ({ \ - if (!((ts).spin++ & 0xffff)) \ - (ret) = check_timeout(&(ts)); \ - (ret); \ +#define RES_CHECK_TIMEOUT(ts, ret, mask) \ + ({ \ + if (!((ts).spin++ & 0xffff)) \ + (ret) = check_timeout((lock), (mask), &(ts)); \ + (ret); \ }) /* * Initialize the 'duration' member with the chosen timeout. + * Set spin member to 0 to trigger AA/ABBA checks immediately. */ -#define RES_INIT_TIMEOUT(ts, _timeout) ({ (ts).spin = 1; (ts).duration = _timeout; }) +#define RES_INIT_TIMEOUT(ts, _timeout) ({ (ts).spin = 0; (ts).duration = _timeout; }) /* * We only need to reset 'timeout_end', 'spin' will just wrap around as necessary. @@ -192,6 +328,11 @@ int __lockfunc resilient_queued_spin_lock_slowpath(struct qspinlock *lock, u32 v goto queue; } + /* + * Grab an entry in the held locks array, to enable deadlock detection. + */ + grab_held_lock_entry(lock); + /* * We're pending, wait for the owner to go away. * @@ -205,7 +346,7 @@ int __lockfunc resilient_queued_spin_lock_slowpath(struct qspinlock *lock, u32 v */ if (val & _Q_LOCKED_MASK) { RES_RESET_TIMEOUT(ts); - smp_cond_load_acquire(&lock->locked, !VAL || RES_CHECK_TIMEOUT(ts, ret)); + smp_cond_load_acquire(&lock->locked, !VAL || RES_CHECK_TIMEOUT(ts, ret, _Q_LOCKED_MASK)); } if (ret) { @@ -220,7 +361,7 @@ int __lockfunc resilient_queued_spin_lock_slowpath(struct qspinlock *lock, u32 v */ clear_pending(lock); lockevent_inc(rqspinlock_lock_timeout); - return ret; + goto err_release_entry; } /* @@ -238,6 +379,11 @@ int __lockfunc resilient_queued_spin_lock_slowpath(struct qspinlock *lock, u32 v */ queue: lockevent_inc(lock_slowpath); + /* + * Grab deadlock detection entry for the queue path. 
+ */ + grab_held_lock_entry(lock); + node = this_cpu_ptr(&qnodes[0].mcs); idx = node->count++; tail = encode_tail(smp_processor_id(), idx); @@ -257,9 +403,9 @@ int __lockfunc resilient_queued_spin_lock_slowpath(struct qspinlock *lock, u32 v lockevent_inc(lock_no_node); RES_RESET_TIMEOUT(ts); while (!queued_spin_trylock(lock)) { - if (RES_CHECK_TIMEOUT(ts, ret)) { + if (RES_CHECK_TIMEOUT(ts, ret, ~0u)) { lockevent_inc(rqspinlock_lock_timeout); - break; + goto err_release_node; } cpu_relax(); } @@ -350,7 +496,7 @@ int __lockfunc resilient_queued_spin_lock_slowpath(struct qspinlock *lock, u32 v */ RES_RESET_TIMEOUT(ts); val = atomic_cond_read_acquire(&lock->val, !(VAL & _Q_LOCKED_PENDING_MASK) || - RES_CHECK_TIMEOUT(ts, ret)); + RES_CHECK_TIMEOUT(ts, ret, _Q_LOCKED_PENDING_MASK)); waitq_timeout: if (ret) { @@ -375,7 +521,7 @@ int __lockfunc resilient_queued_spin_lock_slowpath(struct qspinlock *lock, u32 v WRITE_ONCE(next->locked, RES_TIMEOUT_VAL); } lockevent_inc(rqspinlock_lock_timeout); - goto release; + goto err_release_node; } /* @@ -422,5 +568,11 @@ int __lockfunc resilient_queued_spin_lock_slowpath(struct qspinlock *lock, u32 v */ __this_cpu_dec(qnodes[0].mcs.count); return ret; +err_release_node: + trace_contention_end(lock, ret); + __this_cpu_dec(qnodes[0].mcs.count); +err_release_entry: + release_held_lock_entry(); + return ret; } EXPORT_SYMBOL(resilient_queued_spin_lock_slowpath); From patchwork Tue Jan 7 13:59:54 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Kumar Kartikeya Dwivedi X-Patchwork-Id: 13928949 X-Patchwork-Delegate: bpf@iogearbox.net Received: from mail-wm1-f68.google.com (mail-wm1-f68.google.com [209.85.128.68]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 72D401F2376; Tue, 7 Jan 2025 14:00:33 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.128.68 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1736258437; cv=none; b=ciWUYikgAhoAcCOhSirZgbjW8vl7rShoC0jC4JaBXf8xJdu5sd0Q6iPoqAkmNeL2LcEzBB7Or3oA+IXjfXeCwEt//+FRiK3Xtus4eTPl9vYrquIldHrHlAEgDRf+rkQoJpHeDxr+hyy0XNHXiSv9rmbGCHRbvNu6t+YxvFsDF+4= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1736258437; c=relaxed/simple; bh=NQnonbkW/L0ilYWBNW3FfTS4fYIHyEnpuydDkhm7gMc=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=vDU6ZTmWq+SSFqKjnRi9HySH0P6U3l9s65cBp6q9TnxXo1t/AbiAp3p8CRYb0o3YXVMNl9GGyLNBGNzWvdVs5mQLplFr8MK8S1hvBzX52ryxqEXRE5f/mQWvjruqqA7okzfun6tCsTe/r7/OZq8VctM0xYVBNavv+FdmW6Zmzsk= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=EO+iZc8R; arc=none smtp.client-ip=209.85.128.68 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="EO+iZc8R" Received: by mail-wm1-f68.google.com with SMTP id 5b1f17b1804b1-4363ae65100so160609035e9.0; Tue, 07 Jan 2025 06:00:33 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1736258429; x=1736863229; darn=vger.kernel.org; 
h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=43tlysq7zNNGdym4AGQ5BQPyh+ovqgtK3c5cIvqu77U=; b=EO+iZc8RV29si4rPbJe+GcuXdVSPEtyWqVf4UKAATIg5KxSqHBJKEIFn02k0OSzCvc bviZM6GfPFAR+e1rAaqkc0sGY6z261G+HTvcYeqSP/o7DgWMG7OOiY+hqSacYe3wHBDu 7nokbVqzq9GoWlcxggG00JjDYp28nHiayI+djRLSwC1w8y82BwreOPcQo58RrOCnipYo YNmNqmTzT3313yI8DQ4Mf5Bza76oX9oFVZHS1vHtmlrP+atcMSkAfwu3GeI/QhMiykf3 011TVmM6nirV1xhfGvZLjWwa+zSuHwcRB0Rmzj9829+c5wzs4E0jQiSPj4WLIwimqRYN WiOQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1736258429; x=1736863229; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=43tlysq7zNNGdym4AGQ5BQPyh+ovqgtK3c5cIvqu77U=; b=DbLcUML1kDRHjo6VmFlhsYxMjQi14dsWmyV0MKMkxUgPZkQ/pmxkzrtUJ35rLk90Dh 6FfTRbu4aUMvFDZj0gErR9MMG/XwBZm6amDO5oJXUDGDdP/wFZ8NqIpyRW6ej2fjyGgq dCImFE2BnH8hMSt7NzU31f53WCHEtt5/9wAMMbi3f3NeCtpbGWhZelKbuYLdtgGzc8N4 34+mDuFWUunbgjXwVTH3eKfeILFPSSywN21nBFiMWUfg7gwidw1cFf/mhCvLXlF0xQ6b h4mvhuZN75WlftWfo0tcMXOQeZ48cRA+C0Ze7nWPQ3eKEsIkVPAUB6A6QZoC7v3Tvcvb yoAg== X-Forwarded-Encrypted: i=1; AJvYcCXmT7SaicYSUAeLGGmXAyw69xRfmdL+hdfDyXa0FJg6kF1+Dh7opPstWzeKNyb/kshAjrF00X4SMKNKVQg=@vger.kernel.org X-Gm-Message-State: AOJu0YzyiQEUA2y69CpGJT2qON/4m9MAGKXYJEgp5z1kuvSxhC/KFk2O shLgZpmYhLr9XVYXQgynLXWj59cQUyTB59sraO4Vx6jGu8lKsP6KijJOSU1e3dS8Ow== X-Gm-Gg: ASbGncuP50Jn3r7IW4CvemgYWBOhNZHfErAGSKa+i2poJkUDuvbZCE1wIQJV93+awfD 2kMPeSFZgb29AAilaS95biExWauukByfG5aJi+15PuVuK4gaX6ZAaUW6kt38U3L3Tpj6haGDdNn /A79O4oAG19m0EBc6jdawc70fZQT53Or7V4ejAkOD3sgnYyGTzLfq2649aw5xWdq27MSJBx2Idk pmyj2cocrG0yJAr93IC7KBX3Ei+TJR2ucBkrwa9zA22ge0= X-Google-Smtp-Source: AGHT+IEgs/CTyhXlpbJq4uE7mFMuNocBUtYTYAGlhXSBEO0EsyT4ki+VksmyEB1NVrQLSmsw8uKiog== X-Received: by 2002:a05:600c:3b18:b0:434:a26c:8291 with SMTP id 5b1f17b1804b1-43668a3a329mr492775195e9.24.1736258428594; Tue, 07 Jan 2025 06:00:28 -0800 (PST) Received: from localhost ([2a03:2880:31ff:15::]) by smtp.gmail.com with ESMTPSA id 5b1f17b1804b1-43656b4471bsm632262645e9.44.2025.01.07.06.00.27 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 07 Jan 2025 06:00:28 -0800 (PST) From: Kumar Kartikeya Dwivedi To: bpf@vger.kernel.org, linux-kernel@vger.kernel.org Cc: Linus Torvalds , Peter Zijlstra , Waiman Long , Alexei Starovoitov , Andrii Nakryiko , Daniel Borkmann , Martin KaFai Lau , Eduard Zingerman , "Paul E. 
McKenney" , Tejun Heo , Barret Rhoden , Josh Don , Dohyun Kim , kernel-team@meta.com Subject: [PATCH bpf-next v1 12/22] rqspinlock: Add basic support for CONFIG_PARAVIRT Date: Tue, 7 Jan 2025 05:59:54 -0800 Message-ID: <20250107140004.2732830-13-memxor@gmail.com> X-Mailer: git-send-email 2.43.5 In-Reply-To: <20250107140004.2732830-1-memxor@gmail.com> References: <20250107140004.2732830-1-memxor@gmail.com> Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Developer-Signature: v=1; a=openpgp-sha256; l=3736; h=from:subject; bh=NQnonbkW/L0ilYWBNW3FfTS4fYIHyEnpuydDkhm7gMc=; b=owEBbQKS/ZANAwAIAUzgyIZIvxHKAcsmYgBnfTCdeJqRobD31GnRq1Z9++ZTfPF+Ex0kHzRftxhT qxlsP8SJAjMEAAEIAB0WIQRLvip+Buz51YI8YRFM4MiGSL8RygUCZ30wnQAKCRBM4MiGSL8RysMeD/ 4zwGKQ3IE/6zj+ChEvtzuu1W18Ht1uMsejMp6YIITG3U/2m5Aq0JWsiCpC1SsRv1uu4PdtdnAYeoDo iCgVCbn/mdw7AwmuLxa/TPWxuwQsP1fh9YwjJWFzTqb9dYcxS/C+WN7mKQUjQ3hnhm6sXJrY3+z06L r/sZPuH8uxpAmCgTaB5AdEEL5Mmq6I0vA9Dyj9IBvcnTDuYbZY9yl30TmD64is8Jj26dfor96bEsfj vswk8bkpL38CWWj4KJL14W7j3G331JL9trBQlJCEUTHmTp7Chofg4wBvhIG2qz7wy9BTa+1IBUV20E whWnvBcZ5N1hYbb7GCcd0IR+0ht2h3Oh0RQO5aoVaSYYQE3PQN0GWSe+WRqsA0UNcFCYhMJ6Mz2G+3 UguSrgciUmQqIUhqXnmf+ZRF0qOYElZN4Vk0OohrSeE7QgJrYAVf6VMAbOALNhbMCx7EdH1RJ81KnT lFPkIv+/z3LgK9kqu/NnmF8VgRKFKjl2rgq8Pxugf9L7bOTfYReEMxAekc6zwRYwSuGOmxtWLl23KS lX9zlyc/qmripuJZ5o/7WGKQ4K0a8vNtQ/F0Kk++wLN++tUjqx+8J+oSdwW1QGImdFkqYsjaQgRKkY ly6M/7lD2ej1RkeOnx91GpzjlpJ6xMRru6iIeQhO+o2qBKjFUlxZrdiO7k8Q== X-Developer-Key: i=memxor@gmail.com; a=openpgp; fpr=4BBE2A7E06ECF9D5823C61114CE0C88648BF11CA X-Patchwork-Delegate: bpf@iogearbox.net We ripped out PV and virtualization related bits from rqspinlock in an earlier commit, however, a fair lock performs poorly within a virtual machine when the lock holder is preempted. As such, retain the virt_spin_lock fallback to test and set lock, but with timeout and deadlock detection. We don't integrate support for CONFIG_PARAVIRT_SPINLOCKS yet, as that requires more involved algorithmic changes and introduces more complexity. It can be done when the need arises in the future. 
Signed-off-by: Kumar Kartikeya Dwivedi --- arch/x86/include/asm/rqspinlock.h | 20 ++++++++++++++++ include/asm-generic/rqspinlock.h | 7 ++++++ kernel/locking/rqspinlock.c | 38 +++++++++++++++++++++++++++++++ 3 files changed, 65 insertions(+) create mode 100644 arch/x86/include/asm/rqspinlock.h diff --git a/arch/x86/include/asm/rqspinlock.h b/arch/x86/include/asm/rqspinlock.h new file mode 100644 index 000000000000..ecfb7dfe6370 --- /dev/null +++ b/arch/x86/include/asm/rqspinlock.h @@ -0,0 +1,20 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef _ASM_X86_RQSPINLOCK_H +#define _ASM_X86_RQSPINLOCK_H + +#include + +#ifdef CONFIG_PARAVIRT +DECLARE_STATIC_KEY_FALSE(virt_spin_lock_key); + +#define resilient_virt_spin_lock_enabled resilient_virt_spin_lock_enabled +static __always_inline bool resilient_virt_spin_lock_enabled(void) +{ + return static_branch_likely(&virt_spin_lock_key); +} + +#endif /* CONFIG_PARAVIRT */ + +#include + +#endif /* _ASM_X86_RQSPINLOCK_H */ diff --git a/include/asm-generic/rqspinlock.h b/include/asm-generic/rqspinlock.h index c7e33ccc57a6..dc436ab01471 100644 --- a/include/asm-generic/rqspinlock.h +++ b/include/asm-generic/rqspinlock.h @@ -17,6 +17,13 @@ struct qspinlock; extern int resilient_queued_spin_lock_slowpath(struct qspinlock *lock, u32 val, u64 timeout); +#ifndef resilient_virt_spin_lock_enabled +static __always_inline bool resilient_virt_spin_lock_enabled(void) +{ + return false; +} +#endif + /* * Default timeout for waiting loops is 0.5 seconds */ diff --git a/kernel/locking/rqspinlock.c b/kernel/locking/rqspinlock.c index b7c86127d288..e397f91ebcf6 100644 --- a/kernel/locking/rqspinlock.c +++ b/kernel/locking/rqspinlock.c @@ -247,6 +247,41 @@ static noinline int check_timeout(struct qspinlock *lock, u32 mask, */ #define RES_RESET_TIMEOUT(ts) ({ (ts).timeout_end = 0; }) +#ifdef CONFIG_PARAVIRT + +static inline int resilient_virt_spin_lock(struct qspinlock *lock, struct rqspinlock_timeout *ts) +{ + int val, ret = 0; + + RES_RESET_TIMEOUT(*ts); + grab_held_lock_entry(lock); +retry: + val = atomic_read(&lock->val); + + if (val || !atomic_try_cmpxchg(&lock->val, &val, _Q_LOCKED_VAL)) { + if (RES_CHECK_TIMEOUT(*ts, ret, ~0u)) { + lockevent_inc(rqspinlock_lock_timeout); + goto timeout; + } + cpu_relax(); + goto retry; + } + + return 0; +timeout: + release_held_lock_entry(); + return ret; +} + +#else + +static __always_inline int resilient_virt_spin_lock(struct qspinlock *lock, struct rqspinlock_timeout *ts) +{ + return 0; +} + +#endif /* CONFIG_PARAVIRT */ + /* * Per-CPU queue node structures; we can never have more than 4 nested * contexts: task, softirq, hardirq, nmi. @@ -287,6 +322,9 @@ int __lockfunc resilient_queued_spin_lock_slowpath(struct qspinlock *lock, u32 v RES_INIT_TIMEOUT(ts, timeout); + if (resilient_virt_spin_lock_enabled()) + return resilient_virt_spin_lock(lock, &ts); + /* * Wait for in-progress pending->locked hand-overs with a bounded * number of spins so that we guarantee forward progress. 
From patchwork Tue Jan 7 13:59:55 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Kumar Kartikeya Dwivedi X-Patchwork-Id: 13928950 X-Patchwork-Delegate: bpf@iogearbox.net Received: from mail-wr1-f65.google.com (mail-wr1-f65.google.com [209.85.221.65]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id C08811F192E; Tue, 7 Jan 2025 14:00:35 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.221.65 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1736258439; cv=none; b=ImwuDoZm5uUFfhhDIl0IwGmUU/5Va6j6mFPPL/A5q72s2VibLpmOYN6tVSqd2+lQ4AM8ZfkRKv4p3LWyu6//TgxelB5s1UL+k+O1sbsq/ZxfvAo4pjRWsQxx82wLKr94qT11WK6LVL8Y33BXsVrZf43VBMCU2wOmavnppgNNNR8= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1736258439; c=relaxed/simple; bh=+fj1yjje4sqeNTogpj3CK8ewZfnMI1iC7ebB8+PSGhM=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=hcmZPjdFcuIKyHCz6Igt9LennYj9eWkNTqEB2Yh1aaukiBxdvZT9lSUDvrSFwExa+2xBZLqyK2UO2+4jh/BPSOBKMeez3lxaPcpYeP9WDzKqBhCF9Qs8QiW+yalbRlApKGlZElszoKzwj+GyPLEW8reKw1AvLGChydrLGjYXANQ= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=LhTnDbhW; arc=none smtp.client-ip=209.85.221.65 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="LhTnDbhW" Received: by mail-wr1-f65.google.com with SMTP id ffacd0b85a97d-388cae9eb9fso8269081f8f.3; Tue, 07 Jan 2025 06:00:35 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1736258432; x=1736863232; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=PHfuQaN0Xplb/Xfp3iI4rOkeKsbB/lrTnm5fmzijsTk=; b=LhTnDbhWi1+pREVuZ3He9OtlX7w+kMQbtmaJ5PSP/QcmvdqZya0n/axJh1OIu7zKOt 8buQmWkSN+EsyPJHSF8UUCY/T9w6Y5CrJY2L1uAD6MAiH2uDd2dyluDZSHkhREh+JNHF EjFiMDG00nYbxe6H5d1Z51xq8cELZlgwMlDAJOAGH/I/6oZ/YnR1bnLfSNosFXOdwQ2m 4UrMdHlDiqZEQVMuDI55K0T4BlXplnQQ5VTF0KUSqVWM71rNNgtvKlpjRGol5FM8WmXI q0QXKxKpxgmoF6JFHNP/f4dDVS4ZW1INCcLnuOgBB82eOLdAU03vOtus+HO59KQpLTAE Uq9Q== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1736258432; x=1736863232; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=PHfuQaN0Xplb/Xfp3iI4rOkeKsbB/lrTnm5fmzijsTk=; b=R0x1LcWVdk/65cqaYkFy9T9vZJ9E9408D9vJzPhXJBy5SVdrsY0x2uhlhlK7I0B/TV 0fgitMMoMbhrUP+IOC2a1cNSz8hnav3jou7hKUvtTuE7XTh65JZf4MjOPZuTI0toRCHx JTZSH12geGF1c/dtcW5tQ68fx7/qw5X22Kp75oDKyCzPMAwOpVXEQ0RpvJblEgeX9VSr xoUk6e/Db3OkVp7roy9LI6w/jZvF9A/5IcFXRqsLzC9U+1TqaLCggzTZ1xkMKoDNj0Da 413inuYMur9MfD/xoVchAFTgdIXgFMbjIQy8dHTMkW6ulp2bq1ZG1b7OVkBLP390ywbk 7olQ== X-Forwarded-Encrypted: i=1; AJvYcCXpqm2qi0hG/5fd5JQt/o6dcSv+Z62iJ7uQJK7/ZHMFL1mgslhypSicUWgnjRD61U4rpMLHb4Ht6gEJO+s=@vger.kernel.org X-Gm-Message-State: 
From: Kumar Kartikeya Dwivedi
To: bpf@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: Linus Torvalds , Peter Zijlstra , Waiman Long , Alexei Starovoitov , Andrii Nakryiko , Daniel Borkmann , Martin KaFai Lau , Eduard Zingerman , "Paul E. McKenney" , Tejun Heo , Barret Rhoden , Josh Don , Dohyun Kim , kernel-team@meta.com
Subject: [PATCH bpf-next v1 13/22] rqspinlock: Add helper to print a splat on timeout or deadlock
Date: Tue, 7 Jan 2025 05:59:55 -0800
Message-ID: <20250107140004.2732830-14-memxor@gmail.com>
X-Mailer: git-send-email 2.43.5
In-Reply-To: <20250107140004.2732830-1-memxor@gmail.com>
References: <20250107140004.2732830-1-memxor@gmail.com>
Precedence: bulk
X-Mailing-List: bpf@vger.kernel.org
MIME-Version: 1.0
X-Patchwork-Delegate: bpf@iogearbox.net

Whenever a timeout or a deadlock occurs, we want to print a message to the dmesg console, including the CPU where the event occurred, the list of locks in the held locks table, and the stack trace of the caller, which allows determining where exactly in the slow path the waiter timed out or detected a deadlock.

Splats are limited to at most one per CPU during machine uptime, and a lock is acquired to ensure that no interleaving occurs when a concurrent set of CPUs conflict, enter a deadlock situation, and start printing data.

Later patches will use this to inspect the return value of the rqspinlock API and then report a violation if necessary.
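The reporting discipline (a nesting counter against recursive entry, a once flag per CPU, and a shared lock so concurrent reports do not interleave) can be sketched in user space as follows. This is illustrative only: nest_cnt, reported and report_lock are made-up names, a pthread mutex stands in for arch_spin_lock(), and dump_stack() is replaced by a comment.

#include <stdio.h>
#include <stdbool.h>
#include <pthread.h>

#define NR_CPUS 2

static int nest_cnt[NR_CPUS];
static bool reported[NR_CPUS];
static pthread_mutex_t report_lock = PTHREAD_MUTEX_INITIALIZER;

static void report_violation(int cpu, const char *msg, void *lock)
{
    /* Refuse to nest: a report triggered while reporting is dropped. */
    if (++nest_cnt[cpu] != 1)
        goto out;
    /* At most one splat per CPU for the lifetime of the program. */
    if (reported[cpu])
        goto out;
    reported[cpu] = true;

    pthread_mutex_lock(&report_lock);
    printf("CPU %d: %s (lock %p)\n", cpu, msg, lock);
    /* the kernel would also print the held-locks table and a stack trace */
    pthread_mutex_unlock(&report_lock);
out:
    nest_cnt[cpu]--;
}

int main(void)
{
    int lock;

    report_violation(0, "rqspinlock: deadlock detected", &lock);
    /* Second report from CPU 0 is suppressed by the once flag. */
    report_violation(0, "rqspinlock: timeout", &lock);
    /* A different CPU still gets its one report. */
    report_violation(1, "rqspinlock: timeout", &lock);
    return 0;
}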
Signed-off-by: Kumar Kartikeya Dwivedi --- kernel/locking/rqspinlock.c | 29 +++++++++++++++++++++++++++++ 1 file changed, 29 insertions(+) diff --git a/kernel/locking/rqspinlock.c b/kernel/locking/rqspinlock.c index e397f91ebcf6..467336f6828e 100644 --- a/kernel/locking/rqspinlock.c +++ b/kernel/locking/rqspinlock.c @@ -187,6 +187,35 @@ static noinline int check_deadlock_ABBA(struct qspinlock *lock, u32 mask, return 0; } +static DEFINE_PER_CPU(int, report_nest_cnt); +static DEFINE_PER_CPU(bool, report_flag); +static arch_spinlock_t report_lock; + +static void rqspinlock_report_violation(const char *s, void *lock) +{ + struct rqspinlock_held *rqh = this_cpu_ptr(&rqspinlock_held_locks); + + if (this_cpu_inc_return(report_nest_cnt) != 1) { + this_cpu_dec(report_nest_cnt); + return; + } + if (this_cpu_read(report_flag)) + goto end; + this_cpu_write(report_flag, true); + arch_spin_lock(&report_lock); + + pr_err("CPU %d: %s", smp_processor_id(), s); + pr_info("Held locks: %d\n", rqh->cnt + 1); + pr_info("Held lock[%2d] = 0x%px\n", 0, lock); + for (int i = 0; i < min(RES_NR_HELD, rqh->cnt); i++) + pr_info("Held lock[%2d] = 0x%px\n", i + 1, rqh->locks[i]); + dump_stack(); + + arch_spin_unlock(&report_lock); +end: + this_cpu_dec(report_nest_cnt); +} + static noinline int check_deadlock(struct qspinlock *lock, u32 mask, struct rqspinlock_timeout *ts) { From patchwork Tue Jan 7 13:59:56 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Kumar Kartikeya Dwivedi X-Patchwork-Id: 13928951 X-Patchwork-Delegate: bpf@iogearbox.net Received: from mail-wm1-f67.google.com (mail-wm1-f67.google.com [209.85.128.67]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 9A2AF1F0E2A; Tue, 7 Jan 2025 14:00:36 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.128.67 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1736258441; cv=none; b=Hp3121cRvt1DO+OEi1R+rfBCKmkXmAF6WpMUT1NW6qlrOwcwVaNJTGrwyV1jmyb+Mjdg6oEe9fcIHb2dQtImWm4+mvAKc9dtOVw6p+Ko5DsMdPBeeUFotPWSs8Lmss5I+Wkn/C70V3fiub9VUfZcs1gHYpzXSREFbJWgnZL9V0A= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1736258441; c=relaxed/simple; bh=+LjPicSWkajqb0Xpu6IbfX5g4qiRLkXWCPBEaqfWN64=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=gmHuwot803NzX26sYpXW4QO9SLhGI8YZKhccPjrw78gve15aMO/wbTSYshIsXNWqjb6+yXU8E7/GeOciIy945AXllZDDAepddMXza4aixRbNedsyDT6dslJd3FJE04WUfpe7aa9RKnY7suHlIx0VOG6/OTQQknGUONJembDW/iQ= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=mST7U2fu; arc=none smtp.client-ip=209.85.128.67 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="mST7U2fu" Received: by mail-wm1-f67.google.com with SMTP id 5b1f17b1804b1-436326dcb1cso104023715e9.0; Tue, 07 Jan 2025 06:00:35 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1736258433; x=1736863233; darn=vger.kernel.org; 
h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=XyCQ/b9Hwl1sjqi5LX92l17x4cgrzCtqffije6Lab+Y=; b=mST7U2fuB95PWvetwS5+KTmsEG1qg3Pp7SNUjEoP57rl1UrzdemYKXfkVHS051xiFa 5/HL2SgyERsvjLD71ADXr5B6/9cXsFUgP9BFXDpoePVTJXj9yGZVfNlYAZYl3dOCkJno +1XjBpL+ipatTY0uyKT2hc2KLKPhtQGF6/hnqZU47RC4djA2gYq7ZeXTvDS/wM2BhW7I rquteIqXPVwNC2Wp/2omEflQSioMbvEASedMJoCZTgeFcNKGzOUMq8/gpRBoiopAKICi GB7DTzffOE1tMrcpxpqljpASFq9DfAPnGXK+N1P3l8k3oyHBZ2IzsJua2U5wYNIwH3Qp XHrg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1736258433; x=1736863233; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=XyCQ/b9Hwl1sjqi5LX92l17x4cgrzCtqffije6Lab+Y=; b=JeAtJ0an4nZ2Xf9prornlcx48J/ccGUX2u2NqcubvrS+f9Tijjmj74vantuaSGKVE0 L4SiZnrGuc59hbE9yDtNI6ly84YTbbgruX4ChYrF2NiQnBJl+hWuK/tEivL4jZ8qhzi0 zZrh3IVofvU+/fIeAYbfLchNj7DgMJd5kUlja+nAsf+H4DlQII8gSf3uh40eLoLWcijl HMpboRb5nY9qdT1JMJ8it6oNGQ8RdE8ckGL6AqawguaWc5YRrJTwX5X4nK2MIafGQb/L MFln0y7hyvllbknhMCByMIXlyXGZIQIXgZs6YprNWnYJqz66kmKclfPCGT9ZzqD9Kqq2 tCgg== X-Forwarded-Encrypted: i=1; AJvYcCVgEH7Gj+grCZ079lOZcfANiKjZ4Os7LW5JE7qvqWUG+Xr/zCmAsQHUFmUcmd9v79o84rqMF8wyar1yzXA=@vger.kernel.org X-Gm-Message-State: AOJu0YzomJ6Ujj7oNt6vMYO01F31jalraNCThFGn1+WxXCzFqmD2TL+2 eb3Q2QxB1SKQ6+e+E0px/X97tBOXKtAUA3xYV28Y6NzfHge8GriZfAxdVVAUf8UyrQ== X-Gm-Gg: ASbGncvpnlw68TER7SN+Cqfxdxew+SsT6T++92hP9hRjRZ7hyfaYLiSm5sygr5+0W4t CxqLe6YGkl5M1pVjzQr+QQcCe+lNy+K3zBT/DTVvH1AmZn8FRdDSd6YyXFtw+VUiriZOX9WioWK 6DFmuxxdzJfuewJkcyXjCw9g+pet3bBScHJBEQL5FyUB3RHlZmPSAZTaooaakwH61tu3cEMerJk ZT4ZokwxIOrE3/P71ImPI5L/JQjeNCC5EwMh/hJovm9Tzk= X-Google-Smtp-Source: AGHT+IEoOtzaKn9Hqerbc7iyfSKnixD4sPkQhM5aoAhUWrNCcXx9tUDa+tCHP3fa5ikF6sNyHEkU4w== X-Received: by 2002:a05:6000:18ab:b0:385:e013:73f6 with SMTP id ffacd0b85a97d-38a22408c41mr43988884f8f.50.1736258431536; Tue, 07 Jan 2025 06:00:31 -0800 (PST) Received: from localhost ([2a03:2880:31ff:18::]) by smtp.gmail.com with ESMTPSA id ffacd0b85a97d-38a1c8ac97fsm49838266f8f.92.2025.01.07.06.00.30 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 07 Jan 2025 06:00:30 -0800 (PST) From: Kumar Kartikeya Dwivedi To: bpf@vger.kernel.org, linux-kernel@vger.kernel.org Cc: Linus Torvalds , Peter Zijlstra , Waiman Long , Alexei Starovoitov , Andrii Nakryiko , Daniel Borkmann , Martin KaFai Lau , Eduard Zingerman , "Paul E. 
McKenney" , Tejun Heo , Barret Rhoden , Josh Don , Dohyun Kim , kernel-team@meta.com Subject: [PATCH bpf-next v1 14/22] rqspinlock: Add macros for rqspinlock usage Date: Tue, 7 Jan 2025 05:59:56 -0800 Message-ID: <20250107140004.2732830-15-memxor@gmail.com> X-Mailer: git-send-email 2.43.5 In-Reply-To: <20250107140004.2732830-1-memxor@gmail.com> References: <20250107140004.2732830-1-memxor@gmail.com> Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Developer-Signature: v=1; a=openpgp-sha256; l=2956; h=from:subject; bh=+LjPicSWkajqb0Xpu6IbfX5g4qiRLkXWCPBEaqfWN64=; b=owEBbQKS/ZANAwAIAUzgyIZIvxHKAcsmYgBnfTCePj2nHYP6pfgRdaD9WEGv4aJ0AuKHRXa8C0Wd tgrlumyJAjMEAAEIAB0WIQRLvip+Buz51YI8YRFM4MiGSL8RygUCZ30wngAKCRBM4MiGSL8RylSREA CBV3QvbmPWrun/4G/yhVb42L5Zg9tVBotobyB8aw5Xqy4t8hYhJfqG93yUROGUMhP/R803YxDPUOW5 4dXL6ET9GQ7vzV8eXd40JmN8ds2kGQhChELZC1Cic6ezd2O63V3MgXlRoQqSm51pb4OMi9h2ri+v4k oKXcURtM3Ds1h/wXb6gx0v4awwm6EggYSTpWHHGsLVfxesEOSNToVFEmG02netYfzgmbkGNUeZ8Zxj wxJYgU4e8WKK0zPK7rwwo636SaIBK5QskO/uv0fLx3fc8x2NvIibDlFjXtW2QGuQVUZx1uYwQm9qYe I364nLFCqqiNtZJwqLlEYxUz+PO9v3bONHmYmRBPUKJYUlL+t8fcDtW2wauYKsDK61+4UZgpL741pS rMlaKUwiBOmOOGBZI+5WVJUzemrrB2+GXVhEQ+6v00edfpJ0Mu7X7VskrVNc8Er5pthrvBuzQ1P8zZ 06PP5LA0v9dA4yOAoKEbit965/bwjsu5s2WIkj6bUhqqzsgtgqMeWRVHzu0sDkpzRjYmrNTMwh9WzW 6EicSg5xg3XEWTFkAF14HAa5/9KMyZ/ew8D7oxhOGR2FAPUkrewOP/Us0cTgWQKdfr/IYoddjFekOK SkJKo+eeJtZlpzJg92OXBnB5JNeZBNj6D4aLpQ0kxNLMBrn+AuVbBjaztthg== X-Developer-Key: i=memxor@gmail.com; a=openpgp; fpr=4BBE2A7E06ECF9D5823C61114CE0C88648BF11CA X-Patchwork-Delegate: bpf@iogearbox.net Introduce helper macros that wrap around the rqspinlock slow path and provide an interface analogous to the raw_spin_lock API. Note that in case of error conditions, preemption and IRQ disabling is automatically unrolled before returning the error back to the caller. Signed-off-by: Kumar Kartikeya Dwivedi --- include/asm-generic/rqspinlock.h | 58 ++++++++++++++++++++++++++++++++ 1 file changed, 58 insertions(+) diff --git a/include/asm-generic/rqspinlock.h b/include/asm-generic/rqspinlock.h index dc436ab01471..53be8426373c 100644 --- a/include/asm-generic/rqspinlock.h +++ b/include/asm-generic/rqspinlock.h @@ -12,8 +12,10 @@ #include #include #include +#include struct qspinlock; +typedef struct qspinlock rqspinlock_t; extern int resilient_queued_spin_lock_slowpath(struct qspinlock *lock, u32 val, u64 timeout); @@ -82,4 +84,60 @@ static __always_inline void release_held_lock_entry(void) this_cpu_dec(rqspinlock_held_locks.cnt); } +/** + * res_spin_lock - acquire a queued spinlock + * @lock: Pointer to queued spinlock structure + */ +static __always_inline int res_spin_lock(rqspinlock_t *lock) +{ + int val = 0; + + if (likely(atomic_try_cmpxchg_acquire(&lock->val, &val, _Q_LOCKED_VAL))) { + grab_held_lock_entry(lock); + return 0; + } + return resilient_queued_spin_lock_slowpath(lock, val, RES_DEF_TIMEOUT); +} + +static __always_inline void res_spin_unlock(rqspinlock_t *lock) +{ + struct rqspinlock_held *rqh = this_cpu_ptr(&rqspinlock_held_locks); + + if (unlikely(rqh->cnt > RES_NR_HELD)) + goto unlock; + WRITE_ONCE(rqh->locks[rqh->cnt - 1], NULL); + /* + * Release barrier, ensuring ordering. See release_held_lock_entry. 
+ */ +unlock: + queued_spin_unlock(lock); + this_cpu_dec(rqspinlock_held_locks.cnt); +} + +#define raw_res_spin_lock_init(lock) ({ *(lock) = (struct qspinlock)__ARCH_SPIN_LOCK_UNLOCKED; }) + +#define raw_res_spin_lock(lock) \ + ({ \ + int __ret; \ + preempt_disable(); \ + __ret = res_spin_lock(lock); \ + if (__ret) \ + preempt_enable(); \ + __ret; \ + }) + +#define raw_res_spin_unlock(lock) ({ res_spin_unlock(lock); preempt_enable(); }) + +#define raw_res_spin_lock_irqsave(lock, flags) \ + ({ \ + int __ret; \ + local_irq_save(flags); \ + __ret = raw_res_spin_lock(lock); \ + if (__ret) \ + local_irq_restore(flags); \ + __ret; \ + }) + +#define raw_res_spin_unlock_irqrestore(lock, flags) ({ raw_res_spin_unlock(lock); local_irq_restore(flags); }) + #endif /* __ASM_GENERIC_RQSPINLOCK_H */ From patchwork Tue Jan 7 13:59:57 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Kumar Kartikeya Dwivedi X-Patchwork-Id: 13928952 X-Patchwork-Delegate: bpf@iogearbox.net Received: from mail-wr1-f66.google.com (mail-wr1-f66.google.com [209.85.221.66]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 8E0D71F131C; Tue, 7 Jan 2025 14:00:35 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.221.66 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1736258442; cv=none; b=pu7vzas+xouX3OL+nqDGgr/BRsJte/8JQLlq4YvifpI/KwdnK2pKbdlcravbHw9h3G2fEslMRohBIkF1UINBHmP6/ZVNjqiZq1VtL6YLdAIIwQP6CvzcSgm91NGIGy3MXJdKrLoKuoh5pr06LuMuvRifw7S/6xYAR2cC4uANLdo= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1736258442; c=relaxed/simple; bh=kJwiyjXFeK0kbAuecq07qq2Ed4NzRpp2/JL55Sm7hfM=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=bWhrfxUOvDHDn/dq2yUX9j7GbNZVcL3kKvyiaVnw5Y42qKV1M8Adk0u5akCFPCPkuGsAZxMTd+FJDABaPfPpTctOnlzIKw6fCZ+/oLqWiFofCgUpeT6cN0OKHrgNBvWAMkZCZYUob98FcdyDuPEiC9Z1+QUGpO7PRqdGEEeCDvk= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=ETCozqCi; arc=none smtp.client-ip=209.85.221.66 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="ETCozqCi" Received: by mail-wr1-f66.google.com with SMTP id ffacd0b85a97d-385e1fcb0e1so8526388f8f.2; Tue, 07 Jan 2025 06:00:34 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1736258433; x=1736863233; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=Y9xklAp5T7bV+X+CaErK/4BbIoKeDnGWA2MeVPN/LNA=; b=ETCozqCihL39YSqk6NlN/S2L+OcfpWngISXUD/ox47hfZGjKcgvLlltmlWcdfDgAGJ BDEMb0AyK8PFs3i9IWfTi0UeKdwEyBjWs11rtv/yxyy4T/0OhMQXtLNcvkJwbc5mH859 M3sRjiTevabl5dnEdKIM4iB6XetHY1M0S8ld2k6bv66hFA5RWUWxNRE/dtZmAjr6M9ES 116eoA+TX21eEjKaHYR7vqxXDl91IqOA6FMrlLBzqRxO18rKqJ/SFIBGn0ufDFfl0tXQ KlRo+TeB8pjE39jGVGKg3VtVfrWc7NO7BFIwqfgN56/SIpjMeM8irXlHgWzHA73zSQ4m C8Cg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; 
d=1e100.net; s=20230601; t=1736258433; x=1736863233; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=Y9xklAp5T7bV+X+CaErK/4BbIoKeDnGWA2MeVPN/LNA=; b=FR2bsY3sgojhMC5k4cXsZHUPAqQQl/uJzAdlwVhqkyPl4MsTKegK7Bu6AWfU4H2osf zGuLo15rByR0AGT7LBLsj8SVrVqfVe2BYzlnU884mDnQO+IBNR76y6Hb615FlOWUAuaY 1MC4/uul1Z+Dzwov5FupXZPXat9cCMKNZghzZV5W2ff6SWlDHUF+wyN/5zHLBZuCH8hX mznPEqM3FhzwosWASMVjM/h0MDlhz9kt8n9PApfmRr3TG58d3DFRs5Bnq5u1qyhgKdom r9BzZDrrDXCxeIe9sc2dK+2qkOZkEXTH03JIs+osQhNMomTumDToTfYklTHpm9X1Gjrj x3Vg== X-Forwarded-Encrypted: i=1; AJvYcCVs384yY4NtGbYi/lOez9dP1/0maGJYrHCXVJEsextwIGokO28dNT+9/R1L0gL/fWO4Lg7ni598a1tH13M=@vger.kernel.org X-Gm-Message-State: AOJu0YxnxHw3g4QNRHvZbeBV7qljltbUP29Le/ykTrO53rs1aCYaD26H BY1Pg7wGJ/uebTcctj0GnXL+MDtpN2fn1/imu7BfZbFPJTyr7HnDJiiNaBMyDbYvww== X-Gm-Gg: ASbGnctNycAd+okPb/YNLfwx0GTylAdjL0lQuVEPpquTKGEExySVQsb5AocX9fpF2Wk 9pxvsMCufznoL91KVbDI0PctXqSDw9wipfcIfGY6CzoboH3tJVuMeGb52CXNLqvIKqNsVTFkFnW qhYYw7yWK8u3J3V7YcBODmw0M2FBAIcheC/M6+jcI+2bdttUw3KuJuq8aDDwRpyUaCUnwBx5An0 qzc7Euc0/AzEQhgaWQZmmDwabRNPKoMVua8cgWkCP80cQ== X-Google-Smtp-Source: AGHT+IEsxrWwpFJrtsfOm6AgcJMnVDxtLA4K2laEeRp2CCwsjr5JXjmVwIf43rIlpNfRhbgfPgxVFA== X-Received: by 2002:a5d:6c6d:0:b0:385:fc70:832 with SMTP id ffacd0b85a97d-38a221f9e10mr49545102f8f.16.1736258432624; Tue, 07 Jan 2025 06:00:32 -0800 (PST) Received: from localhost ([2a03:2880:31ff:c::]) by smtp.gmail.com with ESMTPSA id ffacd0b85a97d-38a28f17315sm45033426f8f.108.2025.01.07.06.00.32 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 07 Jan 2025 06:00:32 -0800 (PST) From: Kumar Kartikeya Dwivedi To: bpf@vger.kernel.org, linux-kernel@vger.kernel.org Cc: Linus Torvalds , Peter Zijlstra , Waiman Long , Alexei Starovoitov , Andrii Nakryiko , Daniel Borkmann , Martin KaFai Lau , Eduard Zingerman , "Paul E. 
McKenney" , Tejun Heo , Barret Rhoden , Josh Don , Dohyun Kim , kernel-team@meta.com Subject: [PATCH bpf-next v1 15/22] rqspinlock: Add locktorture support Date: Tue, 7 Jan 2025 05:59:57 -0800 Message-ID: <20250107140004.2732830-16-memxor@gmail.com> X-Mailer: git-send-email 2.43.5 In-Reply-To: <20250107140004.2732830-1-memxor@gmail.com> References: <20250107140004.2732830-1-memxor@gmail.com> Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Developer-Signature: v=1; a=openpgp-sha256; l=2982; h=from:subject; bh=kJwiyjXFeK0kbAuecq07qq2Ed4NzRpp2/JL55Sm7hfM=; b=owEBbQKS/ZANAwAIAUzgyIZIvxHKAcsmYgBnfTCe5KmCSuSd9mM2LdcZb8GYSohl/x1Jvibb1Hdu iYe1KECJAjMEAAEIAB0WIQRLvip+Buz51YI8YRFM4MiGSL8RygUCZ30wngAKCRBM4MiGSL8RypEnD/ 96FIqsjGRFcCnCJy0k6k6rLkuZ0umtG5fVLQL12gf64SzNuP3q/++JUiEAyZSocdSHjKHEZZdf+j7G 5Gt2mVOxAxvavz6BDHBa0ldHyHDPcmDYJd1gN5Ya6C2IqMAz7F+KPrmpZ4hWCKdSOw2IxURW7Wx6/D btp7VzcjcuqGZOnepibuXwYAAptPF5PESU77UEiWj5aX7K74xm5qxnFBrche+qSJW6+g0SMYaYBteN NNxG4u2mHeS1MoXQBvL4qjh2SZ4jguoFzh46nYmicQbArVCY+Nh0h0KZpJww7KasFTUlIwfmLbcjzz dYJlflyRvWVMdmLixJgBJNkT2Zhj/+iIoSXFOb+6Y2zAnv0ZZza2yPZHTt6SZFKIANpPBKBwVqX6N3 CklqXIzXkdFyzu2/NOUJUwStnq/8DWceHLeLbyZZ93mf1dXNdkJcOHo/VU+VK0yV6iO9/dwmE/aqm+ cRu9/1FyzliKuCCfEelMeDwCOFUCguWZH7mi+TD8iq8v4Sq//yts/30JTeUMe7cPmZxBi5juA8QWGs iWkOhXW9bNU37QpTDGeklCrGsQD0KO7JXY9W80Lo5DhEj2qQGw2CmtqmXdIOSFwA5pqhtnhUANN6zf M/DCcqpZfe+ecwLkaLmFihYVOtrYGQTMmwgFn5YEPQmlhf/Bf58hI6Hq6tCA== X-Developer-Key: i=memxor@gmail.com; a=openpgp; fpr=4BBE2A7E06ECF9D5823C61114CE0C88648BF11CA X-Patchwork-Delegate: bpf@iogearbox.net Introduce locktorture support for rqspinlock using the newly added macros as the first in-kernel user and consumer. Signed-off-by: Kumar Kartikeya Dwivedi --- kernel/locking/locktorture.c | 51 ++++++++++++++++++++++++++++++++++++ kernel/locking/rqspinlock.c | 1 + 2 files changed, 52 insertions(+) diff --git a/kernel/locking/locktorture.c b/kernel/locking/locktorture.c index de95ec07e477..897a7de0cd83 100644 --- a/kernel/locking/locktorture.c +++ b/kernel/locking/locktorture.c @@ -362,6 +362,56 @@ static struct lock_torture_ops raw_spin_lock_irq_ops = { .name = "raw_spin_lock_irq" }; +#include +static rqspinlock_t rqspinlock; + +static int torture_raw_res_spin_write_lock(int tid __maybe_unused) +{ + raw_res_spin_lock(&rqspinlock); + return 0; +} + +static void torture_raw_res_spin_write_unlock(int tid __maybe_unused) +{ + raw_res_spin_unlock(&rqspinlock); +} + +static struct lock_torture_ops raw_res_spin_lock_ops = { + .writelock = torture_raw_res_spin_write_lock, + .write_delay = torture_spin_lock_write_delay, + .task_boost = torture_rt_boost, + .writeunlock = torture_raw_res_spin_write_unlock, + .readlock = NULL, + .read_delay = NULL, + .readunlock = NULL, + .name = "raw_res_spin_lock" +}; + +static int torture_raw_res_spin_write_lock_irq(int tid __maybe_unused) +{ + unsigned long flags; + + raw_res_spin_lock_irqsave(&rqspinlock, flags); + cxt.cur_ops->flags = flags; + return 0; +} + +static void torture_raw_res_spin_write_unlock_irq(int tid __maybe_unused) +{ + raw_res_spin_unlock_irqrestore(&rqspinlock, cxt.cur_ops->flags); +} + +static struct lock_torture_ops raw_res_spin_lock_irq_ops = { + .writelock = torture_raw_res_spin_write_lock_irq, + .write_delay = torture_spin_lock_write_delay, + .task_boost = torture_rt_boost, + .writeunlock = torture_raw_res_spin_write_unlock_irq, + .readlock = NULL, + .read_delay = NULL, + .readunlock = NULL, + .name = "raw_res_spin_lock_irq" +}; + static 
DEFINE_RWLOCK(torture_rwlock); static int torture_rwlock_write_lock(int tid __maybe_unused) @@ -1168,6 +1218,7 @@ static int __init lock_torture_init(void) &lock_busted_ops, &spin_lock_ops, &spin_lock_irq_ops, &raw_spin_lock_ops, &raw_spin_lock_irq_ops, + &raw_res_spin_lock_ops, &raw_res_spin_lock_irq_ops, &rw_lock_ops, &rw_lock_irq_ops, &mutex_lock_ops, &ww_mutex_lock_ops, diff --git a/kernel/locking/rqspinlock.c b/kernel/locking/rqspinlock.c index 467336f6828e..9d3036f5e613 100644 --- a/kernel/locking/rqspinlock.c +++ b/kernel/locking/rqspinlock.c @@ -82,6 +82,7 @@ struct rqspinlock_timeout { #define RES_TIMEOUT_VAL 2 DEFINE_PER_CPU_ALIGNED(struct rqspinlock_held, rqspinlock_held_locks); +EXPORT_SYMBOL_GPL(rqspinlock_held_locks); static bool is_lock_released(struct qspinlock *lock, u32 mask, struct rqspinlock_timeout *ts) { From patchwork Tue Jan 7 13:59:58 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Kumar Kartikeya Dwivedi X-Patchwork-Id: 13928953 X-Patchwork-Delegate: bpf@iogearbox.net Received: from mail-wm1-f67.google.com (mail-wm1-f67.google.com [209.85.128.67]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id EAED61F3D27; Tue, 7 Jan 2025 14:00:37 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.128.67 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1736258442; cv=none; b=EQrghI1fGbLlBgapeDeK9HS1li4PofG5R31NhCZ3EeKsRNrbnWUAvJjh45IMLsijXIFhgORGHp1sKGbZffFeBuYesLXIJQyU+Uu6UZ8JsJ5QwE6u5PgvXCFQkIoeN3gm0q7bbpKnBnIQePmo12boHUvyUekZwaWNhihYFdhk9cU= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1736258442; c=relaxed/simple; bh=TW8UI9TMxQcBoeFf9FYBebGoM/areyCYdGlRu4N2BqM=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=Wt4/NWKBRD1C7gry2yQTTCKfKjgfzkiLKbhhgTj1MWbeGVmAFX27D0aPAkoXc/1jXaThI1S7OsfOx6B5knqBo3BeNgFifX3K9Ty/lOSdfu+I4MuTakQQ/ik/Nsd99M4ztJyWHYxoDNkQzumIr38F14cjW1/x4W4tYXLsFuR8dXQ= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=V93cn50g; arc=none smtp.client-ip=209.85.128.67 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="V93cn50g" Received: by mail-wm1-f67.google.com with SMTP id 5b1f17b1804b1-436637e8c8dso157900515e9.1; Tue, 07 Jan 2025 06:00:36 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1736258434; x=1736863234; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=K3hVYvq+iuJzNA1UBjbhnwAU9BqFVDkKQftt6XH2r5A=; b=V93cn50gdMOkDypj6OOr9gW3TK8RGwVNmiE0T3Fbpy+yDubToNdmjoHvWd3Nok5gHF vVRXp9pMbeMFnJEthDupfsZ4U8jtIS1WZdiNQYl+HgTIsUWqAQ8UNpq38ec3hHb30waA f+a2v3VpbTKT2H43q7W1Nzp9hzFK0X7I/7oYFi7WyeY8MHmD7FWbzYG4sBoh+kl1d7OW 0qZQYIOzL94qcqqx501KejAG/KxbVoPA3vY0VE05OxxxEofZ1NkRBX9aIuzkdvsLOxt8 e/4AZncwPA7ygIYZWjkzA2KiwqZmnZfNB2xNDATRZ+uEfd+OCSxNacRCm7PdxFia8Jlx qlmg== X-Google-DKIM-Signature: 
v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1736258434; x=1736863234; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=K3hVYvq+iuJzNA1UBjbhnwAU9BqFVDkKQftt6XH2r5A=; b=vDw7MeZNr5mr42lWkxaj+YedFkuWOvM45q0U8YeGFh+WeBqz/Bv9EcG+6px1YJgen/ LNdOk6d8SEeoI1ZycvPuEavUMwKPEMkXb3VuMmaMALjb+h3hupuucy+1jgGiQTmmIkmd faMu2MR71+NRMnY+2V+xqbGugaEEXxp7ue0HdopIPH7tAgF++bY5XAr9a1yMsuYJ4XJL a2fW/zSSNmfv/+kp+6E0/DtzAFg+iFjIuNLR0YEIdhKOxmsbt7LYxlyLyrQtqsX7ceUH Ea8zI5Lzc+F4ys+Zp5CmMqXzvFKL8yFQYWUw2Q1eLCA2s9nK0MQVwNs71iNJWcMtKj6F xuuQ== X-Forwarded-Encrypted: i=1; AJvYcCViqiSmY+TqlaHhjIKtVqv8T02mdx2hJXc1uthM117YNOKMTMDxQ/pogo1tuQjuceOGPb41N+L3dEN3PQg=@vger.kernel.org X-Gm-Message-State: AOJu0YyYwO98ggwSZLvldm/oHq+NvBrDnfG3dcKb+jZlIE+X2V9VnLMJ f8oSF3xDrwWb39YD9AFqoM3FScSMteAQqI1pwrVlbZDb0AYYT/N9s3k3oQJb0OA3Pg== X-Gm-Gg: ASbGncvsfVVIhsTtcn8z0C0b398b7bn9ikc5jgPik5ttY3bdw040gSq719RNmzIbfxr DVl1eDzBIlCKEh9Lc063/Y8a4qdbZcGU86wpXkDTIoff/P4wqBTvjt60KtpgBJPio315952xFDt J0jm4sC1mMjZ7m8im67srLF2dy6fELVaWu+KTnMwQlHztAV/8y2gX2ukVLEsegZAmN1lEENMOGm tk4MATND44YKbvSHVhrFm+qxCgb6gtrDUeEgCZ99sVLvg== X-Google-Smtp-Source: AGHT+IFqL8ZFopa21eavo0wjWjO+SunkCkSTG5gMNIc1C8dYeldJRI3CGU+zbo6ardpIhWEHfdovVA== X-Received: by 2002:a05:6000:704:b0:385:e9ba:acda with SMTP id ffacd0b85a97d-38a221e2738mr47348863f8f.2.1736258433714; Tue, 07 Jan 2025 06:00:33 -0800 (PST) Received: from localhost ([2a03:2880:31ff:b::]) by smtp.gmail.com with ESMTPSA id ffacd0b85a97d-38a1fa2bdfbsm49735092f8f.102.2025.01.07.06.00.33 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 07 Jan 2025 06:00:33 -0800 (PST) From: Kumar Kartikeya Dwivedi To: bpf@vger.kernel.org, linux-kernel@vger.kernel.org Cc: Linus Torvalds , Peter Zijlstra , Waiman Long , Alexei Starovoitov , Andrii Nakryiko , Daniel Borkmann , Martin KaFai Lau , Eduard Zingerman , "Paul E. 
McKenney" , Tejun Heo , Barret Rhoden , Josh Don , Dohyun Kim , kernel-team@meta.com Subject: [PATCH bpf-next v1 16/22] rqspinlock: Add entry to Makefile, MAINTAINERS Date: Tue, 7 Jan 2025 05:59:58 -0800 Message-ID: <20250107140004.2732830-17-memxor@gmail.com> X-Mailer: git-send-email 2.43.5 In-Reply-To: <20250107140004.2732830-1-memxor@gmail.com> References: <20250107140004.2732830-1-memxor@gmail.com> Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Developer-Signature: v=1; a=openpgp-sha256; l=1881; h=from:subject; bh=TW8UI9TMxQcBoeFf9FYBebGoM/areyCYdGlRu4N2BqM=; b=owEBbQKS/ZANAwAIAUzgyIZIvxHKAcsmYgBnfTCe7SrjvqmS/EXU7ItmbhB4WfLS1DE8hA31do/W g6wfnLCJAjMEAAEIAB0WIQRLvip+Buz51YI8YRFM4MiGSL8RygUCZ30wngAKCRBM4MiGSL8RyrStD/ 4qLiHX14Iz9ND/8Fxu5UtRTZYPgn5uJvgoGwtR9Fxus0bftUPkl6rheOXoAc5pbPXKS1NuUvPUOA+j PxjzFhKILfk07HJRkkm4whkUJVYQr2wsblZA8P7CvYkuztP7kbwtG3hMdGvlkTFvZXiO3H6f5OgPul c53pY6wx1fUM8IJMfB2REHhnXq56guqY8OLKjVjCvkdr7ErmUlJ/1jRg9+Hr8ci/aqPD2kfgtG9DOt klXWz/JHOaZq++aYwRT8AiNz4XxjRlDUgP8zNGBIXp3t1FmXbX8w7Cgsfe6RbSVIPvCfxFdvbNbp09 DjMOXbqkw3e/xiWeAAG0paMPlyJthCvD27JyYNWdbVOUPgs/xUV7Qp2rlYcgJR3852RJca1w3MtuAk 0qxpq2qtR/OseJAwc76EHbbQMxGrmnBAUpT38HllA6as8GShlaaua4225j+S2J99AaERiCYS1G1Khu C8vmMDyeuApIvm7n2/4iYhQ6J00wpSWo6n/Cp+EIh8Qrlg1zVh4oGlDsIcqnnta+kIOyHQ6mv/Lqse vCCq50Njrm6fwcNZywoIL3YRafz6F0FtnJbr+z1fTTHh1dyDXXCC7CD1QCjTlEEAzd7VuDO7u9XYTM gJxlulmgeAokdzgKGm9LNwzTNhnrLljyabPWkfA6ylZQZ8Gn7vDMrsYOl4Uw== X-Developer-Key: i=memxor@gmail.com; a=openpgp; fpr=4BBE2A7E06ECF9D5823C61114CE0C88648BF11CA X-Patchwork-Delegate: bpf@iogearbox.net Ensure that rqspinlock is built when qspinlock support and BPF subsystem is enabled. Also, add the file under the BPF MAINTAINERS entry so that all patches changing code in the file end up Cc'ing bpf@vger and the maintainers/reviewers. 
Signed-off-by: Kumar Kartikeya Dwivedi --- MAINTAINERS | 3 +++ include/asm-generic/Kbuild | 1 + kernel/locking/Makefile | 3 +++ 3 files changed, 7 insertions(+) diff --git a/MAINTAINERS b/MAINTAINERS index baf0eeb9a355..fde7ca94cc1d 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -4257,6 +4257,9 @@ F: include/uapi/linux/filter.h F: kernel/bpf/ F: kernel/trace/bpf_trace.c F: lib/buildid.c +F: arch/*/include/asm/rqspinlock.h +F: include/asm-generic/rqspinlock.h +F: kernel/locking/rqspinlock.c F: lib/test_bpf.c F: net/bpf/ F: net/core/filter.c diff --git a/include/asm-generic/Kbuild b/include/asm-generic/Kbuild index 1b43c3a77012..8675b7b4ad23 100644 --- a/include/asm-generic/Kbuild +++ b/include/asm-generic/Kbuild @@ -45,6 +45,7 @@ mandatory-y += pci.h mandatory-y += percpu.h mandatory-y += pgalloc.h mandatory-y += preempt.h +mandatory-y += rqspinlock.h mandatory-y += runtime-const.h mandatory-y += rwonce.h mandatory-y += sections.h diff --git a/kernel/locking/Makefile b/kernel/locking/Makefile index 0db4093d17b8..9b241490ab90 100644 --- a/kernel/locking/Makefile +++ b/kernel/locking/Makefile @@ -24,6 +24,9 @@ obj-$(CONFIG_SMP) += spinlock.o obj-$(CONFIG_LOCK_SPIN_ON_OWNER) += osq_lock.o obj-$(CONFIG_PROVE_LOCKING) += spinlock.o obj-$(CONFIG_QUEUED_SPINLOCKS) += qspinlock.o +ifeq ($(CONFIG_BPF_SYSCALL),y) +obj-$(CONFIG_QUEUED_SPINLOCKS) += rqspinlock.o +endif obj-$(CONFIG_RT_MUTEXES) += rtmutex_api.o obj-$(CONFIG_PREEMPT_RT) += spinlock_rt.o ww_rt_mutex.o obj-$(CONFIG_DEBUG_SPINLOCK) += spinlock.o
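For context, the mandatory-y entry above means each architecture either ships its own asm/rqspinlock.h or gets a generated wrapper that simply pulls in the asm-generic header. A hypothetical arch override (illustrative only, not taken from this series) would look like:

/* Hypothetical arch/<arch>/include/asm/rqspinlock.h override (sketch only):
 * architectures without such a file get an equivalent generated wrapper
 * because of the mandatory-y += rqspinlock.h entry added above.
 */
#ifndef _ASM_ARCH_RQSPINLOCK_H
#define _ASM_ARCH_RQSPINLOCK_H

/* arch-specific tuning, if any, would be declared here before pulling in
 * the generic definitions
 */
#include <asm-generic/rqspinlock.h>

#endif /* _ASM_ARCH_RQSPINLOCK_H */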
From patchwork Tue Jan 7 13:59:59 2025
From: Kumar Kartikeya Dwivedi
To: bpf@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: Linus Torvalds, Peter Zijlstra, Waiman Long, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann, Martin KaFai Lau, Eduard Zingerman, "Paul E. McKenney", Tejun Heo, Barret Rhoden, Josh Don, Dohyun Kim, kernel-team@meta.com
Subject: [PATCH bpf-next v1 17/22] bpf: Convert hashtab.c to rqspinlock
Date: Tue, 7 Jan 2025 05:59:59 -0800
Message-ID: <20250107140004.2732830-18-memxor@gmail.com>
In-Reply-To: <20250107140004.2732830-1-memxor@gmail.com>
References: <20250107140004.2732830-1-memxor@gmail.com>

Convert hashtab.c from raw_spinlock to rqspinlock, and drop the hashed per-cpu counter crud from the code base, since it is no longer necessary.

Signed-off-by: Kumar Kartikeya Dwivedi --- kernel/bpf/hashtab.c | 102 ++++++++++++++----------------------------- 1 file changed, 32 insertions(+), 70 deletions(-) diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c index 3ec941a0ea41..6812b114b811 100644 --- a/kernel/bpf/hashtab.c +++ b/kernel/bpf/hashtab.c @@ -16,6 +16,7 @@ #include "bpf_lru_list.h" #include "map_in_map.h" #include +#include #define HTAB_CREATE_FLAG_MASK \ (BPF_F_NO_PREALLOC | BPF_F_NO_COMMON_LRU | BPF_F_NUMA_NODE | \ @@ -78,7 +79,7 @@ */ struct bucket { struct hlist_nulls_head head; - raw_spinlock_t raw_lock; + rqspinlock_t raw_lock; }; #define HASHTAB_MAP_LOCK_COUNT 8 @@ -104,8 +105,6 @@ struct bpf_htab { u32 n_buckets; /* number of hash buckets */ u32 elem_size; /* size of each element in bytes */ u32 hashrnd; - struct lock_class_key lockdep_key; - int __percpu *map_locked[HASHTAB_MAP_LOCK_COUNT]; }; /* each htab element is struct htab_elem + key + value */ @@ -140,45 +139,26 @@ static void htab_init_buckets(struct bpf_htab *htab) for (i = 0; i < htab->n_buckets; i++) { INIT_HLIST_NULLS_HEAD(&htab->buckets[i].head, i); - raw_spin_lock_init(&htab->buckets[i].raw_lock); - lockdep_set_class(&htab->buckets[i].raw_lock, - &htab->lockdep_key); + raw_res_spin_lock_init(&htab->buckets[i].raw_lock); cond_resched(); } } -static inline int htab_lock_bucket(const struct bpf_htab *htab, - struct bucket *b, u32 hash, - unsigned long *pflags) +static inline int htab_lock_bucket(struct bucket *b, unsigned long *pflags) { unsigned long flags; + int ret; - hash = hash & min_t(u32, HASHTAB_MAP_LOCK_MASK, htab->n_buckets - 1); - - preempt_disable(); - local_irq_save(flags); - if (unlikely(__this_cpu_inc_return(*(htab->map_locked[hash])) != 1)) { - __this_cpu_dec(*(htab->map_locked[hash])); -
local_irq_restore(flags); - preempt_enable(); - return -EBUSY; - } - - raw_spin_lock(&b->raw_lock); + ret = raw_res_spin_lock_irqsave(&b->raw_lock, flags); + if (ret) + return ret; *pflags = flags; - return 0; } -static inline void htab_unlock_bucket(const struct bpf_htab *htab, - struct bucket *b, u32 hash, - unsigned long flags) +static inline void htab_unlock_bucket(struct bucket *b, unsigned long flags) { - hash = hash & min_t(u32, HASHTAB_MAP_LOCK_MASK, htab->n_buckets - 1); - raw_spin_unlock(&b->raw_lock); - __this_cpu_dec(*(htab->map_locked[hash])); - local_irq_restore(flags); - preempt_enable(); + raw_res_spin_unlock_irqrestore(&b->raw_lock, flags); } static bool htab_lru_map_delete_node(void *arg, struct bpf_lru_node *node); @@ -483,14 +463,12 @@ static struct bpf_map *htab_map_alloc(union bpf_attr *attr) bool percpu_lru = (attr->map_flags & BPF_F_NO_COMMON_LRU); bool prealloc = !(attr->map_flags & BPF_F_NO_PREALLOC); struct bpf_htab *htab; - int err, i; + int err; htab = bpf_map_area_alloc(sizeof(*htab), NUMA_NO_NODE); if (!htab) return ERR_PTR(-ENOMEM); - lockdep_register_key(&htab->lockdep_key); - bpf_map_init_from_attr(&htab->map, attr); if (percpu_lru) { @@ -536,15 +514,6 @@ static struct bpf_map *htab_map_alloc(union bpf_attr *attr) if (!htab->buckets) goto free_elem_count; - for (i = 0; i < HASHTAB_MAP_LOCK_COUNT; i++) { - htab->map_locked[i] = bpf_map_alloc_percpu(&htab->map, - sizeof(int), - sizeof(int), - GFP_USER); - if (!htab->map_locked[i]) - goto free_map_locked; - } - if (htab->map.map_flags & BPF_F_ZERO_SEED) htab->hashrnd = 0; else @@ -607,15 +576,12 @@ static struct bpf_map *htab_map_alloc(union bpf_attr *attr) free_map_locked: if (htab->use_percpu_counter) percpu_counter_destroy(&htab->pcount); - for (i = 0; i < HASHTAB_MAP_LOCK_COUNT; i++) - free_percpu(htab->map_locked[i]); bpf_map_area_free(htab->buckets); bpf_mem_alloc_destroy(&htab->pcpu_ma); bpf_mem_alloc_destroy(&htab->ma); free_elem_count: bpf_map_free_elem_count(&htab->map); free_htab: - lockdep_unregister_key(&htab->lockdep_key); bpf_map_area_free(htab); return ERR_PTR(err); } @@ -817,7 +783,7 @@ static bool htab_lru_map_delete_node(void *arg, struct bpf_lru_node *node) b = __select_bucket(htab, tgt_l->hash); head = &b->head; - ret = htab_lock_bucket(htab, b, tgt_l->hash, &flags); + ret = htab_lock_bucket(b, &flags); if (ret) return false; @@ -829,7 +795,7 @@ static bool htab_lru_map_delete_node(void *arg, struct bpf_lru_node *node) break; } - htab_unlock_bucket(htab, b, tgt_l->hash, flags); + htab_unlock_bucket(b, flags); return l == tgt_l; } @@ -1148,7 +1114,7 @@ static long htab_map_update_elem(struct bpf_map *map, void *key, void *value, */ } - ret = htab_lock_bucket(htab, b, hash, &flags); + ret = htab_lock_bucket(b, &flags); if (ret) return ret; @@ -1199,7 +1165,7 @@ static long htab_map_update_elem(struct bpf_map *map, void *key, void *value, check_and_free_fields(htab, l_old); } } - htab_unlock_bucket(htab, b, hash, flags); + htab_unlock_bucket(b, flags); if (l_old) { if (old_map_ptr) map->ops->map_fd_put_ptr(map, old_map_ptr, true); @@ -1208,7 +1174,7 @@ static long htab_map_update_elem(struct bpf_map *map, void *key, void *value, } return 0; err: - htab_unlock_bucket(htab, b, hash, flags); + htab_unlock_bucket(b, flags); return ret; } @@ -1255,7 +1221,7 @@ static long htab_lru_map_update_elem(struct bpf_map *map, void *key, void *value copy_map_value(&htab->map, l_new->key + round_up(map->key_size, 8), value); - ret = htab_lock_bucket(htab, b, hash, &flags); + ret = htab_lock_bucket(b, 
&flags); if (ret) goto err_lock_bucket; @@ -1276,7 +1242,7 @@ static long htab_lru_map_update_elem(struct bpf_map *map, void *key, void *value ret = 0; err: - htab_unlock_bucket(htab, b, hash, flags); + htab_unlock_bucket(b, flags); err_lock_bucket: if (ret) @@ -1313,7 +1279,7 @@ static long __htab_percpu_map_update_elem(struct bpf_map *map, void *key, b = __select_bucket(htab, hash); head = &b->head; - ret = htab_lock_bucket(htab, b, hash, &flags); + ret = htab_lock_bucket(b, &flags); if (ret) return ret; @@ -1338,7 +1304,7 @@ static long __htab_percpu_map_update_elem(struct bpf_map *map, void *key, } ret = 0; err: - htab_unlock_bucket(htab, b, hash, flags); + htab_unlock_bucket(b, flags); return ret; } @@ -1379,7 +1345,7 @@ static long __htab_lru_percpu_map_update_elem(struct bpf_map *map, void *key, return -ENOMEM; } - ret = htab_lock_bucket(htab, b, hash, &flags); + ret = htab_lock_bucket(b, &flags); if (ret) goto err_lock_bucket; @@ -1403,7 +1369,7 @@ static long __htab_lru_percpu_map_update_elem(struct bpf_map *map, void *key, } ret = 0; err: - htab_unlock_bucket(htab, b, hash, flags); + htab_unlock_bucket(b, flags); err_lock_bucket: if (l_new) { bpf_map_dec_elem_count(&htab->map); @@ -1445,7 +1411,7 @@ static long htab_map_delete_elem(struct bpf_map *map, void *key) b = __select_bucket(htab, hash); head = &b->head; - ret = htab_lock_bucket(htab, b, hash, &flags); + ret = htab_lock_bucket(b, &flags); if (ret) return ret; @@ -1455,7 +1421,7 @@ static long htab_map_delete_elem(struct bpf_map *map, void *key) else ret = -ENOENT; - htab_unlock_bucket(htab, b, hash, flags); + htab_unlock_bucket(b, flags); if (l) free_htab_elem(htab, l); @@ -1481,7 +1447,7 @@ static long htab_lru_map_delete_elem(struct bpf_map *map, void *key) b = __select_bucket(htab, hash); head = &b->head; - ret = htab_lock_bucket(htab, b, hash, &flags); + ret = htab_lock_bucket(b, &flags); if (ret) return ret; @@ -1492,7 +1458,7 @@ static long htab_lru_map_delete_elem(struct bpf_map *map, void *key) else ret = -ENOENT; - htab_unlock_bucket(htab, b, hash, flags); + htab_unlock_bucket(b, flags); if (l) htab_lru_push_free(htab, l); return ret; @@ -1561,7 +1527,6 @@ static void htab_map_free_timers_and_wq(struct bpf_map *map) static void htab_map_free(struct bpf_map *map) { struct bpf_htab *htab = container_of(map, struct bpf_htab, map); - int i; /* bpf_free_used_maps() or close(map_fd) will trigger this map_free callback. * bpf_free_used_maps() is called after bpf prog is no longer executing. @@ -1586,9 +1551,6 @@ static void htab_map_free(struct bpf_map *map) bpf_mem_alloc_destroy(&htab->ma); if (htab->use_percpu_counter) percpu_counter_destroy(&htab->pcount); - for (i = 0; i < HASHTAB_MAP_LOCK_COUNT; i++) - free_percpu(htab->map_locked[i]); - lockdep_unregister_key(&htab->lockdep_key); bpf_map_area_free(htab); } @@ -1631,7 +1593,7 @@ static int __htab_map_lookup_and_delete_elem(struct bpf_map *map, void *key, b = __select_bucket(htab, hash); head = &b->head; - ret = htab_lock_bucket(htab, b, hash, &bflags); + ret = htab_lock_bucket(b, &bflags); if (ret) return ret; @@ -1669,7 +1631,7 @@ static int __htab_map_lookup_and_delete_elem(struct bpf_map *map, void *key, free_htab_elem(htab, l); } - htab_unlock_bucket(htab, b, hash, bflags); + htab_unlock_bucket(b, bflags); if (is_lru_map && l) htab_lru_push_free(htab, l); @@ -1787,7 +1749,7 @@ __htab_map_lookup_and_delete_batch(struct bpf_map *map, head = &b->head; /* do not grab the lock unless need it (bucket_cnt > 0). 
*/ if (locked) { - ret = htab_lock_bucket(htab, b, batch, &flags); + ret = htab_lock_bucket(b, &flags); if (ret) { rcu_read_unlock(); bpf_enable_instrumentation(); goto after_loop; @@ -1810,7 +1772,7 @@ __htab_map_lookup_and_delete_batch(struct bpf_map *map, /* Note that since bucket_cnt > 0 here, it is implicit * that the locked was grabbed, so release it. */ - htab_unlock_bucket(htab, b, batch, flags); + htab_unlock_bucket(b, flags); rcu_read_unlock(); bpf_enable_instrumentation(); goto after_loop; @@ -1821,7 +1783,7 @@ __htab_map_lookup_and_delete_batch(struct bpf_map *map, /* Note that since bucket_cnt > 0 here, it is implicit * that the locked was grabbed, so release it. */ - htab_unlock_bucket(htab, b, batch, flags); + htab_unlock_bucket(b, flags); rcu_read_unlock(); bpf_enable_instrumentation(); kvfree(keys); @@ -1884,7 +1846,7 @@ __htab_map_lookup_and_delete_batch(struct bpf_map *map, dst_val += value_size; } - htab_unlock_bucket(htab, b, batch, flags); + htab_unlock_bucket(b, flags); locked = false; while (node_to_free) {
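To illustrate the conversion above: htab_lock_bucket() is now fallible, so every caller follows roughly the pattern below. This is an illustrative sketch mirroring the patched hashtab.c helpers, not the exact kernel code; the error value comes straight from rqspinlock when a timeout or deadlock is detected.

/* Sketch of the post-conversion locking pattern in hashtab.c (illustrative). */
static long htab_update_sketch(struct bucket *b)
{
	unsigned long flags;
	int ret;

	ret = htab_lock_bucket(b, &flags);	/* wraps raw_res_spin_lock_irqsave() */
	if (ret)
		return ret;	/* lock not acquired: propagate the error instead of spinning */

	/* ... update the hlist_nulls bucket while holding the resilient lock ... */

	htab_unlock_bucket(b, flags);		/* raw_res_spin_unlock_irqrestore() */
	return 0;
}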
From patchwork Tue Jan 7 14:00:00 2025
From: Kumar Kartikeya Dwivedi
To: bpf@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: Linus Torvalds, Peter Zijlstra, Waiman Long, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann, Martin KaFai Lau, Eduard Zingerman, "Paul E. McKenney", Tejun Heo, Barret Rhoden, Josh Don, Dohyun Kim, kernel-team@meta.com
Subject: [PATCH bpf-next v1 18/22] bpf: Convert percpu_freelist.c to rqspinlock
Date: Tue, 7 Jan 2025 06:00:00 -0800
Message-ID: <20250107140004.2732830-19-memxor@gmail.com>
In-Reply-To: <20250107140004.2732830-1-memxor@gmail.com>
References: <20250107140004.2732830-1-memxor@gmail.com>

Convert the percpu_freelist.c code to use rqspinlock, and remove the extralist fallback and trylock-based acquisitions that were previously used to avoid deadlocks. The key thing to note is the retained while (true) loop, which searches through the other CPUs when pushing a node fails due to locking errors. This retains the behavior of the old code, which kept trying until it could successfully push the node back into the freelist of some CPU. Technically, the iteration for this loop should start from raw_smp_processor_id() + 1, but to avoid running off the edge of nr_cpus, the current CPU is simply skipped inside the loop body instead.
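As an illustration of the loop described above, the push path now has roughly the following shape. This is a readable restatement of the new __pcpu_freelist_push() as it can be read from the diff that follows, not an authoritative copy of it.

/* Push sketch: try the local CPU first; on lock failure, walk the other
 * CPUs starting from the current one via for_each_cpu_wrap(), skipping the
 * current CPU inside the loop body instead of starting the walk at cpu + 1.
 */
void pcpu_freelist_push_sketch(struct pcpu_freelist *s, struct pcpu_freelist_node *node)
{
	struct pcpu_freelist_head *head;
	int cpu;

	if (___pcpu_freelist_push(this_cpu_ptr(s->freelist), node))
		return;		/* fast path: local per-CPU list */

	while (true) {		/* keep trying until some CPU accepts the node */
		for_each_cpu_wrap(cpu, cpu_possible_mask, raw_smp_processor_id()) {
			if (cpu == raw_smp_processor_id())
				continue;	/* already failed on this one */
			head = per_cpu_ptr(s->freelist, cpu);
			if (raw_res_spin_lock(&head->lock))
				continue;	/* lock failed, try the next CPU */
			pcpu_freelist_push_node(head, node);
			raw_res_spin_unlock(&head->lock);
			return;
		}
	}
}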
Signed-off-by: Kumar Kartikeya Dwivedi --- kernel/bpf/percpu_freelist.c | 113 ++++++++--------------------------- kernel/bpf/percpu_freelist.h | 4 +- 2 files changed, 27 insertions(+), 90 deletions(-) diff --git a/kernel/bpf/percpu_freelist.c b/kernel/bpf/percpu_freelist.c index 034cf87b54e9..632762b57299 100644 --- a/kernel/bpf/percpu_freelist.c +++ b/kernel/bpf/percpu_freelist.c @@ -14,11 +14,9 @@ int pcpu_freelist_init(struct pcpu_freelist *s) for_each_possible_cpu(cpu) { struct pcpu_freelist_head *head = per_cpu_ptr(s->freelist, cpu); - raw_spin_lock_init(&head->lock); + raw_res_spin_lock_init(&head->lock); head->first = NULL; } - raw_spin_lock_init(&s->extralist.lock); - s->extralist.first = NULL; return 0; } @@ -34,58 +32,39 @@ static inline void pcpu_freelist_push_node(struct pcpu_freelist_head *head, WRITE_ONCE(head->first, node); } -static inline void ___pcpu_freelist_push(struct pcpu_freelist_head *head, +static inline bool ___pcpu_freelist_push(struct pcpu_freelist_head *head, struct pcpu_freelist_node *node) { - raw_spin_lock(&head->lock); - pcpu_freelist_push_node(head, node); - raw_spin_unlock(&head->lock); -} - -static inline bool pcpu_freelist_try_push_extra(struct pcpu_freelist *s, - struct pcpu_freelist_node *node) -{ - if (!raw_spin_trylock(&s->extralist.lock)) + if (raw_res_spin_lock(&head->lock)) return false; - - pcpu_freelist_push_node(&s->extralist, node); - raw_spin_unlock(&s->extralist.lock); + pcpu_freelist_push_node(head, node); + raw_res_spin_unlock(&head->lock); return true; } -static inline void ___pcpu_freelist_push_nmi(struct pcpu_freelist *s, - struct pcpu_freelist_node *node) +void __pcpu_freelist_push(struct pcpu_freelist *s, + struct pcpu_freelist_node *node) { - int cpu, orig_cpu; + struct pcpu_freelist_head *head; + int cpu; - orig_cpu = raw_smp_processor_id(); - while (1) { - for_each_cpu_wrap(cpu, cpu_possible_mask, orig_cpu) { - struct pcpu_freelist_head *head; + if (___pcpu_freelist_push(this_cpu_ptr(s->freelist), node)) + return; + while (true) { + for_each_cpu_wrap(cpu, cpu_possible_mask, raw_smp_processor_id()) { + if (cpu == raw_smp_processor_id()) + continue; head = per_cpu_ptr(s->freelist, cpu); - if (raw_spin_trylock(&head->lock)) { - pcpu_freelist_push_node(head, node); - raw_spin_unlock(&head->lock); - return; - } - } - - /* cannot lock any per cpu lock, try extralist */ - if (pcpu_freelist_try_push_extra(s, node)) + if (raw_res_spin_lock(&head->lock)) + continue; + pcpu_freelist_push_node(head, node); + raw_res_spin_unlock(&head->lock); return; + } } } -void __pcpu_freelist_push(struct pcpu_freelist *s, - struct pcpu_freelist_node *node) -{ - if (in_nmi()) - ___pcpu_freelist_push_nmi(s, node); - else - ___pcpu_freelist_push(this_cpu_ptr(s->freelist), node); -} - void pcpu_freelist_push(struct pcpu_freelist *s, struct pcpu_freelist_node *node) { @@ -120,71 +99,29 @@ void pcpu_freelist_populate(struct pcpu_freelist *s, void *buf, u32 elem_size, static struct pcpu_freelist_node *___pcpu_freelist_pop(struct pcpu_freelist *s) { + struct pcpu_freelist_node *node = NULL; struct pcpu_freelist_head *head; - struct pcpu_freelist_node *node; int cpu; for_each_cpu_wrap(cpu, cpu_possible_mask, raw_smp_processor_id()) { head = per_cpu_ptr(s->freelist, cpu); if (!READ_ONCE(head->first)) continue; - raw_spin_lock(&head->lock); + if (raw_res_spin_lock(&head->lock)) + continue; node = head->first; if (node) { WRITE_ONCE(head->first, node->next); - raw_spin_unlock(&head->lock); + raw_res_spin_unlock(&head->lock); return node; } - 
raw_spin_unlock(&head->lock); + raw_res_spin_unlock(&head->lock); } - - /* per cpu lists are all empty, try extralist */ - if (!READ_ONCE(s->extralist.first)) - return NULL; - raw_spin_lock(&s->extralist.lock); - node = s->extralist.first; - if (node) - WRITE_ONCE(s->extralist.first, node->next); - raw_spin_unlock(&s->extralist.lock); - return node; -} - -static struct pcpu_freelist_node * -___pcpu_freelist_pop_nmi(struct pcpu_freelist *s) -{ - struct pcpu_freelist_head *head; - struct pcpu_freelist_node *node; - int cpu; - - for_each_cpu_wrap(cpu, cpu_possible_mask, raw_smp_processor_id()) { - head = per_cpu_ptr(s->freelist, cpu); - if (!READ_ONCE(head->first)) - continue; - if (raw_spin_trylock(&head->lock)) { - node = head->first; - if (node) { - WRITE_ONCE(head->first, node->next); - raw_spin_unlock(&head->lock); - return node; - } - raw_spin_unlock(&head->lock); - } - } - - /* cannot pop from per cpu lists, try extralist */ - if (!READ_ONCE(s->extralist.first) || !raw_spin_trylock(&s->extralist.lock)) - return NULL; - node = s->extralist.first; - if (node) - WRITE_ONCE(s->extralist.first, node->next); - raw_spin_unlock(&s->extralist.lock); return node; } struct pcpu_freelist_node *__pcpu_freelist_pop(struct pcpu_freelist *s) { - if (in_nmi()) - return ___pcpu_freelist_pop_nmi(s); return ___pcpu_freelist_pop(s); } diff --git a/kernel/bpf/percpu_freelist.h b/kernel/bpf/percpu_freelist.h index 3c76553cfe57..914798b74967 100644 --- a/kernel/bpf/percpu_freelist.h +++ b/kernel/bpf/percpu_freelist.h @@ -5,15 +5,15 @@ #define __PERCPU_FREELIST_H__ #include #include +#include struct pcpu_freelist_head { struct pcpu_freelist_node *first; - raw_spinlock_t lock; + rqspinlock_t lock; }; struct pcpu_freelist { struct pcpu_freelist_head __percpu *freelist; - struct pcpu_freelist_head extralist; }; struct pcpu_freelist_node {
From patchwork Tue Jan 7 14:00:01 2025
From: Kumar Kartikeya Dwivedi
To: bpf@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: Linus Torvalds, Peter Zijlstra, Waiman Long, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann, Martin KaFai Lau, Eduard Zingerman, "Paul E. McKenney", Tejun Heo, Barret Rhoden, Josh Don, Dohyun Kim, kernel-team@meta.com
Subject: [PATCH bpf-next v1 19/22] bpf: Convert lpm_trie.c to rqspinlock
Date: Tue, 7 Jan 2025 06:00:01 -0800
Message-ID: <20250107140004.2732830-20-memxor@gmail.com>
In-Reply-To: <20250107140004.2732830-1-memxor@gmail.com>
References: <20250107140004.2732830-1-memxor@gmail.com>

Convert all LPM trie usage of raw_spinlock to rqspinlock. Note that rcu_dereference_protected in trie_delete_elem is switched over to plain rcu_dereference: the RCU read lock should already be held from the BPF program side or the eBPF syscall path, and trie->lock is just acquired before the dereference. It is not clear from the commit history why the protected variant was used, but the above reasoning makes sense, so switch over.
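To illustrate the change described above, the update path now bails out when the trie lock cannot be taken. The following is a sketch of the shape of the patched trie_update_elem(), not the exact kernel code; the full diff follows.

/* Sketch: trie->lock acquisition can now fail, so the preallocated node is
 * freed on that path instead of being inserted (illustrative only).
 */
static long trie_update_sketch(struct lpm_trie *trie, struct lpm_trie_node *new_node)
{
	unsigned long irq_flags;
	int ret;

	ret = raw_res_spin_lock_irqsave(&trie->lock, irq_flags);
	if (ret)
		goto out_free;	/* timeout/deadlock detected: leave the trie untouched */

	/* ... walk the trie with plain rcu_dereference() and link new_node ... */

	raw_res_spin_unlock_irqrestore(&trie->lock, irq_flags);
	return 0;

out_free:
	/* the real code frees under migrate_disable()/migrate_enable() */
	bpf_mem_cache_free(&trie->ma, new_node);
	return ret;
}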
Signed-off-by: Kumar Kartikeya Dwivedi --- kernel/bpf/lpm_trie.c | 25 ++++++++++++++----------- 1 file changed, 14 insertions(+), 11 deletions(-) diff --git a/kernel/bpf/lpm_trie.c b/kernel/bpf/lpm_trie.c index f8bc1e096182..a92d1eeafb33 100644 --- a/kernel/bpf/lpm_trie.c +++ b/kernel/bpf/lpm_trie.c @@ -15,6 +15,7 @@ #include #include #include +#include #include /* Intermediate node */ @@ -36,7 +37,7 @@ struct lpm_trie { size_t n_entries; size_t max_prefixlen; size_t data_size; - raw_spinlock_t lock; + rqspinlock_t lock; }; /* This trie implements a longest prefix match algorithm that can be used to @@ -349,7 +350,9 @@ static long trie_update_elem(struct bpf_map *map, if (!new_node) return -ENOMEM; - raw_spin_lock_irqsave(&trie->lock, irq_flags); + ret = raw_res_spin_lock_irqsave(&trie->lock, irq_flags); + if (ret) + goto out_free; new_node->prefixlen = key->prefixlen; RCU_INIT_POINTER(new_node->child[0], NULL); @@ -363,8 +366,7 @@ static long trie_update_elem(struct bpf_map *map, */ slot = &trie->root; - while ((node = rcu_dereference_protected(*slot, - lockdep_is_held(&trie->lock)))) { + while ((node = rcu_dereference(*slot))) { matchlen = longest_prefix_match(trie, node, key); if (node->prefixlen != matchlen || @@ -450,8 +452,8 @@ static long trie_update_elem(struct bpf_map *map, rcu_assign_pointer(*slot, im_node); out: - raw_spin_unlock_irqrestore(&trie->lock, irq_flags); - + raw_res_spin_unlock_irqrestore(&trie->lock, irq_flags); +out_free: migrate_disable(); if (ret) bpf_mem_cache_free(&trie->ma, new_node); @@ -477,7 +479,9 @@ static long trie_delete_elem(struct bpf_map *map, void *_key) if (key->prefixlen > trie->max_prefixlen) return -EINVAL; - raw_spin_lock_irqsave(&trie->lock, irq_flags); + ret = raw_res_spin_lock_irqsave(&trie->lock, irq_flags); + if (ret) + return ret; /* Walk the tree looking for an exact key/length match and keeping * track of the path we traverse. 
We will need to know the node @@ -488,8 +492,7 @@ static long trie_delete_elem(struct bpf_map *map, void *_key) trim = &trie->root; trim2 = trim; parent = NULL; - while ((node = rcu_dereference_protected( - *trim, lockdep_is_held(&trie->lock)))) { + while ((node = rcu_dereference(*trim))) { matchlen = longest_prefix_match(trie, node, key); if (node->prefixlen != matchlen || @@ -553,7 +556,7 @@ static long trie_delete_elem(struct bpf_map *map, void *_key) free_node = node; out: - raw_spin_unlock_irqrestore(&trie->lock, irq_flags); + raw_res_spin_unlock_irqrestore(&trie->lock, irq_flags); migrate_disable(); bpf_mem_cache_free_rcu(&trie->ma, free_parent); @@ -604,7 +607,7 @@ static struct bpf_map *trie_alloc(union bpf_attr *attr) offsetof(struct bpf_lpm_trie_key_u8, data); trie->max_prefixlen = trie->data_size * 8; - raw_spin_lock_init(&trie->lock); + raw_res_spin_lock_init(&trie->lock); /* Allocate intermediate and leaf nodes from the same allocator */ leaf_size = sizeof(struct lpm_trie_node) + trie->data_size +
From patchwork Tue Jan 7 14:00:02 2025
From: Kumar Kartikeya Dwivedi
To: bpf@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: Linus Torvalds, Peter Zijlstra, Waiman Long, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann, Martin KaFai Lau, Eduard Zingerman, "Paul E. McKenney", Tejun Heo, Barret Rhoden, Josh Don, Dohyun Kim, kernel-team@meta.com
Subject: [PATCH bpf-next v1 20/22] bpf: Introduce rqspinlock kfuncs
Date: Tue, 7 Jan 2025 06:00:02 -0800
Message-ID: <20250107140004.2732830-21-memxor@gmail.com>
In-Reply-To: <20250107140004.2732830-1-memxor@gmail.com>
References: <20250107140004.2732830-1-memxor@gmail.com>

Introduce four new kfuncs: bpf_res_spin_lock and bpf_res_spin_unlock, along with their irqsave/irqrestore variants, which wrap the rqspinlock APIs. bpf_res_spin_lock returns a conditional result depending on whether the lock was acquired (NULL is returned when lock acquisition succeeds, non-NULL upon failure). The memory pointed to by the returned pointer upon failure can be dereferenced after the NULL check to obtain the error code. Instead of using the old bpf_spin_lock type, introduce a new type with the same layout and the same alignment, but a different name, to avoid type confusion. Preemption is disabled upon successful lock acquisition; however, IRQs are not. Special kfuncs can be introduced later to allow disabling IRQs when taking a spin lock. Resilient locks are safe against AA deadlocks, hence not disabling IRQs currently does not allow kernel safety to be violated. The __irq_flag annotation is used to accept IRQ flags for the IRQ variants, with the same semantics as the existing bpf_local_irq_{save, restore}. These kfuncs will require additional verifier-side support in subsequent commits, to allow programs to hold multiple locks at the same time.
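For illustration, a BPF program would use these kfuncs roughly as follows once the verifier support from the next patch is in place. This is a sketch: the extern declarations and map layout here are assumptions (in-tree selftests would pull the declarations from a shared header), and it assumes a kernel with this series applied so that struct bpf_res_spin_lock is visible via vmlinux.h. The kernel-side implementation follows in the diff below.

#include <vmlinux.h>
#include <bpf/bpf_helpers.h>

/* Assumed kfunc prototypes; the kernel-side definitions are in this patch. */
extern int bpf_res_spin_lock(struct bpf_res_spin_lock *lock) __ksym;
extern void bpf_res_spin_unlock(struct bpf_res_spin_lock *lock) __ksym;

struct elem {
	struct bpf_res_spin_lock lock;	/* protects counter */
	__u64 counter;
};

struct {
	__uint(type, BPF_MAP_TYPE_ARRAY);
	__uint(max_entries, 1);
	__type(key, int);
	__type(value, struct elem);
} data SEC(".maps");

SEC("tc")
int bump_counter(struct __sk_buff *ctx)
{
	struct elem *e;
	int key = 0;

	e = bpf_map_lookup_elem(&data, &key);
	if (!e)
		return 0;
	/* Lock acquisition can fail (timeout or deadlock detected); the
	 * verifier only considers the lock held on the ret == 0 path.
	 */
	if (bpf_res_spin_lock(&e->lock))
		return 0;
	e->counter++;
	bpf_res_spin_unlock(&e->lock);
	return 0;
}

char _license[] SEC("license") = "GPL";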
Signed-off-by: Kumar Kartikeya Dwivedi --- include/asm-generic/rqspinlock.h | 4 ++ include/linux/bpf.h | 1 + kernel/locking/rqspinlock.c | 78 ++++++++++++++++++++++++++++++++ 3 files changed, 83 insertions(+) diff --git a/include/asm-generic/rqspinlock.h b/include/asm-generic/rqspinlock.h index 53be8426373c..22f8770f033b 100644 --- a/include/asm-generic/rqspinlock.h +++ b/include/asm-generic/rqspinlock.h @@ -14,6 +14,10 @@ #include #include +struct bpf_res_spin_lock { + u32 val; +}; + struct qspinlock; typedef struct qspinlock rqspinlock_t; diff --git a/include/linux/bpf.h b/include/linux/bpf.h index feda0ce90f5a..f93a4f40aaaf 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -30,6 +30,7 @@ #include #include #include +#include struct bpf_verifier_env; struct bpf_verifier_log; diff --git a/kernel/locking/rqspinlock.c b/kernel/locking/rqspinlock.c index 9d3036f5e613..2c6293d1298c 100644 --- a/kernel/locking/rqspinlock.c +++ b/kernel/locking/rqspinlock.c @@ -15,6 +15,8 @@ #include #include +#include +#include #include #include #include @@ -644,3 +646,79 @@ int __lockfunc resilient_queued_spin_lock_slowpath(struct qspinlock *lock, u32 v return ret; } EXPORT_SYMBOL(resilient_queued_spin_lock_slowpath); + +__bpf_kfunc_start_defs(); + +#define REPORT_STR(ret) ({ ret == -ETIMEDOUT ? "Timeout detected" : "AA or ABBA deadlock detected"; }) + +__bpf_kfunc int bpf_res_spin_lock(struct bpf_res_spin_lock *lock) +{ + int ret; + + BUILD_BUG_ON(sizeof(struct qspinlock) != sizeof(struct bpf_res_spin_lock)); + BUILD_BUG_ON(__alignof__(struct qspinlock) != __alignof__(struct bpf_res_spin_lock)); + + preempt_disable(); + ret = res_spin_lock((struct qspinlock *)lock); + if (unlikely(ret)) { + preempt_enable(); + rqspinlock_report_violation(REPORT_STR(ret), lock); + return ret; + } + return 0; +} + +__bpf_kfunc void bpf_res_spin_unlock(struct bpf_res_spin_lock *lock) +{ + res_spin_unlock((struct qspinlock *)lock); + preempt_enable(); +} + +__bpf_kfunc int bpf_res_spin_lock_irqsave(struct bpf_res_spin_lock *lock, unsigned long *flags__irq_flag) +{ + u64 *ptr = (u64 *)flags__irq_flag; + unsigned long flags; + int ret; + + preempt_disable(); + local_irq_save(flags); + ret = res_spin_lock((struct qspinlock *)lock); + if (unlikely(ret)) { + local_irq_restore(flags); + preempt_enable(); + rqspinlock_report_violation(REPORT_STR(ret), lock); + return ret; + } + *ptr = flags; + return 0; +} + +__bpf_kfunc void bpf_res_spin_unlock_irqrestore(struct bpf_res_spin_lock *lock, unsigned long *flags__irq_flag) +{ + u64 *ptr = (u64 *)flags__irq_flag; + unsigned long flags = *ptr; + + res_spin_unlock((struct qspinlock *)lock); + local_irq_restore(flags); + preempt_enable(); +} + +__bpf_kfunc_end_defs(); + +BTF_KFUNCS_START(rqspinlock_kfunc_ids) +BTF_ID_FLAGS(func, bpf_res_spin_lock, KF_RET_NULL) +BTF_ID_FLAGS(func, bpf_res_spin_unlock) +BTF_ID_FLAGS(func, bpf_res_spin_lock_irqsave, KF_RET_NULL) +BTF_ID_FLAGS(func, bpf_res_spin_unlock_irqrestore) +BTF_KFUNCS_END(rqspinlock_kfunc_ids) + +static const struct btf_kfunc_id_set rqspinlock_kfunc_set = { + .owner = THIS_MODULE, + .set = &rqspinlock_kfunc_ids, +}; + +static __init int rqspinlock_register_kfuncs(void) +{ + return register_btf_kfunc_id_set(BPF_PROG_TYPE_UNSPEC, &rqspinlock_kfunc_set); +} +late_initcall(rqspinlock_register_kfuncs); From patchwork Tue Jan 7 14:00:03 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Kumar Kartikeya Dwivedi X-Patchwork-Id: 13928958 X-Patchwork-Delegate: 
bpf@iogearbox.net
From: Kumar Kartikeya Dwivedi
To: bpf@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: Linus Torvalds, Peter Zijlstra, Waiman Long, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann, Martin KaFai Lau, Eduard Zingerman, "Paul E. McKenney", Tejun Heo, Barret Rhoden, Josh Don, Dohyun Kim, kernel-team@meta.com
Subject: [PATCH bpf-next v1 21/22] bpf: Implement verifier support for rqspinlock
Date: Tue, 7 Jan 2025 06:00:03 -0800
Message-ID: <20250107140004.2732830-22-memxor@gmail.com>
In-Reply-To: <20250107140004.2732830-1-memxor@gmail.com>
References: <20250107140004.2732830-1-memxor@gmail.com>

Introduce verifier-side support for rqspinlock kfuncs. The first step is allowing the bpf_res_spin_lock type to be defined in map values and allocated objects, so the BTF side is updated with a new BPF_RES_SPIN_LOCK field to recognize and validate it. An object cannot have both bpf_spin_lock and bpf_res_spin_lock; only one of them (and at most one per object, as before) may be present. The bpf_res_spin_lock can also be used to protect objects that require lock protection for their kfuncs, like BPF rbtree and linked list. The verifier plumbing to simulate success and failure cases when calling the kfuncs is done by pushing a new verifier state to the verifier state stack, which will verify the failure case upon calling the kfunc. The path where success is indicated creates all lock reference state and IRQ state (if necessary for irqsave variants). In the case of failure, all state creation is skipped while verifying the kfunc.
When marking the return value for success case, the value is marked as 0, and for the failure case as [-MAX_ERRNO, -1]. Then, in the program, whenever user checks the return value as 'if (ret)' or 'if (ret < 0)' the verifier never traverses such branches for success cases, and would be aware that the lock is not held in such cases. We push the kfunc state in do_check and then call check_kfunc_call separately for pushed state and the current state, and operate on the current state in case of success, and skip adding lock and IRQ state in case of failure. Failure state is indicated using PROCESS_LOCK_FAIL flag. We introduce a kfunc_class state to avoid mixing lock irqrestore kfuncs with IRQ state created by bpf_local_irq_save. With all this infrastructure, these kfuncs become usable in programs while satisfying all safety properties required by the kernel. Signed-off-by: Kumar Kartikeya Dwivedi --- include/linux/bpf.h | 9 ++ include/linux/bpf_verifier.h | 17 ++- kernel/bpf/btf.c | 26 +++- kernel/bpf/syscall.c | 6 +- kernel/bpf/verifier.c | 233 ++++++++++++++++++++++++++++------- 5 files changed, 238 insertions(+), 53 deletions(-) diff --git a/include/linux/bpf.h b/include/linux/bpf.h index f93a4f40aaaf..fd05c13590e0 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -205,6 +205,7 @@ enum btf_field_type { BPF_REFCOUNT = (1 << 9), BPF_WORKQUEUE = (1 << 10), BPF_UPTR = (1 << 11), + BPF_RES_SPIN_LOCK = (1 << 12), }; typedef void (*btf_dtor_kfunc_t)(void *); @@ -240,6 +241,7 @@ struct btf_record { u32 cnt; u32 field_mask; int spin_lock_off; + int res_spin_lock_off; int timer_off; int wq_off; int refcount_off; @@ -315,6 +317,8 @@ static inline const char *btf_field_type_name(enum btf_field_type type) switch (type) { case BPF_SPIN_LOCK: return "bpf_spin_lock"; + case BPF_RES_SPIN_LOCK: + return "bpf_res_spin_lock"; case BPF_TIMER: return "bpf_timer"; case BPF_WORKQUEUE: @@ -347,6 +351,8 @@ static inline u32 btf_field_type_size(enum btf_field_type type) switch (type) { case BPF_SPIN_LOCK: return sizeof(struct bpf_spin_lock); + case BPF_RES_SPIN_LOCK: + return sizeof(struct bpf_res_spin_lock); case BPF_TIMER: return sizeof(struct bpf_timer); case BPF_WORKQUEUE: @@ -377,6 +383,8 @@ static inline u32 btf_field_type_align(enum btf_field_type type) switch (type) { case BPF_SPIN_LOCK: return __alignof__(struct bpf_spin_lock); + case BPF_RES_SPIN_LOCK: + return __alignof__(struct bpf_res_spin_lock); case BPF_TIMER: return __alignof__(struct bpf_timer); case BPF_WORKQUEUE: @@ -420,6 +428,7 @@ static inline void bpf_obj_init_field(const struct btf_field *field, void *addr) case BPF_RB_ROOT: /* RB_ROOT_CACHED 0-inits, no need to do anything after memset */ case BPF_SPIN_LOCK: + case BPF_RES_SPIN_LOCK: case BPF_TIMER: case BPF_WORKQUEUE: case BPF_KPTR_UNREF: diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h index 32c23f2a3086..ed444e44f524 100644 --- a/include/linux/bpf_verifier.h +++ b/include/linux/bpf_verifier.h @@ -115,6 +115,15 @@ struct bpf_reg_state { int depth:30; } iter; + /* For irq stack slots */ + struct { + enum { + IRQ_KFUNC_IGNORE, + IRQ_NATIVE_KFUNC, + IRQ_LOCK_KFUNC, + } kfunc_class; + } irq; + /* Max size from any of the above. */ struct { unsigned long raw1; @@ -255,9 +264,11 @@ struct bpf_reference_state { * default to pointer reference on zero initialization of a state. 
*/ enum ref_state_type { - REF_TYPE_PTR = 1, - REF_TYPE_IRQ = 2, - REF_TYPE_LOCK = 3, + REF_TYPE_PTR = (1 << 1), + REF_TYPE_IRQ = (1 << 2), + REF_TYPE_LOCK = (1 << 3), + REF_TYPE_RES_LOCK = (1 << 4), + REF_TYPE_RES_LOCK_IRQ = (1 << 5), } type; /* Track each reference created with a unique id, even if the same * instruction creates the reference multiple times (eg, via CALL). diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c index 8396ce1d0fba..99c9fdbdd31c 100644 --- a/kernel/bpf/btf.c +++ b/kernel/bpf/btf.c @@ -3477,6 +3477,15 @@ static int btf_get_field_type(const struct btf *btf, const struct btf_type *var_ goto end; } } + if (field_mask & BPF_RES_SPIN_LOCK) { + if (!strcmp(name, "bpf_res_spin_lock")) { + if (*seen_mask & BPF_RES_SPIN_LOCK) + return -E2BIG; + *seen_mask |= BPF_RES_SPIN_LOCK; + type = BPF_RES_SPIN_LOCK; + goto end; + } + } if (field_mask & BPF_TIMER) { if (!strcmp(name, "bpf_timer")) { if (*seen_mask & BPF_TIMER) @@ -3655,6 +3664,7 @@ static int btf_find_field_one(const struct btf *btf, switch (field_type) { case BPF_SPIN_LOCK: + case BPF_RES_SPIN_LOCK: case BPF_TIMER: case BPF_WORKQUEUE: case BPF_LIST_NODE: @@ -3948,6 +3958,7 @@ struct btf_record *btf_parse_fields(const struct btf *btf, const struct btf_type return ERR_PTR(-ENOMEM); rec->spin_lock_off = -EINVAL; + rec->res_spin_lock_off = -EINVAL; rec->timer_off = -EINVAL; rec->wq_off = -EINVAL; rec->refcount_off = -EINVAL; @@ -3975,6 +3986,11 @@ struct btf_record *btf_parse_fields(const struct btf *btf, const struct btf_type /* Cache offset for faster lookup at runtime */ rec->spin_lock_off = rec->fields[i].offset; break; + case BPF_RES_SPIN_LOCK: + WARN_ON_ONCE(rec->spin_lock_off >= 0); + /* Cache offset for faster lookup at runtime */ + rec->res_spin_lock_off = rec->fields[i].offset; + break; case BPF_TIMER: WARN_ON_ONCE(rec->timer_off >= 0); /* Cache offset for faster lookup at runtime */ @@ -4018,9 +4034,15 @@ struct btf_record *btf_parse_fields(const struct btf *btf, const struct btf_type rec->cnt++; } + if (rec->spin_lock_off >= 0 && rec->res_spin_lock_off >= 0) { + ret = -EINVAL; + goto end; + } + /* bpf_{list_head, rb_node} require bpf_spin_lock */ if ((btf_record_has_field(rec, BPF_LIST_HEAD) || - btf_record_has_field(rec, BPF_RB_ROOT)) && rec->spin_lock_off < 0) { + btf_record_has_field(rec, BPF_RB_ROOT)) && + (rec->spin_lock_off < 0 && rec->res_spin_lock_off < 0)) { ret = -EINVAL; goto end; } @@ -5638,7 +5660,7 @@ btf_parse_struct_metas(struct bpf_verifier_log *log, struct btf *btf) type = &tab->types[tab->cnt]; type->btf_id = i; - record = btf_parse_fields(btf, t, BPF_SPIN_LOCK | BPF_LIST_HEAD | BPF_LIST_NODE | + record = btf_parse_fields(btf, t, BPF_SPIN_LOCK | BPF_RES_SPIN_LOCK | BPF_LIST_HEAD | BPF_LIST_NODE | BPF_RB_ROOT | BPF_RB_NODE | BPF_REFCOUNT | BPF_KPTR, t->size); /* The record cannot be unset, treat it as an error if so */ diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c index 4e88797fdbeb..9701212aa2ed 100644 --- a/kernel/bpf/syscall.c +++ b/kernel/bpf/syscall.c @@ -648,6 +648,7 @@ void btf_record_free(struct btf_record *rec) case BPF_RB_ROOT: case BPF_RB_NODE: case BPF_SPIN_LOCK: + case BPF_RES_SPIN_LOCK: case BPF_TIMER: case BPF_REFCOUNT: case BPF_WORKQUEUE: @@ -700,6 +701,7 @@ struct btf_record *btf_record_dup(const struct btf_record *rec) case BPF_RB_ROOT: case BPF_RB_NODE: case BPF_SPIN_LOCK: + case BPF_RES_SPIN_LOCK: case BPF_TIMER: case BPF_REFCOUNT: case BPF_WORKQUEUE: @@ -777,6 +779,7 @@ void bpf_obj_free_fields(const struct btf_record *rec, void *obj) switch (fields[i].type) { 
case BPF_SPIN_LOCK: + case BPF_RES_SPIN_LOCK: break; case BPF_TIMER: bpf_timer_cancel_and_free(field_ptr); @@ -1199,7 +1202,7 @@ static int map_check_btf(struct bpf_map *map, struct bpf_token *token, return -EINVAL; map->record = btf_parse_fields(btf, value_type, - BPF_SPIN_LOCK | BPF_TIMER | BPF_KPTR | BPF_LIST_HEAD | + BPF_SPIN_LOCK | BPF_RES_SPIN_LOCK | BPF_TIMER | BPF_KPTR | BPF_LIST_HEAD | BPF_RB_ROOT | BPF_REFCOUNT | BPF_WORKQUEUE | BPF_UPTR, map->value_size); if (!IS_ERR_OR_NULL(map->record)) { @@ -1218,6 +1221,7 @@ static int map_check_btf(struct bpf_map *map, struct bpf_token *token, case 0: continue; case BPF_SPIN_LOCK: + case BPF_RES_SPIN_LOCK: if (map->map_type != BPF_MAP_TYPE_HASH && map->map_type != BPF_MAP_TYPE_ARRAY && map->map_type != BPF_MAP_TYPE_CGROUP_STORAGE && diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c index b8ca227c78af..bf230599d6f7 100644 --- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -455,7 +455,7 @@ static bool subprog_is_exc_cb(struct bpf_verifier_env *env, int subprog) static bool reg_may_point_to_spin_lock(const struct bpf_reg_state *reg) { - return btf_record_has_field(reg_btf_record(reg), BPF_SPIN_LOCK); + return btf_record_has_field(reg_btf_record(reg), BPF_SPIN_LOCK | BPF_RES_SPIN_LOCK); } static bool type_is_rdonly_mem(u32 type) @@ -1147,7 +1147,8 @@ static int release_irq_state(struct bpf_verifier_state *state, int id); static int mark_stack_slot_irq_flag(struct bpf_verifier_env *env, struct bpf_kfunc_call_arg_meta *meta, - struct bpf_reg_state *reg, int insn_idx) + struct bpf_reg_state *reg, int insn_idx, + int kfunc_class) { struct bpf_func_state *state = func(env, reg); struct bpf_stack_state *slot; @@ -1169,6 +1170,7 @@ static int mark_stack_slot_irq_flag(struct bpf_verifier_env *env, st->type = PTR_TO_STACK; /* we don't have dedicated reg type */ st->live |= REG_LIVE_WRITTEN; st->ref_obj_id = id; + st->irq.kfunc_class = kfunc_class; for (i = 0; i < BPF_REG_SIZE; i++) slot->slot_type[i] = STACK_IRQ_FLAG; @@ -1177,7 +1179,8 @@ static int mark_stack_slot_irq_flag(struct bpf_verifier_env *env, return 0; } -static int unmark_stack_slot_irq_flag(struct bpf_verifier_env *env, struct bpf_reg_state *reg) +static int unmark_stack_slot_irq_flag(struct bpf_verifier_env *env, struct bpf_reg_state *reg, + int kfunc_class) { struct bpf_func_state *state = func(env, reg); struct bpf_stack_state *slot; @@ -1191,6 +1194,15 @@ static int unmark_stack_slot_irq_flag(struct bpf_verifier_env *env, struct bpf_r slot = &state->stack[spi]; st = &slot->spilled_ptr; + if (kfunc_class != IRQ_KFUNC_IGNORE && st->irq.kfunc_class != kfunc_class) { + const char *flag_kfunc = st->irq.kfunc_class == IRQ_NATIVE_KFUNC ? "native" : "lock"; + const char *used_kfunc = kfunc_class == IRQ_NATIVE_KFUNC ? 
"native" : "lock"; + + verbose(env, "irq flag acquired by %s kfuncs cannot be restored with %s kfuncs\n", + flag_kfunc, used_kfunc); + return -EINVAL; + } + err = release_irq_state(env->cur_state, st->ref_obj_id); WARN_ON_ONCE(err && err != -EACCES); if (err) { @@ -1588,7 +1600,7 @@ static struct bpf_reference_state *find_lock_state(struct bpf_verifier_state *st for (i = 0; i < state->acquired_refs; i++) { struct bpf_reference_state *s = &state->refs[i]; - if (s->type != type) + if (!(s->type & type)) continue; if (s->id == id && s->ptr == ptr) @@ -7995,6 +8007,13 @@ static int check_kfunc_mem_size_reg(struct bpf_verifier_env *env, struct bpf_reg return err; } +enum { + PROCESS_SPIN_LOCK = (1 << 0), + PROCESS_RES_LOCK = (1 << 1), + PROCESS_LOCK_IRQ = (1 << 2), + PROCESS_LOCK_FAIL = (1 << 3), +}; + /* Implementation details: * bpf_map_lookup returns PTR_TO_MAP_VALUE_OR_NULL. * bpf_obj_new returns PTR_TO_BTF_ID | MEM_ALLOC | PTR_MAYBE_NULL. @@ -8017,30 +8036,38 @@ static int check_kfunc_mem_size_reg(struct bpf_verifier_env *env, struct bpf_reg * env->cur_state->active_locks remembers which map value element or allocated * object got locked and clears it after bpf_spin_unlock. */ -static int process_spin_lock(struct bpf_verifier_env *env, int regno, - bool is_lock) +static int process_spin_lock(struct bpf_verifier_env *env, struct bpf_verifier_state *cur, int regno, int flags) { + bool is_lock = flags & PROCESS_SPIN_LOCK, is_res_lock = flags & PROCESS_RES_LOCK; + const char *lock_str = is_res_lock ? "bpf_res_spin" : "bpf_spin"; struct bpf_reg_state *regs = cur_regs(env), *reg = ®s[regno]; - struct bpf_verifier_state *cur = env->cur_state; bool is_const = tnum_is_const(reg->var_off); + bool is_irq = flags & PROCESS_LOCK_IRQ; u64 val = reg->var_off.value; struct bpf_map *map = NULL; struct btf *btf = NULL; struct btf_record *rec; + u32 spin_lock_off; int err; + /* If the spin lock acquisition failed, we don't process the argument. */ + if (flags & PROCESS_LOCK_FAIL) + return 0; + /* Success case always operates on current state only. */ + WARN_ON_ONCE(cur != env->cur_state); + if (!is_const) { verbose(env, - "R%d doesn't have constant offset. bpf_spin_lock has to be at the constant offset\n", - regno); + "R%d doesn't have constant offset. %s_lock has to be at the constant offset\n", + regno, lock_str); return -EINVAL; } if (reg->type == PTR_TO_MAP_VALUE) { map = reg->map_ptr; if (!map->btf) { verbose(env, - "map '%s' has to have BTF in order to use bpf_spin_lock\n", - map->name); + "map '%s' has to have BTF in order to use %s_lock\n", + map->name, lock_str); return -EINVAL; } } else { @@ -8048,36 +8075,53 @@ static int process_spin_lock(struct bpf_verifier_env *env, int regno, } rec = reg_btf_record(reg); - if (!btf_record_has_field(rec, BPF_SPIN_LOCK)) { - verbose(env, "%s '%s' has no valid bpf_spin_lock\n", map ? "map" : "local", - map ? map->name : "kptr"); + if (!btf_record_has_field(rec, is_res_lock ? BPF_RES_SPIN_LOCK : BPF_SPIN_LOCK)) { + verbose(env, "%s '%s' has no valid %s_lock\n", map ? "map" : "local", + map ? map->name : "kptr", lock_str); return -EINVAL; } - if (rec->spin_lock_off != val + reg->off) { - verbose(env, "off %lld doesn't point to 'struct bpf_spin_lock' that is at %d\n", - val + reg->off, rec->spin_lock_off); + spin_lock_off = is_res_lock ? 
rec->res_spin_lock_off : rec->spin_lock_off; + if (spin_lock_off != val + reg->off) { + verbose(env, "off %lld doesn't point to 'struct %s_lock' that is at %d\n", + val + reg->off, lock_str, spin_lock_off); return -EINVAL; } if (is_lock) { void *ptr; + int type; if (map) ptr = map; else ptr = btf; - if (cur->active_locks) { - verbose(env, - "Locking two bpf_spin_locks are not allowed\n"); - return -EINVAL; + if (!is_res_lock && cur->active_locks) { + if (find_lock_state(env->cur_state, REF_TYPE_LOCK, 0, NULL)) { + verbose(env, + "Locking two bpf_spin_locks are not allowed\n"); + return -EINVAL; + } + } else if (is_res_lock) { + if (find_lock_state(env->cur_state, REF_TYPE_RES_LOCK, reg->id, ptr)) { + verbose(env, "Acquiring the same lock again, AA deadlock detected\n"); + return -EINVAL; + } } - err = acquire_lock_state(env, env->insn_idx, REF_TYPE_LOCK, reg->id, ptr); + + if (is_res_lock && is_irq) + type = REF_TYPE_RES_LOCK_IRQ; + else if (is_res_lock) + type = REF_TYPE_RES_LOCK; + else + type = REF_TYPE_LOCK; + err = acquire_lock_state(env, env->insn_idx, type, reg->id, ptr); if (err < 0) { verbose(env, "Failed to acquire lock state\n"); return err; } } else { void *ptr; + int type; if (map) ptr = map; @@ -8085,12 +8129,18 @@ static int process_spin_lock(struct bpf_verifier_env *env, int regno, ptr = btf; if (!cur->active_locks) { - verbose(env, "bpf_spin_unlock without taking a lock\n"); + verbose(env, "%s_unlock without taking a lock\n", lock_str); return -EINVAL; } - if (release_lock_state(env->cur_state, REF_TYPE_LOCK, reg->id, ptr)) { - verbose(env, "bpf_spin_unlock of different lock\n"); + if (is_res_lock && is_irq) + type = REF_TYPE_RES_LOCK_IRQ; + else if (is_res_lock) + type = REF_TYPE_RES_LOCK; + else + type = REF_TYPE_LOCK; + if (release_lock_state(env->cur_state, type, reg->id, ptr)) { + verbose(env, "%s_unlock of different lock\n", lock_str); return -EINVAL; } @@ -9338,11 +9388,11 @@ static int check_func_arg(struct bpf_verifier_env *env, u32 arg, return -EACCES; } if (meta->func_id == BPF_FUNC_spin_lock) { - err = process_spin_lock(env, regno, true); + err = process_spin_lock(env, env->cur_state, regno, PROCESS_SPIN_LOCK); if (err) return err; } else if (meta->func_id == BPF_FUNC_spin_unlock) { - err = process_spin_lock(env, regno, false); + err = process_spin_lock(env, env->cur_state, regno, 0); if (err) return err; } else { @@ -11529,6 +11579,7 @@ enum { KF_ARG_RB_ROOT_ID, KF_ARG_RB_NODE_ID, KF_ARG_WORKQUEUE_ID, + KF_ARG_RES_SPIN_LOCK_ID, }; BTF_ID_LIST(kf_arg_btf_ids) @@ -11538,6 +11589,7 @@ BTF_ID(struct, bpf_list_node) BTF_ID(struct, bpf_rb_root) BTF_ID(struct, bpf_rb_node) BTF_ID(struct, bpf_wq) +BTF_ID(struct, bpf_res_spin_lock) static bool __is_kfunc_ptr_arg_type(const struct btf *btf, const struct btf_param *arg, int type) @@ -11586,6 +11638,11 @@ static bool is_kfunc_arg_wq(const struct btf *btf, const struct btf_param *arg) return __is_kfunc_ptr_arg_type(btf, arg, KF_ARG_WORKQUEUE_ID); } +static bool is_kfunc_arg_res_spin_lock(const struct btf *btf, const struct btf_param *arg) +{ + return __is_kfunc_ptr_arg_type(btf, arg, KF_ARG_RES_SPIN_LOCK_ID); +} + static bool is_kfunc_arg_callback(struct bpf_verifier_env *env, const struct btf *btf, const struct btf_param *arg) { @@ -11657,6 +11714,7 @@ enum kfunc_ptr_arg_type { KF_ARG_PTR_TO_MAP, KF_ARG_PTR_TO_WORKQUEUE, KF_ARG_PTR_TO_IRQ_FLAG, + KF_ARG_PTR_TO_RES_SPIN_LOCK, }; enum special_kfunc_type { @@ -11693,6 +11751,10 @@ enum special_kfunc_type { KF_bpf_iter_num_new, KF_bpf_iter_num_next, KF_bpf_iter_num_destroy, + 
KF_bpf_res_spin_lock, + KF_bpf_res_spin_unlock, + KF_bpf_res_spin_lock_irqsave, + KF_bpf_res_spin_unlock_irqrestore, }; BTF_SET_START(special_kfunc_set) @@ -11771,6 +11833,10 @@ BTF_ID(func, bpf_local_irq_restore) BTF_ID(func, bpf_iter_num_new) BTF_ID(func, bpf_iter_num_next) BTF_ID(func, bpf_iter_num_destroy) +BTF_ID(func, bpf_res_spin_lock) +BTF_ID(func, bpf_res_spin_unlock) +BTF_ID(func, bpf_res_spin_lock_irqsave) +BTF_ID(func, bpf_res_spin_unlock_irqrestore) static bool is_kfunc_ret_null(struct bpf_kfunc_call_arg_meta *meta) { @@ -11864,6 +11930,9 @@ get_kfunc_ptr_arg_type(struct bpf_verifier_env *env, if (is_kfunc_arg_irq_flag(meta->btf, &args[argno])) return KF_ARG_PTR_TO_IRQ_FLAG; + if (is_kfunc_arg_res_spin_lock(meta->btf, &args[argno])) + return KF_ARG_PTR_TO_RES_SPIN_LOCK; + if ((base_type(reg->type) == PTR_TO_BTF_ID || reg2btf_ids[base_type(reg->type)])) { if (!btf_type_is_struct(ref_t)) { verbose(env, "kernel function %s args#%d pointer type %s %s is not supported\n", @@ -11967,22 +12036,34 @@ static int process_kf_arg_ptr_to_btf_id(struct bpf_verifier_env *env, return 0; } -static int process_irq_flag(struct bpf_verifier_env *env, int regno, - struct bpf_kfunc_call_arg_meta *meta) +static int process_irq_flag(struct bpf_verifier_env *env, struct bpf_verifier_state *vstate, int regno, + struct bpf_kfunc_call_arg_meta *meta, int flags) { struct bpf_reg_state *regs = cur_regs(env), *reg = ®s[regno]; + int err, kfunc_class = IRQ_NATIVE_KFUNC; bool irq_save; - int err; - if (meta->func_id == special_kfunc_list[KF_bpf_local_irq_save]) { + if (meta->func_id == special_kfunc_list[KF_bpf_local_irq_save] || + meta->func_id == special_kfunc_list[KF_bpf_res_spin_lock_irqsave]) { irq_save = true; - } else if (meta->func_id == special_kfunc_list[KF_bpf_local_irq_restore]) { + if (meta->func_id == special_kfunc_list[KF_bpf_res_spin_lock_irqsave]) + kfunc_class = IRQ_LOCK_KFUNC; + } else if (meta->func_id == special_kfunc_list[KF_bpf_local_irq_restore] || + meta->func_id == special_kfunc_list[KF_bpf_res_spin_unlock_irqrestore]) { irq_save = false; + if (meta->func_id == special_kfunc_list[KF_bpf_res_spin_unlock_irqrestore]) + kfunc_class = IRQ_LOCK_KFUNC; } else { verbose(env, "verifier internal error: unknown irq flags kfunc\n"); return -EFAULT; } + /* If the spin lock acquisition failed, we don't process the argument. */ + if (kfunc_class == IRQ_LOCK_KFUNC && (flags & PROCESS_LOCK_FAIL)) + return 0; + /* Success case always operates on current state only. 
*/ + WARN_ON_ONCE(vstate != env->cur_state); + if (irq_save) { if (!is_irq_flag_reg_valid_uninit(env, reg)) { verbose(env, "expected uninitialized irq flag as arg#%d\n", regno - 1); @@ -11993,7 +12074,7 @@ static int process_irq_flag(struct bpf_verifier_env *env, int regno, if (err) return err; - err = mark_stack_slot_irq_flag(env, meta, reg, env->insn_idx); + err = mark_stack_slot_irq_flag(env, meta, reg, env->insn_idx, kfunc_class); if (err) return err; } else { @@ -12007,7 +12088,7 @@ static int process_irq_flag(struct bpf_verifier_env *env, int regno, if (err) return err; - err = unmark_stack_slot_irq_flag(env, reg); + err = unmark_stack_slot_irq_flag(env, reg, kfunc_class); if (err) return err; } @@ -12134,7 +12215,8 @@ static int check_reg_allocation_locked(struct bpf_verifier_env *env, struct bpf_ if (!env->cur_state->active_locks) return -EINVAL; - s = find_lock_state(env->cur_state, REF_TYPE_LOCK, id, ptr); + s = find_lock_state(env->cur_state, REF_TYPE_LOCK | REF_TYPE_RES_LOCK | REF_TYPE_RES_LOCK_IRQ, + id, ptr); if (!s) { verbose(env, "held lock and object are not in the same allocation\n"); return -EINVAL; @@ -12170,9 +12252,18 @@ static bool is_bpf_graph_api_kfunc(u32 btf_id) btf_id == special_kfunc_list[KF_bpf_refcount_acquire_impl]; } +static bool is_bpf_res_spin_lock_kfunc(u32 btf_id) +{ + return btf_id == special_kfunc_list[KF_bpf_res_spin_lock] || + btf_id == special_kfunc_list[KF_bpf_res_spin_unlock] || + btf_id == special_kfunc_list[KF_bpf_res_spin_lock_irqsave] || + btf_id == special_kfunc_list[KF_bpf_res_spin_unlock_irqrestore]; +} + static bool kfunc_spin_allowed(u32 btf_id) { - return is_bpf_graph_api_kfunc(btf_id) || is_bpf_iter_num_api_kfunc(btf_id); + return is_bpf_graph_api_kfunc(btf_id) || is_bpf_iter_num_api_kfunc(btf_id) || + is_bpf_res_spin_lock_kfunc(btf_id); } static bool is_sync_callback_calling_kfunc(u32 btf_id) @@ -12431,8 +12522,9 @@ static bool check_css_task_iter_allowlist(struct bpf_verifier_env *env) } } -static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_arg_meta *meta, - int insn_idx) +static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_verifier_state *vstate, + struct bpf_kfunc_call_arg_meta *meta, + int insn_idx, int arg_flags) { const char *func_name = meta->func_name, *ref_tname; const struct btf *btf = meta->btf; @@ -12453,7 +12545,7 @@ static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_ * verifier sees. 
*/ for (i = 0; i < nargs; i++) { - struct bpf_reg_state *regs = cur_regs(env), *reg = ®s[i + 1]; + struct bpf_reg_state *regs = vstate->frame[vstate->curframe]->regs, *reg = ®s[i + 1]; const struct btf_type *t, *ref_t, *resolve_ret; enum bpf_arg_type arg_type = ARG_DONTCARE; u32 regno = i + 1, ref_id, type_size; @@ -12604,6 +12696,7 @@ static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_ case KF_ARG_PTR_TO_CONST_STR: case KF_ARG_PTR_TO_WORKQUEUE: case KF_ARG_PTR_TO_IRQ_FLAG: + case KF_ARG_PTR_TO_RES_SPIN_LOCK: break; default: WARN_ON_ONCE(1); @@ -12898,11 +12991,33 @@ static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_ verbose(env, "arg#%d doesn't point to an irq flag on stack\n", i); return -EINVAL; } - ret = process_irq_flag(env, regno, meta); + ret = process_irq_flag(env, vstate, regno, meta, arg_flags); + if (ret < 0) + return ret; + break; + case KF_ARG_PTR_TO_RES_SPIN_LOCK: + { + int flags = PROCESS_RES_LOCK; + + if (reg->type != PTR_TO_MAP_VALUE && reg->type != (PTR_TO_BTF_ID | MEM_ALLOC)) { + verbose(env, "arg#%d doesn't point to map value or allocated object\n", i); + return -EINVAL; + } + + if (!is_bpf_res_spin_lock_kfunc(meta->func_id)) + return -EFAULT; + if (meta->func_id == special_kfunc_list[KF_bpf_res_spin_lock] || + meta->func_id == special_kfunc_list[KF_bpf_res_spin_lock_irqsave]) + flags |= PROCESS_SPIN_LOCK; + if (meta->func_id == special_kfunc_list[KF_bpf_res_spin_lock_irqsave] || + meta->func_id == special_kfunc_list[KF_bpf_res_spin_unlock_irqrestore]) + flags |= PROCESS_LOCK_IRQ; + ret = process_spin_lock(env, vstate, regno, flags | arg_flags); if (ret < 0) return ret; break; } + } } if (is_kfunc_release(meta) && !meta->release_regno) { @@ -12958,12 +13073,11 @@ static int fetch_kfunc_meta(struct bpf_verifier_env *env, static int check_return_code(struct bpf_verifier_env *env, int regno, const char *reg_name); -static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn, - int *insn_idx_p) +static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_verifier_state *vstate, + struct bpf_insn *insn, int *insn_idx_p, int flags) { bool sleepable, rcu_lock, rcu_unlock, preempt_disable, preempt_enable; u32 i, nargs, ptr_type_id, release_ref_obj_id; - struct bpf_reg_state *regs = cur_regs(env); const char *func_name, *ptr_type_name; const struct btf_type *t, *ptr_type; struct bpf_kfunc_call_arg_meta meta; @@ -12971,8 +13085,11 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn, int err, insn_idx = *insn_idx_p; const struct btf_param *args; const struct btf_type *ret_t; + struct bpf_reg_state *regs; struct btf *desc_btf; + regs = vstate->frame[vstate->curframe]->regs; + /* skip for now, but return error when we find this in fixup_kfunc_call */ if (!insn->imm) return 0; @@ -12999,7 +13116,7 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn, } /* Check the arguments */ - err = check_kfunc_args(env, &meta, insn_idx); + err = check_kfunc_args(env, vstate, &meta, insn_idx, flags); if (err < 0) return err; @@ -13157,6 +13274,13 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn, if (btf_type_is_scalar(t)) { mark_reg_unknown(env, regs, BPF_REG_0); + if (meta.btf == btf_vmlinux && (meta.func_id == special_kfunc_list[KF_bpf_res_spin_lock] || + meta.func_id == special_kfunc_list[KF_bpf_res_spin_lock_irqsave])) { + if (flags & PROCESS_LOCK_FAIL) + __mark_reg_s32_range(env, regs, BPF_REG_0, -MAX_ERRNO, -1); 
+ else + __mark_reg_const_zero(env, ®s[BPF_REG_0]); + } mark_btf_func_reg_size(env, BPF_REG_0, t->size); } else if (btf_type_is_ptr(t)) { ptr_type = btf_type_skip_modifiers(desc_btf, t->type, &ptr_type_id); @@ -18040,7 +18164,8 @@ static bool stacksafe(struct bpf_verifier_env *env, struct bpf_func_state *old, case STACK_IRQ_FLAG: old_reg = &old->stack[spi].spilled_ptr; cur_reg = &cur->stack[spi].spilled_ptr; - if (!check_ids(old_reg->ref_obj_id, cur_reg->ref_obj_id, idmap)) + if (!check_ids(old_reg->ref_obj_id, cur_reg->ref_obj_id, idmap) || + old_reg->irq.kfunc_class != cur_reg->irq.kfunc_class) return false; break; case STACK_MISC: @@ -18084,6 +18209,8 @@ static bool refsafe(struct bpf_verifier_state *old, struct bpf_verifier_state *c case REF_TYPE_IRQ: break; case REF_TYPE_LOCK: + case REF_TYPE_RES_LOCK: + case REF_TYPE_RES_LOCK_IRQ: if (old->refs[i].ptr != cur->refs[i].ptr) return false; break; @@ -19074,7 +19201,19 @@ static int do_check(struct bpf_verifier_env *env) if (insn->src_reg == BPF_PSEUDO_CALL) { err = check_func_call(env, insn, &env->insn_idx); } else if (insn->src_reg == BPF_PSEUDO_KFUNC_CALL) { - err = check_kfunc_call(env, insn, &env->insn_idx); + if (!insn->off && + (insn->imm == special_kfunc_list[KF_bpf_res_spin_lock] || + insn->imm == special_kfunc_list[KF_bpf_res_spin_lock_irqsave])) { + struct bpf_verifier_state *branch; + + branch = push_stack(env, env->insn_idx + 1, env->prev_insn_idx, false); + if (!branch) { + verbose(env, "failed to push state for failed lock acquisition\n"); + return -ENOMEM; + } + err = check_kfunc_call(env, branch, insn, &env->insn_idx, PROCESS_LOCK_FAIL); + } + err = err ?: check_kfunc_call(env, env->cur_state, insn, &env->insn_idx, 0); if (!err && is_bpf_throw_kfunc(insn)) { exception_exit = true; goto process_bpf_exit_full; @@ -19417,7 +19556,7 @@ static int check_map_prog_compatibility(struct bpf_verifier_env *env, } } - if (btf_record_has_field(map->record, BPF_SPIN_LOCK)) { + if (btf_record_has_field(map->record, BPF_SPIN_LOCK | BPF_RES_SPIN_LOCK)) { if (prog_type == BPF_PROG_TYPE_SOCKET_FILTER) { verbose(env, "socket filter progs cannot use bpf_spin_lock yet\n"); return -EINVAL; From patchwork Tue Jan 7 14:00:04 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Kumar Kartikeya Dwivedi X-Patchwork-Id: 13928959 X-Patchwork-Delegate: bpf@iogearbox.net Received: from mail-wm1-f67.google.com (mail-wm1-f67.google.com [209.85.128.67]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id D68B91F3D48; Tue, 7 Jan 2025 14:00:47 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.128.67 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1736258453; cv=none; b=aK6yGF3acOhsTALzkVAGxI8Ym7XpKFcphuQy29Fek8PI+l4642R9WE7UfKldvMEol4n//yjj7kn/WH/PzYjldlJRdjcoUAa5RiWIDeaIaatu66VIvwwFaVz/C3l6uZ+XUUXaSWbxMBAJtV++dLfU9N38oenrC1FPpf4Da4dITFY= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1736258453; c=relaxed/simple; bh=FBR0eFp2XW2HZA+qyWkxTYz5iZeScNdFDaNVHD4Qz3A=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=YWMYZ9PZm+vKg50iva5YTKLXprzTYmelLhulzsxvgUztACcEcnGSRpHbnQTy6kwmnJ2EqslkylzZ2t7gJ0s1MsuVLm7A5ItsaBBJusCtENIu+IkHA8IYjxNbhZ8N0YpicbOmChbXw1D8+3KgEIrOiKYa8yzWhI1jooroQauhQSk= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none 
dis=none) header.from=gmail.com
From: Kumar Kartikeya Dwivedi
To: bpf@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: Linus Torvalds , Peter Zijlstra , Waiman Long , Alexei Starovoitov , Andrii Nakryiko , Daniel Borkmann , Martin KaFai Lau , Eduard Zingerman , "Paul E.
McKenney" , Tejun Heo , Barret Rhoden , Josh Don , Dohyun Kim , kernel-team@meta.com
Subject: [PATCH bpf-next v1 22/22] selftests/bpf: Add tests for rqspinlock
Date: Tue, 7 Jan 2025 06:00:04 -0800
Message-ID: <20250107140004.2732830-23-memxor@gmail.com>
In-Reply-To: <20250107140004.2732830-1-memxor@gmail.com>
References: <20250107140004.2732830-1-memxor@gmail.com>
X-Patchwork-Delegate: bpf@iogearbox.net

Introduce selftests that trigger AA and ABBA deadlocks, and test the edge case where the held-locks table runs out of entries, since we then fall back to the timeout as the final line of defense. Also exercise the verifier's AA detection where applicable.

Signed-off-by: Kumar Kartikeya Dwivedi
---
 .../selftests/bpf/prog_tests/res_spin_lock.c | 103 ++++++++
 tools/testing/selftests/bpf/progs/irq.c | 53 ++++
 .../selftests/bpf/progs/res_spin_lock.c | 189 +++++++++++++++
 .../selftests/bpf/progs/res_spin_lock_fail.c | 226 ++++++++++++++++++
 4 files changed, 571 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/res_spin_lock.c
 create mode 100644 tools/testing/selftests/bpf/progs/res_spin_lock.c
 create mode 100644 tools/testing/selftests/bpf/progs/res_spin_lock_fail.c

diff --git a/tools/testing/selftests/bpf/prog_tests/res_spin_lock.c b/tools/testing/selftests/bpf/prog_tests/res_spin_lock.c
new file mode 100644
index 000000000000..547f76381d3a
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/res_spin_lock.c
@@ -0,0 +1,103 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2024 Meta Platforms, Inc. and affiliates.
*/ +#include +#include + +#include "res_spin_lock.skel.h" +#include "res_spin_lock_fail.skel.h" + +static void test_res_spin_lock_failure(void) +{ + RUN_TESTS(res_spin_lock_fail); +} + +static volatile int skip; + +static void *spin_lock_thread(void *arg) +{ + int err, prog_fd = *(u32 *) arg; + LIBBPF_OPTS(bpf_test_run_opts, topts, + .data_in = &pkt_v4, + .data_size_in = sizeof(pkt_v4), + .repeat = 10000, + ); + + while (!skip) { + err = bpf_prog_test_run_opts(prog_fd, &topts); + ASSERT_OK(err, "test_run"); + ASSERT_OK(topts.retval, "test_run retval"); + } + pthread_exit(arg); +} + +static void test_res_spin_lock_success(void) +{ + LIBBPF_OPTS(bpf_test_run_opts, topts, + .data_in = &pkt_v4, + .data_size_in = sizeof(pkt_v4), + .repeat = 1, + ); + struct res_spin_lock *skel; + pthread_t thread_id[16]; + int prog_fd, i, err; + void *ret; + + skel = res_spin_lock__open_and_load(); + if (!ASSERT_OK_PTR(skel, "res_spin_lock__open_and_load")) + return; + /* AA deadlock */ + prog_fd = bpf_program__fd(skel->progs.res_spin_lock_test); + err = bpf_prog_test_run_opts(prog_fd, &topts); + ASSERT_OK(err, "error"); + ASSERT_OK(topts.retval, "retval"); + /* AA deadlock missed detection due to OoO unlock */ + prog_fd = bpf_program__fd(skel->progs.res_spin_lock_test_ooo_missed_AA); + err = bpf_prog_test_run_opts(prog_fd, &topts); + ASSERT_OK(err, "error"); + ASSERT_OK(topts.retval, "retval"); + + prog_fd = bpf_program__fd(skel->progs.res_spin_lock_test_held_lock_max); + err = bpf_prog_test_run_opts(prog_fd, &topts); + ASSERT_OK(err, "error"); + ASSERT_OK(topts.retval, "retval"); + + /* Multi-threaded ABBA deadlock. */ + + prog_fd = bpf_program__fd(skel->progs.res_spin_lock_test_AB); + for (i = 0; i < 16; i++) { + int err; + + err = pthread_create(&thread_id[i], NULL, &spin_lock_thread, &prog_fd); + if (!ASSERT_OK(err, "pthread_create")) + goto end; + } + + topts.repeat = 1000; + int fd = bpf_program__fd(skel->progs.res_spin_lock_test_BA); + while (!topts.retval && !err && !skel->bss->err) { + err = bpf_prog_test_run_opts(fd, &topts); + } + ASSERT_EQ(skel->bss->err, -EDEADLK, "timeout err"); + ASSERT_OK(err, "err"); + ASSERT_EQ(topts.retval, -EDEADLK, "timeout"); + + skip = true; + + for (i = 0; i < 16; i++) { + if (!ASSERT_OK(pthread_join(thread_id[i], &ret), "pthread_join")) + goto end; + if (!ASSERT_EQ(ret, &prog_fd, "ret == prog_fd")) + goto end; + } +end: + res_spin_lock__destroy(skel); + return; +} + +void test_res_spin_lock(void) +{ + if (test__start_subtest("res_spin_lock_success")) + test_res_spin_lock_success(); + if (test__start_subtest("res_spin_lock_failure")) + test_res_spin_lock_failure(); +} diff --git a/tools/testing/selftests/bpf/progs/irq.c b/tools/testing/selftests/bpf/progs/irq.c index b0b53d980964..3d4fee83a5be 100644 --- a/tools/testing/selftests/bpf/progs/irq.c +++ b/tools/testing/selftests/bpf/progs/irq.c @@ -11,6 +11,9 @@ extern void bpf_local_irq_save(unsigned long *) __weak __ksym; extern void bpf_local_irq_restore(unsigned long *) __weak __ksym; extern int bpf_copy_from_user_str(void *dst, u32 dst__sz, const void *unsafe_ptr__ign, u64 flags) __weak __ksym; +struct bpf_res_spin_lock lockA __hidden SEC(".data.A"); +struct bpf_res_spin_lock lockB __hidden SEC(".data.B"); + SEC("?tc") __failure __msg("arg#0 doesn't point to an irq flag on stack") int irq_save_bad_arg(struct __sk_buff *ctx) @@ -441,4 +444,54 @@ int irq_ooo_refs_array(struct __sk_buff *ctx) return 0; } +SEC("?tc") +__failure __msg("cannot restore irq state out of order") +int irq_ooo_lock_cond_inv(struct __sk_buff *ctx) 
+{ + unsigned long flags1, flags2; + + if (bpf_res_spin_lock_irqsave(&lockA, &flags1)) + return 0; + if (bpf_res_spin_lock_irqsave(&lockB, &flags2)) { + bpf_res_spin_unlock_irqrestore(&lockA, &flags1); + return 0; + } + + bpf_res_spin_unlock_irqrestore(&lockB, &flags1); + bpf_res_spin_unlock_irqrestore(&lockA, &flags2); + return 0; +} + +SEC("?tc") +__failure __msg("function calls are not allowed") +int irq_wrong_kfunc_class_1(struct __sk_buff *ctx) +{ + unsigned long flags1; + + if (bpf_res_spin_lock_irqsave(&lockA, &flags1)) + return 0; + /* For now, bpf_local_irq_restore is not allowed in critical section, + * but this test ensures error will be caught with kfunc_class when it's + * opened up. Tested by temporarily permitting this kfunc in critical + * section. + */ + bpf_local_irq_restore(&flags1); + bpf_res_spin_unlock_irqrestore(&lockA, &flags1); + return 0; +} + +SEC("?tc") +__failure __msg("function calls are not allowed") +int irq_wrong_kfunc_class_2(struct __sk_buff *ctx) +{ + unsigned long flags1, flags2; + + bpf_local_irq_save(&flags1); + if (bpf_res_spin_lock_irqsave(&lockA, &flags2)) + return 0; + bpf_local_irq_restore(&flags2); + bpf_res_spin_unlock_irqrestore(&lockA, &flags1); + return 0; +} + char _license[] SEC("license") = "GPL"; diff --git a/tools/testing/selftests/bpf/progs/res_spin_lock.c b/tools/testing/selftests/bpf/progs/res_spin_lock.c new file mode 100644 index 000000000000..6d98e8f99e04 --- /dev/null +++ b/tools/testing/selftests/bpf/progs/res_spin_lock.c @@ -0,0 +1,189 @@ +// SPDX-License-Identifier: GPL-2.0 +/* Copyright (c) 2024 Meta Platforms, Inc. and affiliates. */ +#include +#include +#include +#include "bpf_misc.h" + +#define EDEADLK 35 +#define ETIMEDOUT 110 + +struct arr_elem { + struct bpf_res_spin_lock lock; +}; + +struct { + __uint(type, BPF_MAP_TYPE_ARRAY); + __uint(max_entries, 64); + __type(key, int); + __type(value, struct arr_elem); +} arrmap SEC(".maps"); + +struct bpf_res_spin_lock lockA __hidden SEC(".data.A"); +struct bpf_res_spin_lock lockB __hidden SEC(".data.B"); + +SEC("tc") +int res_spin_lock_test(struct __sk_buff *ctx) +{ + struct arr_elem *elem1, *elem2; + int r; + + elem1 = bpf_map_lookup_elem(&arrmap, &(int){0}); + if (!elem1) + return -1; + elem2 = bpf_map_lookup_elem(&arrmap, &(int){0}); + if (!elem2) + return -1; + + r = bpf_res_spin_lock(&elem1->lock); + if (r) + return r; + if (!bpf_res_spin_lock(&elem2->lock)) { + bpf_res_spin_unlock(&elem2->lock); + bpf_res_spin_unlock(&elem1->lock); + return -1; + } + bpf_res_spin_unlock(&elem1->lock); + return 0; +} + +SEC("tc") +int res_spin_lock_test_ooo_missed_AA(struct __sk_buff *ctx) +{ + struct arr_elem *elem1, *elem2, *elem3; + int r; + + elem1 = bpf_map_lookup_elem(&arrmap, &(int){0}); + if (!elem1) + return 1; + elem2 = bpf_map_lookup_elem(&arrmap, &(int){1}); + if (!elem2) + return 2; + elem3 = bpf_map_lookup_elem(&arrmap, &(int){1}); + if (!elem3) + return 3; + if (elem3 != elem2) + return 4; + + r = bpf_res_spin_lock(&elem1->lock); + if (r) + return r; + if (bpf_res_spin_lock(&elem2->lock)) { + bpf_res_spin_unlock(&elem1->lock); + return 5; + } + /* Held locks shows elem1 but should be elem2 */ + bpf_res_spin_unlock(&elem1->lock); + /* Distinct lookup gives a fresh id for elem3, + * but it's the same address as elem2... + */ + r = bpf_res_spin_lock(&elem3->lock); + if (!r) { + /* Something is broken, how?? */ + bpf_res_spin_unlock(&elem3->lock); + bpf_res_spin_unlock(&elem2->lock); + return 6; + } + /* We should get -ETIMEDOUT, as AA detection will fail to catch this. 
*/ + if (r != -ETIMEDOUT) { + bpf_res_spin_unlock(&elem2->lock); + return 7; + } + bpf_res_spin_unlock(&elem2->lock); + return 0; +} + +SEC("tc") +int res_spin_lock_test_AB(struct __sk_buff *ctx) +{ + int r; + + r = bpf_res_spin_lock(&lockA); + if (r) + return !r; + /* Only unlock if we took the lock. */ + if (!bpf_res_spin_lock(&lockB)) + bpf_res_spin_unlock(&lockB); + bpf_res_spin_unlock(&lockA); + return 0; +} + +int err; + +SEC("tc") +int res_spin_lock_test_BA(struct __sk_buff *ctx) +{ + int r; + + r = bpf_res_spin_lock(&lockB); + if (r) + return !r; + if (!bpf_res_spin_lock(&lockA)) + bpf_res_spin_unlock(&lockA); + else + err = -EDEADLK; + bpf_res_spin_unlock(&lockB); + return -EDEADLK; +} + +SEC("tc") +int res_spin_lock_test_held_lock_max(struct __sk_buff *ctx) +{ + struct bpf_res_spin_lock *locks[48] = {}; + struct arr_elem *e; + u64 time_beg, time; + int ret = 0, i; + + _Static_assert(ARRAY_SIZE(((struct rqspinlock_held){}).locks) == 32, + "RES_NR_HELD assumed to be 32"); + + for (i = 0; i < 34; i++) { + int key = i; + + /* We cannot pass in i as it will get spilled/filled by the compiler and + * loses bounds in verifier state. + */ + e = bpf_map_lookup_elem(&arrmap, &key); + if (!e) + return 1; + locks[i] = &e->lock; + } + + for (; i < 48; i++) { + int key = i - 2; + + /* We cannot pass in i as it will get spilled/filled by the compiler and + * loses bounds in verifier state. + */ + e = bpf_map_lookup_elem(&arrmap, &key); + if (!e) + return 1; + locks[i] = &e->lock; + } + + time_beg = bpf_ktime_get_ns(); + for (i = 0; i < 34; i++) { + if (bpf_res_spin_lock(locks[i])) + goto end; + } + + /* Trigger AA, after exhausting entries in the held lock table. This + * time, only the timeout can save us, as AA detection won't succeed. + */ + if (!bpf_res_spin_lock(locks[34])) { + bpf_res_spin_unlock(locks[34]); + ret = 1; + goto end; + } + +end: + for (i = i - 1; i >= 0; i--) + bpf_res_spin_unlock(locks[i]); + time = bpf_ktime_get_ns() - time_beg; + /* Time spent should be easily above our limit (1/2 s), since AA + * detection won't be expedited due to lack of held lock entry. + */ + return ret ?: (time > 1000000000 / 2 ? 0 : 1); +} + +char _license[] SEC("license") = "GPL"; diff --git a/tools/testing/selftests/bpf/progs/res_spin_lock_fail.c b/tools/testing/selftests/bpf/progs/res_spin_lock_fail.c new file mode 100644 index 000000000000..dc402497a99e --- /dev/null +++ b/tools/testing/selftests/bpf/progs/res_spin_lock_fail.c @@ -0,0 +1,226 @@ +// SPDX-License-Identifier: GPL-2.0 +/* Copyright (c) 2024 Meta Platforms, Inc. and affiliates. 
*/ +#include +#include +#include +#include +#include "bpf_misc.h" +#include "bpf_experimental.h" + +struct arr_elem { + struct bpf_res_spin_lock lock; +}; + +struct { + __uint(type, BPF_MAP_TYPE_ARRAY); + __uint(max_entries, 1); + __type(key, int); + __type(value, struct arr_elem); +} arrmap SEC(".maps"); + +long value; + +struct bpf_spin_lock lock __hidden SEC(".data.A"); +struct bpf_res_spin_lock res_lock __hidden SEC(".data.B"); + +SEC("?tc") +__failure __msg("point to map value or allocated object") +int res_spin_lock_arg(struct __sk_buff *ctx) +{ + struct arr_elem *elem; + + elem = bpf_map_lookup_elem(&arrmap, &(int){0}); + if (!elem) + return 0; + bpf_res_spin_lock((struct bpf_res_spin_lock *)bpf_core_cast(&elem->lock, struct __sk_buff)); + bpf_res_spin_lock(&elem->lock); + return 0; +} + +SEC("?tc") +__failure __msg("AA deadlock detected") +int res_spin_lock_AA(struct __sk_buff *ctx) +{ + struct arr_elem *elem; + + elem = bpf_map_lookup_elem(&arrmap, &(int){0}); + if (!elem) + return 0; + bpf_res_spin_lock(&elem->lock); + bpf_res_spin_lock(&elem->lock); + return 0; +} + +SEC("?tc") +__failure __msg("AA deadlock detected") +int res_spin_lock_cond_AA(struct __sk_buff *ctx) +{ + struct arr_elem *elem; + + elem = bpf_map_lookup_elem(&arrmap, &(int){0}); + if (!elem) + return 0; + if (bpf_res_spin_lock(&elem->lock)) + return 0; + bpf_res_spin_lock(&elem->lock); + return 0; +} + +SEC("?tc") +__failure __msg("unlock of different lock") +int res_spin_lock_mismatch_1(struct __sk_buff *ctx) +{ + struct arr_elem *elem; + + elem = bpf_map_lookup_elem(&arrmap, &(int){0}); + if (!elem) + return 0; + if (bpf_res_spin_lock(&elem->lock)) + return 0; + bpf_res_spin_unlock(&res_lock); + return 0; +} + +SEC("?tc") +__failure __msg("unlock of different lock") +int res_spin_lock_mismatch_2(struct __sk_buff *ctx) +{ + struct arr_elem *elem; + + elem = bpf_map_lookup_elem(&arrmap, &(int){0}); + if (!elem) + return 0; + if (bpf_res_spin_lock(&res_lock)) + return 0; + bpf_res_spin_unlock(&elem->lock); + return 0; +} + +SEC("?tc") +__failure __msg("unlock of different lock") +int res_spin_lock_irq_mismatch_1(struct __sk_buff *ctx) +{ + struct arr_elem *elem; + unsigned long f1; + + elem = bpf_map_lookup_elem(&arrmap, &(int){0}); + if (!elem) + return 0; + bpf_local_irq_save(&f1); + if (bpf_res_spin_lock(&res_lock)) + return 0; + bpf_res_spin_unlock_irqrestore(&res_lock, &f1); + return 0; +} + +SEC("?tc") +__failure __msg("unlock of different lock") +int res_spin_lock_irq_mismatch_2(struct __sk_buff *ctx) +{ + struct arr_elem *elem; + unsigned long f1; + + elem = bpf_map_lookup_elem(&arrmap, &(int){0}); + if (!elem) + return 0; + if (bpf_res_spin_lock_irqsave(&res_lock, &f1)) + return 0; + bpf_res_spin_unlock(&res_lock); + return 0; +} + +SEC("?tc") +__success +int res_spin_lock_ooo(struct __sk_buff *ctx) +{ + struct arr_elem *elem; + + elem = bpf_map_lookup_elem(&arrmap, &(int){0}); + if (!elem) + return 0; + if (bpf_res_spin_lock(&res_lock)) + return 0; + if (bpf_res_spin_lock(&elem->lock)) { + bpf_res_spin_unlock(&res_lock); + return 0; + } + bpf_res_spin_unlock(&elem->lock); + bpf_res_spin_unlock(&res_lock); + return 0; +} + +SEC("?tc") +__success +int res_spin_lock_ooo_irq(struct __sk_buff *ctx) +{ + struct arr_elem *elem; + unsigned long f1, f2; + + elem = bpf_map_lookup_elem(&arrmap, &(int){0}); + if (!elem) + return 0; + if (bpf_res_spin_lock_irqsave(&res_lock, &f1)) + return 0; + if (bpf_res_spin_lock_irqsave(&elem->lock, &f2)) { + bpf_res_spin_unlock_irqrestore(&res_lock, &f1); + /* We won't have a 
unreleased IRQ flag error here. */ + return 0; + } + bpf_res_spin_unlock_irqrestore(&elem->lock, &f2); + bpf_res_spin_unlock_irqrestore(&res_lock, &f1); + return 0; +} + +SEC("?tc") +__failure __msg("off 1 doesn't point to 'struct bpf_res_spin_lock' that is at 0") +int res_spin_lock_bad_off(struct __sk_buff *ctx) +{ + struct arr_elem *elem; + + elem = bpf_map_lookup_elem(&arrmap, &(int){0}); + if (!elem) + return 0; + bpf_res_spin_lock((void *)&elem->lock + 1); + return 0; +} + +SEC("?tc") +__failure __msg("R1 doesn't have constant offset. bpf_res_spin_lock has to be at the constant offset") +int res_spin_lock_var_off(struct __sk_buff *ctx) +{ + struct arr_elem *elem; + u64 val = value; + + elem = bpf_map_lookup_elem(&arrmap, &(int){0}); + if (!elem) { + // FIXME: Only inline assembly use in assert macro doesn't emit + // BTF definition. + bpf_throw(0); + return 0; + } + bpf_assert_range(val, 0, 40); + bpf_res_spin_lock((void *)&value + val); + return 0; +} + +SEC("?tc") +__failure __msg("map 'res_spin.bss' has no valid bpf_res_spin_lock") +int res_spin_lock_no_lock_map(struct __sk_buff *ctx) +{ + bpf_res_spin_lock((void *)&value + 1); + return 0; +} + +SEC("?tc") +__failure __msg("local 'kptr' has no valid bpf_res_spin_lock") +int res_spin_lock_no_lock_kptr(struct __sk_buff *ctx) +{ + struct { int i; } *p = bpf_obj_new(typeof(*p)); + + if (!p) + return 0; + bpf_res_spin_lock((void *)p); + return 0; +} + +char _license[] SEC("license") = "GPL";
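
For contrast with the ABBA pair exercised above (res_spin_lock_test_AB and res_spin_lock_test_BA), a deadlock-free variant is sketched below. It is illustrative only and not part of the series: it would live alongside the programs in progs/res_spin_lock.c, reusing the lockA/lockB globals and includes declared there. The point is that taking both locks in one global order, and unlocking only what was actually acquired, satisfies both the verifier and the runtime deadlock checks.

SEC("tc")
int res_spin_lock_ordered(struct __sk_buff *ctx)
{
	/* Always acquire in the same global order (A, then B), so the
	 * ABBA interleaving provoked by the AB/BA test pair cannot form.
	 */
	if (bpf_res_spin_lock(&lockA))
		return 0;
	if (bpf_res_spin_lock(&lockB)) {
		/* Nested acquisition failed (e.g. -EDEADLK or -ETIMEDOUT):
		 * release only the lock we hold and back off.
		 */
		bpf_res_spin_unlock(&lockA);
		return 0;
	}
	bpf_res_spin_unlock(&lockB);
	bpf_res_spin_unlock(&lockA);
	return 0;
}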