From patchwork Sun Mar 16 04:05:17 2025
X-Patchwork-Submitter: Kumar Kartikeya Dwivedi
X-Patchwork-Id: 14018273
X-Patchwork-Delegate: bpf@iogearbox.net
From: Kumar Kartikeya Dwivedi
To: bpf@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: Barret Rhoden, Linus Torvalds, Peter Zijlstra, Will Deacon, Waiman Long,
    Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann, Martin KaFai Lau,
    Eduard Zingerman, "Paul E. McKenney", Tejun Heo, Josh Don, Dohyun Kim,
    linux-arm-kernel@lists.infradead.org, kkd@meta.com, kernel-team@meta.com
Subject: [PATCH bpf-next v4 01/25] locking: Move MCS struct definition to public header
Date: Sat, 15 Mar 2025 21:05:17 -0700
Message-ID: <20250316040541.108729-2-memxor@gmail.com>
In-Reply-To: <20250316040541.108729-1-memxor@gmail.com>
References: <20250316040541.108729-1-memxor@gmail.com>

Move the definition of struct mcs_spinlock from the private mcs_spinlock.h
header in kernel/locking to the asm-generic mcs_spinlock.h header, since we
will need to reference it from the qspinlock.h header in subsequent
commits.
Reviewed-by: Barret Rhoden
Signed-off-by: Kumar Kartikeya Dwivedi
---
 include/asm-generic/mcs_spinlock.h | 6 ++++++
 kernel/locking/mcs_spinlock.h      | 6 ------
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/include/asm-generic/mcs_spinlock.h b/include/asm-generic/mcs_spinlock.h
index 10cd4ffc6ba2..39c94012b88a 100644
--- a/include/asm-generic/mcs_spinlock.h
+++ b/include/asm-generic/mcs_spinlock.h
@@ -1,6 +1,12 @@
 #ifndef __ASM_MCS_SPINLOCK_H
 #define __ASM_MCS_SPINLOCK_H
 
+struct mcs_spinlock {
+	struct mcs_spinlock *next;
+	int locked; /* 1 if lock acquired */
+	int count;  /* nesting count, see qspinlock.c */
+};
+
 /*
  * Architectures can define their own:
  *
diff --git a/kernel/locking/mcs_spinlock.h b/kernel/locking/mcs_spinlock.h
index 85251d8771d9..16160ca8907f 100644
--- a/kernel/locking/mcs_spinlock.h
+++ b/kernel/locking/mcs_spinlock.h
@@ -15,12 +15,6 @@
 
 #include
 
-struct mcs_spinlock {
-	struct mcs_spinlock *next;
-	int locked; /* 1 if lock acquired */
-	int count;  /* nesting count, see qspinlock.c */
-};
-
 #ifndef arch_mcs_spin_lock_contended
 /*
  * Using smp_cond_load_acquire() provides the acquire semantics
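[Editorial note] The next/locked fields of struct mcs_spinlock are exactly what
the classic MCS algorithm needs: each waiter spins only on its own node's
locked flag while the lock word tracks the queue tail. A minimal standalone
sketch of that idea (C11 atomics, not the kernel implementation, which packs
the tail into the 32-bit qspinlock word) might look like:

/* Illustrative sketch only -- not kernel code. */
#include <stdatomic.h>
#include <stddef.h>

struct mcs_node {
	struct mcs_node *_Atomic next;
	atomic_int locked;              /* set to 1 when the lock is handed to us */
};

struct mcs_lock {
	struct mcs_node *_Atomic tail;  /* NULL when free with no waiters */
};

static void mcs_acquire(struct mcs_lock *l, struct mcs_node *me)
{
	struct mcs_node *prev;

	atomic_store(&me->next, NULL);
	atomic_store(&me->locked, 0);

	prev = atomic_exchange(&l->tail, me);   /* become the new queue tail */
	if (!prev)
		return;                         /* queue was empty: we own the lock */

	atomic_store(&prev->next, me);          /* link behind the old tail */
	while (!atomic_load(&me->locked))       /* spin on our own node only */
		;
}

static void mcs_release(struct mcs_lock *l, struct mcs_node *me)
{
	struct mcs_node *succ = atomic_load(&me->next);

	if (!succ) {
		struct mcs_node *exp = me;
		if (atomic_compare_exchange_strong(&l->tail, &exp, NULL))
			return;                 /* nobody queued behind us */
		while (!(succ = atomic_load(&me->next)))
			;                       /* successor is still linking in */
	}
	atomic_store(&succ->locked, 1);         /* hand the lock to the successor */
}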
From patchwork Sun Mar 16 04:05:18 2025
X-Patchwork-Submitter: Kumar Kartikeya Dwivedi
X-Patchwork-Id: 14018275
X-Patchwork-Delegate: bpf@iogearbox.net
From: Kumar Kartikeya Dwivedi
To: bpf@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: Barret Rhoden, Linus Torvalds, Peter Zijlstra, Will Deacon, Waiman Long,
    Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann, Martin KaFai Lau,
    Eduard Zingerman, "Paul E. McKenney", Tejun Heo, Josh Don, Dohyun Kim,
    linux-arm-kernel@lists.infradead.org, kkd@meta.com, kernel-team@meta.com
Subject: [PATCH bpf-next v4 02/25] locking: Move common qspinlock helpers to a private header
Date: Sat, 15 Mar 2025 21:05:18 -0700
Message-ID: <20250316040541.108729-3-memxor@gmail.com>
In-Reply-To: <20250316040541.108729-1-memxor@gmail.com>
References: <20250316040541.108729-1-memxor@gmail.com>

Move the qspinlock helper functions that encode and decode the tail word,
set and clear the pending and locked bits, and other miscellaneous
definitions and macros into a private header. To this end, create a
qspinlock.h header file in kernel/locking. Subsequent commits will
introduce a modified qspinlock slow path function, so moving the shared
code into a private header helps minimize unnecessary code duplication.

Reviewed-by: Barret Rhoden
Signed-off-by: Kumar Kartikeya Dwivedi
---
 kernel/locking/qspinlock.c | 193 +----------------------------------
 kernel/locking/qspinlock.h | 200 +++++++++++++++++++++++++++++++++++++
 2 files changed, 205 insertions(+), 188 deletions(-)
 create mode 100644 kernel/locking/qspinlock.h

diff --git a/kernel/locking/qspinlock.c b/kernel/locking/qspinlock.c
index 7d96bed718e4..af8d122bb649 100644
--- a/kernel/locking/qspinlock.c
+++ b/kernel/locking/qspinlock.c
@@ -25,8 +25,9 @@
 #include
 
 /*
- * Include queued spinlock statistics code
+ * Include queued spinlock definitions and statistics code
 */
+#include "qspinlock.h"
 #include "qspinlock_stat.h"
 
 /*
@@ -67,36 +68,6 @@
 */
 #include "mcs_spinlock.h"
 
-#define MAX_NODES	4
-
-/*
- * On 64-bit architectures, the mcs_spinlock structure will be 16 bytes in
- * size and four of them will fit nicely in one 64-byte cacheline. For
- * pvqspinlock, however, we need more space for extra data. To accommodate
- * that, we insert two more long words to pad it up to 32 bytes. IOW, only
- * two of them can fit in a cacheline in this case. That is OK as it is rare
- * to have more than 2 levels of slowpath nesting in actual use. We don't
- * want to penalize pvqspinlocks to optimize for a rare case in native
- * qspinlocks.
- */
-struct qnode {
-	struct mcs_spinlock mcs;
-#ifdef CONFIG_PARAVIRT_SPINLOCKS
-	long reserved[2];
-#endif
-};
-
-/*
- * The pending bit spinning loop count.
- * This heuristic is used to limit the number of lockword accesses - * made by atomic_cond_read_relaxed when waiting for the lock to - * transition out of the "== _Q_PENDING_VAL" state. We don't spin - * indefinitely because there's no guarantee that we'll make forward - * progress. - */ -#ifndef _Q_PENDING_LOOPS -#define _Q_PENDING_LOOPS 1 -#endif /* * Per-CPU queue node structures; we can never have more than 4 nested @@ -106,161 +77,7 @@ struct qnode { * * PV doubles the storage and uses the second cacheline for PV state. */ -static DEFINE_PER_CPU_ALIGNED(struct qnode, qnodes[MAX_NODES]); - -/* - * We must be able to distinguish between no-tail and the tail at 0:0, - * therefore increment the cpu number by one. - */ - -static inline __pure u32 encode_tail(int cpu, int idx) -{ - u32 tail; - - tail = (cpu + 1) << _Q_TAIL_CPU_OFFSET; - tail |= idx << _Q_TAIL_IDX_OFFSET; /* assume < 4 */ - - return tail; -} - -static inline __pure struct mcs_spinlock *decode_tail(u32 tail) -{ - int cpu = (tail >> _Q_TAIL_CPU_OFFSET) - 1; - int idx = (tail & _Q_TAIL_IDX_MASK) >> _Q_TAIL_IDX_OFFSET; - - return per_cpu_ptr(&qnodes[idx].mcs, cpu); -} - -static inline __pure -struct mcs_spinlock *grab_mcs_node(struct mcs_spinlock *base, int idx) -{ - return &((struct qnode *)base + idx)->mcs; -} - -#define _Q_LOCKED_PENDING_MASK (_Q_LOCKED_MASK | _Q_PENDING_MASK) - -#if _Q_PENDING_BITS == 8 -/** - * clear_pending - clear the pending bit. - * @lock: Pointer to queued spinlock structure - * - * *,1,* -> *,0,* - */ -static __always_inline void clear_pending(struct qspinlock *lock) -{ - WRITE_ONCE(lock->pending, 0); -} - -/** - * clear_pending_set_locked - take ownership and clear the pending bit. - * @lock: Pointer to queued spinlock structure - * - * *,1,0 -> *,0,1 - * - * Lock stealing is not allowed if this function is used. - */ -static __always_inline void clear_pending_set_locked(struct qspinlock *lock) -{ - WRITE_ONCE(lock->locked_pending, _Q_LOCKED_VAL); -} - -/* - * xchg_tail - Put in the new queue tail code word & retrieve previous one - * @lock : Pointer to queued spinlock structure - * @tail : The new queue tail code word - * Return: The previous queue tail code word - * - * xchg(lock, tail), which heads an address dependency - * - * p,*,* -> n,*,* ; prev = xchg(lock, node) - */ -static __always_inline u32 xchg_tail(struct qspinlock *lock, u32 tail) -{ - /* - * We can use relaxed semantics since the caller ensures that the - * MCS node is properly initialized before updating the tail. - */ - return (u32)xchg_relaxed(&lock->tail, - tail >> _Q_TAIL_OFFSET) << _Q_TAIL_OFFSET; -} - -#else /* _Q_PENDING_BITS == 8 */ - -/** - * clear_pending - clear the pending bit. - * @lock: Pointer to queued spinlock structure - * - * *,1,* -> *,0,* - */ -static __always_inline void clear_pending(struct qspinlock *lock) -{ - atomic_andnot(_Q_PENDING_VAL, &lock->val); -} - -/** - * clear_pending_set_locked - take ownership and clear the pending bit. 
- * @lock: Pointer to queued spinlock structure - * - * *,1,0 -> *,0,1 - */ -static __always_inline void clear_pending_set_locked(struct qspinlock *lock) -{ - atomic_add(-_Q_PENDING_VAL + _Q_LOCKED_VAL, &lock->val); -} - -/** - * xchg_tail - Put in the new queue tail code word & retrieve previous one - * @lock : Pointer to queued spinlock structure - * @tail : The new queue tail code word - * Return: The previous queue tail code word - * - * xchg(lock, tail) - * - * p,*,* -> n,*,* ; prev = xchg(lock, node) - */ -static __always_inline u32 xchg_tail(struct qspinlock *lock, u32 tail) -{ - u32 old, new; - - old = atomic_read(&lock->val); - do { - new = (old & _Q_LOCKED_PENDING_MASK) | tail; - /* - * We can use relaxed semantics since the caller ensures that - * the MCS node is properly initialized before updating the - * tail. - */ - } while (!atomic_try_cmpxchg_relaxed(&lock->val, &old, new)); - - return old; -} -#endif /* _Q_PENDING_BITS == 8 */ - -/** - * queued_fetch_set_pending_acquire - fetch the whole lock value and set pending - * @lock : Pointer to queued spinlock structure - * Return: The previous lock value - * - * *,*,* -> *,1,* - */ -#ifndef queued_fetch_set_pending_acquire -static __always_inline u32 queued_fetch_set_pending_acquire(struct qspinlock *lock) -{ - return atomic_fetch_or_acquire(_Q_PENDING_VAL, &lock->val); -} -#endif - -/** - * set_locked - Set the lock bit and own the lock - * @lock: Pointer to queued spinlock structure - * - * *,*,0 -> *,0,1 - */ -static __always_inline void set_locked(struct qspinlock *lock) -{ - WRITE_ONCE(lock->locked, _Q_LOCKED_VAL); -} - +static DEFINE_PER_CPU_ALIGNED(struct qnode, qnodes[_Q_MAX_NODES]); /* * Generate the native code for queued_spin_unlock_slowpath(); provide NOPs for @@ -410,7 +227,7 @@ void __lockfunc queued_spin_lock_slowpath(struct qspinlock *lock, u32 val) * any MCS node. This is not the most elegant solution, but is * simple enough. */ - if (unlikely(idx >= MAX_NODES)) { + if (unlikely(idx >= _Q_MAX_NODES)) { lockevent_inc(lock_no_node); while (!queued_spin_trylock(lock)) cpu_relax(); @@ -465,7 +282,7 @@ void __lockfunc queued_spin_lock_slowpath(struct qspinlock *lock, u32 val) * head of the waitqueue. */ if (old & _Q_TAIL_MASK) { - prev = decode_tail(old); + prev = decode_tail(old, qnodes); /* Link @node into the waitqueue. */ WRITE_ONCE(prev->next, node); diff --git a/kernel/locking/qspinlock.h b/kernel/locking/qspinlock.h new file mode 100644 index 000000000000..d4ceb9490365 --- /dev/null +++ b/kernel/locking/qspinlock.h @@ -0,0 +1,200 @@ +/* SPDX-License-Identifier: GPL-2.0-or-later */ +/* + * Queued spinlock defines + * + * This file contains macro definitions and functions shared between different + * qspinlock slow path implementations. + */ +#ifndef __LINUX_QSPINLOCK_H +#define __LINUX_QSPINLOCK_H + +#include +#include +#include +#include + +#define _Q_MAX_NODES 4 + +/* + * The pending bit spinning loop count. + * This heuristic is used to limit the number of lockword accesses + * made by atomic_cond_read_relaxed when waiting for the lock to + * transition out of the "== _Q_PENDING_VAL" state. We don't spin + * indefinitely because there's no guarantee that we'll make forward + * progress. + */ +#ifndef _Q_PENDING_LOOPS +#define _Q_PENDING_LOOPS 1 +#endif + +/* + * On 64-bit architectures, the mcs_spinlock structure will be 16 bytes in + * size and four of them will fit nicely in one 64-byte cacheline. For + * pvqspinlock, however, we need more space for extra data. 
To accommodate + * that, we insert two more long words to pad it up to 32 bytes. IOW, only + * two of them can fit in a cacheline in this case. That is OK as it is rare + * to have more than 2 levels of slowpath nesting in actual use. We don't + * want to penalize pvqspinlocks to optimize for a rare case in native + * qspinlocks. + */ +struct qnode { + struct mcs_spinlock mcs; +#ifdef CONFIG_PARAVIRT_SPINLOCKS + long reserved[2]; +#endif +}; + +/* + * We must be able to distinguish between no-tail and the tail at 0:0, + * therefore increment the cpu number by one. + */ + +static inline __pure u32 encode_tail(int cpu, int idx) +{ + u32 tail; + + tail = (cpu + 1) << _Q_TAIL_CPU_OFFSET; + tail |= idx << _Q_TAIL_IDX_OFFSET; /* assume < 4 */ + + return tail; +} + +static inline __pure struct mcs_spinlock *decode_tail(u32 tail, struct qnode *qnodes) +{ + int cpu = (tail >> _Q_TAIL_CPU_OFFSET) - 1; + int idx = (tail & _Q_TAIL_IDX_MASK) >> _Q_TAIL_IDX_OFFSET; + + return per_cpu_ptr(&qnodes[idx].mcs, cpu); +} + +static inline __pure +struct mcs_spinlock *grab_mcs_node(struct mcs_spinlock *base, int idx) +{ + return &((struct qnode *)base + idx)->mcs; +} + +#define _Q_LOCKED_PENDING_MASK (_Q_LOCKED_MASK | _Q_PENDING_MASK) + +#if _Q_PENDING_BITS == 8 +/** + * clear_pending - clear the pending bit. + * @lock: Pointer to queued spinlock structure + * + * *,1,* -> *,0,* + */ +static __always_inline void clear_pending(struct qspinlock *lock) +{ + WRITE_ONCE(lock->pending, 0); +} + +/** + * clear_pending_set_locked - take ownership and clear the pending bit. + * @lock: Pointer to queued spinlock structure + * + * *,1,0 -> *,0,1 + * + * Lock stealing is not allowed if this function is used. + */ +static __always_inline void clear_pending_set_locked(struct qspinlock *lock) +{ + WRITE_ONCE(lock->locked_pending, _Q_LOCKED_VAL); +} + +/* + * xchg_tail - Put in the new queue tail code word & retrieve previous one + * @lock : Pointer to queued spinlock structure + * @tail : The new queue tail code word + * Return: The previous queue tail code word + * + * xchg(lock, tail), which heads an address dependency + * + * p,*,* -> n,*,* ; prev = xchg(lock, node) + */ +static __always_inline u32 xchg_tail(struct qspinlock *lock, u32 tail) +{ + /* + * We can use relaxed semantics since the caller ensures that the + * MCS node is properly initialized before updating the tail. + */ + return (u32)xchg_relaxed(&lock->tail, + tail >> _Q_TAIL_OFFSET) << _Q_TAIL_OFFSET; +} + +#else /* _Q_PENDING_BITS == 8 */ + +/** + * clear_pending - clear the pending bit. + * @lock: Pointer to queued spinlock structure + * + * *,1,* -> *,0,* + */ +static __always_inline void clear_pending(struct qspinlock *lock) +{ + atomic_andnot(_Q_PENDING_VAL, &lock->val); +} + +/** + * clear_pending_set_locked - take ownership and clear the pending bit. 
+ * @lock: Pointer to queued spinlock structure
+ *
+ * *,1,0 -> *,0,1
+ */
+static __always_inline void clear_pending_set_locked(struct qspinlock *lock)
+{
+	atomic_add(-_Q_PENDING_VAL + _Q_LOCKED_VAL, &lock->val);
+}
+
+/**
+ * xchg_tail - Put in the new queue tail code word & retrieve previous one
+ * @lock : Pointer to queued spinlock structure
+ * @tail : The new queue tail code word
+ * Return: The previous queue tail code word
+ *
+ * xchg(lock, tail)
+ *
+ * p,*,* -> n,*,* ; prev = xchg(lock, node)
+ */
+static __always_inline u32 xchg_tail(struct qspinlock *lock, u32 tail)
+{
+	u32 old, new;
+
+	old = atomic_read(&lock->val);
+	do {
+		new = (old & _Q_LOCKED_PENDING_MASK) | tail;
+		/*
+		 * We can use relaxed semantics since the caller ensures that
+		 * the MCS node is properly initialized before updating the
+		 * tail.
+		 */
+	} while (!atomic_try_cmpxchg_relaxed(&lock->val, &old, new));
+
+	return old;
+}
+#endif /* _Q_PENDING_BITS == 8 */
+
+/**
+ * queued_fetch_set_pending_acquire - fetch the whole lock value and set pending
+ * @lock : Pointer to queued spinlock structure
+ * Return: The previous lock value
+ *
+ * *,*,* -> *,1,*
+ */
+#ifndef queued_fetch_set_pending_acquire
+static __always_inline u32 queued_fetch_set_pending_acquire(struct qspinlock *lock)
+{
+	return atomic_fetch_or_acquire(_Q_PENDING_VAL, &lock->val);
+}
+#endif
+
+/**
+ * set_locked - Set the lock bit and own the lock
+ * @lock: Pointer to queued spinlock structure
+ *
+ * *,*,0 -> *,0,1
+ */
+static __always_inline void set_locked(struct qspinlock *lock)
+{
+	WRITE_ONCE(lock->locked, _Q_LOCKED_VAL);
+}
+
+#endif /* __LINUX_QSPINLOCK_H */
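[Editorial note] The encode_tail()/decode_tail() helpers moved into this header
pack (cpu + 1) and the 2-bit nesting index into the upper bits of the 32-bit
lock word, so that "no tail" is distinguishable from CPU 0 at index 0. A rough
standalone sketch of the round trip; the bit offsets below mirror the usual
qspinlock_types.h layout and are assumptions for illustration, not taken from
this patch:

/* Standalone sketch only -- not kernel code. */
#include <assert.h>
#include <stdint.h>

#define Q_TAIL_IDX_OFFSET	16
#define Q_TAIL_IDX_MASK		(0x3U << Q_TAIL_IDX_OFFSET)
#define Q_TAIL_CPU_OFFSET	18

static uint32_t encode_tail(int cpu, int idx)
{
	/* cpu + 1 so that an empty tail (0) differs from cpu 0, idx 0 */
	return ((uint32_t)(cpu + 1) << Q_TAIL_CPU_OFFSET) |
	       ((uint32_t)idx << Q_TAIL_IDX_OFFSET);
}

static void decode_tail(uint32_t tail, int *cpu, int *idx)
{
	*cpu = (int)(tail >> Q_TAIL_CPU_OFFSET) - 1;
	*idx = (tail & Q_TAIL_IDX_MASK) >> Q_TAIL_IDX_OFFSET;
}

int main(void)
{
	int cpu, idx;

	decode_tail(encode_tail(7, 2), &cpu, &idx);
	assert(cpu == 7 && idx == 2);	/* round trip preserves both fields */
	return 0;
}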
From patchwork Sun Mar 16 04:05:19 2025
X-Patchwork-Submitter: Kumar Kartikeya Dwivedi
X-Patchwork-Id: 14018276
X-Patchwork-Delegate: bpf@iogearbox.net
From: Kumar Kartikeya Dwivedi
To: bpf@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: Barret Rhoden, Linus Torvalds, Peter Zijlstra, Will Deacon, Waiman Long,
    Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann, Martin KaFai Lau,
    Eduard Zingerman, "Paul E. McKenney", Tejun Heo, Josh Don, Dohyun Kim,
    linux-arm-kernel@lists.infradead.org, kkd@meta.com, kernel-team@meta.com
Subject: [PATCH bpf-next v4 03/25] locking: Allow obtaining result of arch_mcs_spin_lock_contended
Date: Sat, 15 Mar 2025 21:05:19 -0700
Message-ID: <20250316040541.108729-4-memxor@gmail.com>
In-Reply-To: <20250316040541.108729-1-memxor@gmail.com>
References: <20250316040541.108729-1-memxor@gmail.com>

To support upcoming changes that require inspecting the return value once
the conditional waiting loop in arch_mcs_spin_lock_contended terminates,
modify the macro to preserve the result of smp_cond_load_acquire. This
enables checking the return value as needed, which will help disambiguate
the MCS node's locked state in future patches.

Reviewed-by: Barret Rhoden
Signed-off-by: Kumar Kartikeya Dwivedi
---
 kernel/locking/mcs_spinlock.h | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/kernel/locking/mcs_spinlock.h b/kernel/locking/mcs_spinlock.h
index 16160ca8907f..5c92ba199b90 100644
--- a/kernel/locking/mcs_spinlock.h
+++ b/kernel/locking/mcs_spinlock.h
@@ -24,9 +24,7 @@
 * spinning, and smp_cond_load_acquire() provides that behavior.
 */
 #define arch_mcs_spin_lock_contended(l)					\
-do {									\
-	smp_cond_load_acquire(l, VAL);					\
-} while (0)
+	smp_cond_load_acquire(l, VAL)
 #endif
 
 #ifndef arch_mcs_spin_unlock_contended
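[Editorial note] With the do/while wrapper gone, the macro now evaluates to the
value returned by smp_cond_load_acquire(), so a caller can capture it. A
standalone sketch of why that matters; spin_until_nonzero() is a stand-in
invented for this illustration, not a kernel API:

/* Illustrative sketch only. */
#include <stdatomic.h>
#include <stdio.h>

static int spin_until_nonzero(atomic_int *p)
{
	int v;

	while (!(v = atomic_load(p)))
		;			/* spin until the value becomes non-zero */
	return v;
}

/* Old style: usable only as a statement, the loaded value is thrown away. */
#define lock_contended_old(p)	do { spin_until_nonzero(p); } while (0)

/* New style: expands to an expression, so the caller may keep the result. */
#define lock_contended_new(p)	spin_until_nonzero(p)

int main(void)
{
	atomic_int locked = 1;		/* pretend the predecessor already handed over */

	lock_contended_old(&locked);			/* still usable as a statement */
	int val = lock_contended_new(&locked);		/* ...and the result is now visible */
	printf("locked value seen by waiter: %d\n", val);
	return 0;
}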
From patchwork Sun Mar 16 04:05:20 2025
X-Patchwork-Submitter: Kumar Kartikeya Dwivedi
X-Patchwork-Id: 14018277
X-Patchwork-Delegate: bpf@iogearbox.net
From: Kumar Kartikeya Dwivedi
To: bpf@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: Barret Rhoden, Linus Torvalds, Peter Zijlstra, Will Deacon, Waiman Long,
    Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann, Martin KaFai Lau,
    Eduard Zingerman, "Paul E. McKenney", Tejun Heo, Josh Don, Dohyun Kim,
    linux-arm-kernel@lists.infradead.org, kkd@meta.com, kernel-team@meta.com
Subject: [PATCH bpf-next v4 04/25] locking: Copy out qspinlock.c to kernel/bpf/rqspinlock.c
Date: Sat, 15 Mar 2025 21:05:20 -0700
Message-ID: <20250316040541.108729-5-memxor@gmail.com>
In-Reply-To: <20250316040541.108729-1-memxor@gmail.com>
References: <20250316040541.108729-1-memxor@gmail.com>

In preparation for introducing a new lock implementation, Resilient Queued
Spin Lock, or rqspinlock, we first begin our modifications by using the
existing qspinlock.c code as the base. Simply copy the code to a new file
and rename functions and variables from 'queued' to 'resilient_queued'.
Since we place the file in kernel/bpf, the includes need to be relative.
This helps each subsequent commit clearly show how and where the code is
being changed.
The only change after a literal copy in this commit is renaming the functions where necessary, and rename qnodes to rqnodes. Let's also use EXPORT_SYMBOL_GPL for rqspinlock slowpath. Reviewed-by: Barret Rhoden Signed-off-by: Kumar Kartikeya Dwivedi --- kernel/bpf/rqspinlock.c | 410 ++++++++++++++++++++++++++++++++++++++++ 1 file changed, 410 insertions(+) create mode 100644 kernel/bpf/rqspinlock.c diff --git a/kernel/bpf/rqspinlock.c b/kernel/bpf/rqspinlock.c new file mode 100644 index 000000000000..762108cb0f38 --- /dev/null +++ b/kernel/bpf/rqspinlock.c @@ -0,0 +1,410 @@ +// SPDX-License-Identifier: GPL-2.0-or-later +/* + * Resilient Queued Spin Lock + * + * (C) Copyright 2013-2015 Hewlett-Packard Development Company, L.P. + * (C) Copyright 2013-2014,2018 Red Hat, Inc. + * (C) Copyright 2015 Intel Corp. + * (C) Copyright 2015 Hewlett-Packard Enterprise Development LP + * + * Authors: Waiman Long + * Peter Zijlstra + */ + +#ifndef _GEN_PV_LOCK_SLOWPATH + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +/* + * Include queued spinlock definitions and statistics code + */ +#include "../locking/qspinlock.h" +#include "../locking/qspinlock_stat.h" + +/* + * The basic principle of a queue-based spinlock can best be understood + * by studying a classic queue-based spinlock implementation called the + * MCS lock. A copy of the original MCS lock paper ("Algorithms for Scalable + * Synchronization on Shared-Memory Multiprocessors by Mellor-Crummey and + * Scott") is available at + * + * https://bugzilla.kernel.org/show_bug.cgi?id=206115 + * + * This queued spinlock implementation is based on the MCS lock, however to + * make it fit the 4 bytes we assume spinlock_t to be, and preserve its + * existing API, we must modify it somehow. + * + * In particular; where the traditional MCS lock consists of a tail pointer + * (8 bytes) and needs the next pointer (another 8 bytes) of its own node to + * unlock the next pending (next->locked), we compress both these: {tail, + * next->locked} into a single u32 value. + * + * Since a spinlock disables recursion of its own context and there is a limit + * to the contexts that can nest; namely: task, softirq, hardirq, nmi. As there + * are at most 4 nesting levels, it can be encoded by a 2-bit number. Now + * we can encode the tail by combining the 2-bit nesting level with the cpu + * number. With one byte for the lock value and 3 bytes for the tail, only a + * 32-bit word is now needed. Even though we only need 1 bit for the lock, + * we extend it to a full byte to achieve better performance for architectures + * that support atomic byte write. + * + * We also change the first spinner to spin on the lock bit instead of its + * node; whereby avoiding the need to carry a node from lock to unlock, and + * preserving existing lock API. This also makes the unlock code simpler and + * faster. + * + * N.B. The current implementation only supports architectures that allow + * atomic operations on smaller 8-bit and 16-bit data types. + * + */ + +#include "../locking/mcs_spinlock.h" + +/* + * Per-CPU queue node structures; we can never have more than 4 nested + * contexts: task, softirq, hardirq, nmi. + * + * Exactly fits one 64-byte cacheline on a 64-bit architecture. + * + * PV doubles the storage and uses the second cacheline for PV state. 
+ */ +static DEFINE_PER_CPU_ALIGNED(struct qnode, rqnodes[_Q_MAX_NODES]); + +/* + * Generate the native code for resilient_queued_spin_unlock_slowpath(); provide NOPs + * for all the PV callbacks. + */ + +static __always_inline void __pv_init_node(struct mcs_spinlock *node) { } +static __always_inline void __pv_wait_node(struct mcs_spinlock *node, + struct mcs_spinlock *prev) { } +static __always_inline void __pv_kick_node(struct qspinlock *lock, + struct mcs_spinlock *node) { } +static __always_inline u32 __pv_wait_head_or_lock(struct qspinlock *lock, + struct mcs_spinlock *node) + { return 0; } + +#define pv_enabled() false + +#define pv_init_node __pv_init_node +#define pv_wait_node __pv_wait_node +#define pv_kick_node __pv_kick_node +#define pv_wait_head_or_lock __pv_wait_head_or_lock + +#ifdef CONFIG_PARAVIRT_SPINLOCKS +#define resilient_queued_spin_lock_slowpath native_resilient_queued_spin_lock_slowpath +#endif + +#endif /* _GEN_PV_LOCK_SLOWPATH */ + +/** + * resilient_queued_spin_lock_slowpath - acquire the queued spinlock + * @lock: Pointer to queued spinlock structure + * @val: Current value of the queued spinlock 32-bit word + * + * (queue tail, pending bit, lock value) + * + * fast : slow : unlock + * : : + * uncontended (0,0,0) -:--> (0,0,1) ------------------------------:--> (*,*,0) + * : | ^--------.------. / : + * : v \ \ | : + * pending : (0,1,1) +--> (0,1,0) \ | : + * : | ^--' | | : + * : v | | : + * uncontended : (n,x,y) +--> (n,0,0) --' | : + * queue : | ^--' | : + * : v | : + * contended : (*,x,y) +--> (*,0,0) ---> (*,0,1) -' : + * queue : ^--' : + */ +void __lockfunc resilient_queued_spin_lock_slowpath(struct qspinlock *lock, u32 val) +{ + struct mcs_spinlock *prev, *next, *node; + u32 old, tail; + int idx; + + BUILD_BUG_ON(CONFIG_NR_CPUS >= (1U << _Q_TAIL_CPU_BITS)); + + if (pv_enabled()) + goto pv_queue; + + if (virt_spin_lock(lock)) + return; + + /* + * Wait for in-progress pending->locked hand-overs with a bounded + * number of spins so that we guarantee forward progress. + * + * 0,1,0 -> 0,0,1 + */ + if (val == _Q_PENDING_VAL) { + int cnt = _Q_PENDING_LOOPS; + val = atomic_cond_read_relaxed(&lock->val, + (VAL != _Q_PENDING_VAL) || !cnt--); + } + + /* + * If we observe any contention; queue. + */ + if (val & ~_Q_LOCKED_MASK) + goto queue; + + /* + * trylock || pending + * + * 0,0,* -> 0,1,* -> 0,0,1 pending, trylock + */ + val = queued_fetch_set_pending_acquire(lock); + + /* + * If we observe contention, there is a concurrent locker. + * + * Undo and queue; our setting of PENDING might have made the + * n,0,0 -> 0,0,0 transition fail and it will now be waiting + * on @next to become !NULL. + */ + if (unlikely(val & ~_Q_LOCKED_MASK)) { + + /* Undo PENDING if we set it. */ + if (!(val & _Q_PENDING_MASK)) + clear_pending(lock); + + goto queue; + } + + /* + * We're pending, wait for the owner to go away. + * + * 0,1,1 -> *,1,0 + * + * this wait loop must be a load-acquire such that we match the + * store-release that clears the locked bit and create lock + * sequentiality; this is because not all + * clear_pending_set_locked() implementations imply full + * barriers. + */ + if (val & _Q_LOCKED_MASK) + smp_cond_load_acquire(&lock->locked, !VAL); + + /* + * take ownership and clear the pending bit. + * + * 0,1,0 -> 0,0,1 + */ + clear_pending_set_locked(lock); + lockevent_inc(lock_pending); + return; + + /* + * End of pending bit optimistic spinning and beginning of MCS + * queuing. 
+ */ +queue: + lockevent_inc(lock_slowpath); +pv_queue: + node = this_cpu_ptr(&rqnodes[0].mcs); + idx = node->count++; + tail = encode_tail(smp_processor_id(), idx); + + trace_contention_begin(lock, LCB_F_SPIN); + + /* + * 4 nodes are allocated based on the assumption that there will + * not be nested NMIs taking spinlocks. That may not be true in + * some architectures even though the chance of needing more than + * 4 nodes will still be extremely unlikely. When that happens, + * we fall back to spinning on the lock directly without using + * any MCS node. This is not the most elegant solution, but is + * simple enough. + */ + if (unlikely(idx >= _Q_MAX_NODES)) { + lockevent_inc(lock_no_node); + while (!queued_spin_trylock(lock)) + cpu_relax(); + goto release; + } + + node = grab_mcs_node(node, idx); + + /* + * Keep counts of non-zero index values: + */ + lockevent_cond_inc(lock_use_node2 + idx - 1, idx); + + /* + * Ensure that we increment the head node->count before initialising + * the actual node. If the compiler is kind enough to reorder these + * stores, then an IRQ could overwrite our assignments. + */ + barrier(); + + node->locked = 0; + node->next = NULL; + pv_init_node(node); + + /* + * We touched a (possibly) cold cacheline in the per-cpu queue node; + * attempt the trylock once more in the hope someone let go while we + * weren't watching. + */ + if (queued_spin_trylock(lock)) + goto release; + + /* + * Ensure that the initialisation of @node is complete before we + * publish the updated tail via xchg_tail() and potentially link + * @node into the waitqueue via WRITE_ONCE(prev->next, node) below. + */ + smp_wmb(); + + /* + * Publish the updated tail. + * We have already touched the queueing cacheline; don't bother with + * pending stuff. + * + * p,*,* -> n,*,* + */ + old = xchg_tail(lock, tail); + next = NULL; + + /* + * if there was a previous node; link it and wait until reaching the + * head of the waitqueue. + */ + if (old & _Q_TAIL_MASK) { + prev = decode_tail(old, rqnodes); + + /* Link @node into the waitqueue. */ + WRITE_ONCE(prev->next, node); + + pv_wait_node(node, prev); + arch_mcs_spin_lock_contended(&node->locked); + + /* + * While waiting for the MCS lock, the next pointer may have + * been set by another lock waiter. We optimistically load + * the next pointer & prefetch the cacheline for writing + * to reduce latency in the upcoming MCS unlock operation. + */ + next = READ_ONCE(node->next); + if (next) + prefetchw(next); + } + + /* + * we're at the head of the waitqueue, wait for the owner & pending to + * go away. + * + * *,x,y -> *,0,0 + * + * this wait loop must use a load-acquire such that we match the + * store-release that clears the locked bit and create lock + * sequentiality; this is because the set_locked() function below + * does not imply a full barrier. + * + * The PV pv_wait_head_or_lock function, if active, will acquire + * the lock and return a non-zero value. So we have to skip the + * atomic_cond_read_acquire() call. As the next PV queue head hasn't + * been designated yet, there is no way for the locked value to become + * _Q_SLOW_VAL. So both the set_locked() and the + * atomic_cmpxchg_relaxed() calls will be safe. + * + * If PV isn't active, 0 will be returned instead. 
+ *
+ */
+	if ((val = pv_wait_head_or_lock(lock, node)))
+		goto locked;
+
+	val = atomic_cond_read_acquire(&lock->val, !(VAL & _Q_LOCKED_PENDING_MASK));
+
+locked:
+	/*
+	 * claim the lock:
+	 *
+	 * n,0,0 -> 0,0,1 : lock, uncontended
+	 * *,*,0 -> *,*,1 : lock, contended
+	 *
+	 * If the queue head is the only one in the queue (lock value == tail)
+	 * and nobody is pending, clear the tail code and grab the lock.
+	 * Otherwise, we only need to grab the lock.
+	 */
+
+	/*
+	 * In the PV case we might already have _Q_LOCKED_VAL set, because
+	 * of lock stealing; therefore we must also allow:
+	 *
+	 * n,0,1 -> 0,0,1
+	 *
+	 * Note: at this point: (val & _Q_PENDING_MASK) == 0, because of the
+	 * above wait condition, therefore any concurrent setting of
+	 * PENDING will make the uncontended transition fail.
+	 */
+	if ((val & _Q_TAIL_MASK) == tail) {
+		if (atomic_try_cmpxchg_relaxed(&lock->val, &val, _Q_LOCKED_VAL))
+			goto release; /* No contention */
+	}
+
+	/*
+	 * Either somebody is queued behind us or _Q_PENDING_VAL got set
+	 * which will then detect the remaining tail and queue behind us
+	 * ensuring we'll see a @next.
+	 */
+	set_locked(lock);
+
+	/*
+	 * contended path; wait for next if not observed yet, release.
+	 */
+	if (!next)
+		next = smp_cond_load_relaxed(&node->next, (VAL));
+
+	arch_mcs_spin_unlock_contended(&next->locked);
+	pv_kick_node(lock, next);
+
+release:
+	trace_contention_end(lock, 0);
+
+	/*
+	 * release the node
+	 */
+	__this_cpu_dec(rqnodes[0].mcs.count);
+}
+EXPORT_SYMBOL_GPL(resilient_queued_spin_lock_slowpath);
+
+/*
+ * Generate the paravirt code for resilient_queued_spin_unlock_slowpath().
+ */
+#if !defined(_GEN_PV_LOCK_SLOWPATH) && defined(CONFIG_PARAVIRT_SPINLOCKS)
+#define _GEN_PV_LOCK_SLOWPATH
+
+#undef pv_enabled
+#define pv_enabled()	true
+
+#undef pv_init_node
+#undef pv_wait_node
+#undef pv_kick_node
+#undef pv_wait_head_or_lock
+
+#undef resilient_queued_spin_lock_slowpath
+#define resilient_queued_spin_lock_slowpath	__pv_resilient_queued_spin_lock_slowpath
+
+#include "../locking/qspinlock_paravirt.h"
+#include "rqspinlock.c"
+
+bool nopvspin;
+static __init int parse_nopvspin(char *arg)
+{
+	nopvspin = true;
+	return 0;
+}
+early_param("nopvspin", parse_nopvspin);
+#endif
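[Editorial note] The copied slowpath keeps qspinlock's scheme of four per-CPU
queue nodes indexed by nesting level (task, softirq, hardirq, NMI), where
node[0].count tracks the current nesting depth. A plain-C sketch of that
indexing idea; the two-CPU array below stands in for the kernel's per-CPU
machinery and is an assumption for illustration only:

/* Standalone sketch only -- not kernel code. */
#include <assert.h>

#define MAX_NODES	4	/* task, softirq, hardirq, NMI */

struct node {
	struct node *next;
	int locked;
	int count;		/* nesting depth, only meaningful in slot 0 */
};

static struct node cpu_nodes[2][MAX_NODES];	/* pretend we have two CPUs */

static struct node *grab_node(int cpu)
{
	int idx = cpu_nodes[cpu][0].count++;	/* claim the next nesting slot */

	assert(idx < MAX_NODES);		/* the kernel falls back to trylock here */
	return &cpu_nodes[cpu][idx];
}

static void release_node(int cpu)
{
	cpu_nodes[cpu][0].count--;		/* always decrement slot 0's count */
}

int main(void)
{
	struct node *outer = grab_node(0);	/* e.g. task context */
	struct node *inner = grab_node(0);	/* e.g. an interrupt on the same CPU */

	assert(outer != inner);			/* nested contexts get distinct nodes */
	release_node(0);
	release_node(0);
	return 0;
}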
From patchwork Sun Mar 16 04:05:21 2025
X-Patchwork-Submitter: Kumar Kartikeya Dwivedi
X-Patchwork-Id: 14018278
X-Patchwork-Delegate: bpf@iogearbox.net
From: Kumar Kartikeya Dwivedi
To: bpf@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: Barret Rhoden, Linus Torvalds, Peter Zijlstra, Will Deacon, Waiman Long,
    Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann, Martin KaFai Lau,
    Eduard Zingerman, "Paul E. McKenney", Tejun Heo, Josh Don, Dohyun Kim,
    linux-arm-kernel@lists.infradead.org, kkd@meta.com, kernel-team@meta.com
McKenney" , Tejun Heo , Josh Don , Dohyun Kim , linux-arm-kernel@lists.infradead.org, kkd@meta.com, kernel-team@meta.com Subject: [PATCH bpf-next v4 05/25] rqspinlock: Add rqspinlock.h header Date: Sat, 15 Mar 2025 21:05:21 -0700 Message-ID: <20250316040541.108729-6-memxor@gmail.com> X-Mailer: git-send-email 2.47.1 In-Reply-To: <20250316040541.108729-1-memxor@gmail.com> References: <20250316040541.108729-1-memxor@gmail.com> Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Developer-Signature: v=1; a=openpgp-sha256; l=2286; h=from:subject; bh=J879+EgbuyTebrgMR7yNECE6EUzDiFxlkQMxluMZnf4=; b=owEBbQKS/ZANAwAIAUzgyIZIvxHKAcsmYgBn1k3b9elCAd3YQ236M81hMVLjSuOppZtbhch+VA6/ +rsNVFWJAjMEAAEIAB0WIQRLvip+Buz51YI8YRFM4MiGSL8RygUCZ9ZN2wAKCRBM4MiGSL8RyshkEA CCv98+4lY1p9/sFVO0M2tPH7eTtOTASKJhakQbPqh1VQV36TyCzSaQMxrKctCpN8C7owWTC01bHGqc uI+Z9UudyPSj0Cm7RYxCrP93Lb3Uayvcj3YxfZ4ECgMrBwLS3EOpGMhRJd60Dcb6XFtUQOMUQvrve4 cuGnRz6yfsa+vQNLSW1t0H2BODbKRzlFEu1WGD42YiEzjN4knrh/owAlZ1Av8cUPJpxMNzueQASucb eGyQ18wsYNVZjVcc/wp4dif0cQLiLc7c+6E9bxeuTzbgRHtAxzROq9WCxfhDOPYLlZe5m/aFW2HojL 7ilY58wpGW0Rhs3XUzv4sGUT9tj3GcDBl0dm7dPpLB4d0JrX4YkowIj4b/07SIASKj3ovG7wFW+o0m 5qe+QjTAhKQ1RgSIfkCaRVQGQBhPFqBL5Zf1/JTJ1VmHf0uuxx8/7CvgRM+p5KETo8myeIC5Tt9//M B+5VUrfWUAdSF6KWRzKCTFpqBHuJS6T+4JbZX2he0aHaQZOHxX2qn3lDC6VzuHL5BR5XX5LdizGzBV f/EiK7VPNweuCM6BurIbBqvfKkiq86dQe15jvoRriwpQuGOlaOaeUayhNQDPOQpAlufShTTNKO/MiT z5CdzlZbqINK5Uj9m28uAQkGmSkxfPYss0RS+pN4X5EuZINLNuJyoxi2UMzA== X-Developer-Key: i=memxor@gmail.com; a=openpgp; fpr=4BBE2A7E06ECF9D5823C61114CE0C88648BF11CA X-Patchwork-Delegate: bpf@iogearbox.net This header contains the public declarations usable in the rest of the kernel for rqspinlock. Let's also type alias qspinlock to rqspinlock_t to ensure consistent use of the new lock type. We want to remove dependence on the qspinlock type in later patches as we need to provide a test-and-set fallback, hence begin abstracting away from now onwards. Reviewed-by: Barret Rhoden Signed-off-by: Kumar Kartikeya Dwivedi --- include/asm-generic/rqspinlock.h | 19 +++++++++++++++++++ kernel/bpf/rqspinlock.c | 3 ++- 2 files changed, 21 insertions(+), 1 deletion(-) create mode 100644 include/asm-generic/rqspinlock.h diff --git a/include/asm-generic/rqspinlock.h b/include/asm-generic/rqspinlock.h new file mode 100644 index 000000000000..22f8094d0550 --- /dev/null +++ b/include/asm-generic/rqspinlock.h @@ -0,0 +1,19 @@ +/* SPDX-License-Identifier: GPL-2.0-or-later */ +/* + * Resilient Queued Spin Lock + * + * (C) Copyright 2024-2025 Meta Platforms, Inc. and affiliates. 
+ * + * Authors: Kumar Kartikeya Dwivedi + */ +#ifndef __ASM_GENERIC_RQSPINLOCK_H +#define __ASM_GENERIC_RQSPINLOCK_H + +#include + +struct qspinlock; +typedef struct qspinlock rqspinlock_t; + +extern void resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val); + +#endif /* __ASM_GENERIC_RQSPINLOCK_H */ diff --git a/kernel/bpf/rqspinlock.c b/kernel/bpf/rqspinlock.c index 762108cb0f38..93e31633c2aa 100644 --- a/kernel/bpf/rqspinlock.c +++ b/kernel/bpf/rqspinlock.c @@ -23,6 +23,7 @@ #include #include #include +#include /* * Include queued spinlock definitions and statistics code @@ -127,7 +128,7 @@ static __always_inline u32 __pv_wait_head_or_lock(struct qspinlock *lock, * contended : (*,x,y) +--> (*,0,0) ---> (*,0,1) -' : * queue : ^--' : */ -void __lockfunc resilient_queued_spin_lock_slowpath(struct qspinlock *lock, u32 val) +void __lockfunc resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val) { struct mcs_spinlock *prev, *next, *node; u32 old, tail; From patchwork Sun Mar 16 04:05:22 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Kumar Kartikeya Dwivedi X-Patchwork-Id: 14018280 X-Patchwork-Delegate: bpf@iogearbox.net Received: from mail-wm1-f65.google.com (mail-wm1-f65.google.com [209.85.128.65]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 3CFFA17BB35; Sun, 16 Mar 2025 04:05:52 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.128.65 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1742097957; cv=none; b=AWviEIbGRlfvxCN9Hs7HoIjVfa7XrSSSb3WrIT92R2Jiha2HBhT+SiGWAlC+4EzqVpOCrHdf+00i00hrZow/gw2xQmcs16ChWshSh54C0vj4FlucqBZRweHEbdiOcnosKOfzxnUHPxbo6FpX1JuuSYmIWJLqts/gnw4OX2gJY4k= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1742097957; c=relaxed/simple; bh=UZ8naMS1hT+7LGCLfEmfT5uEbBX3f9Jyevr2RnHN8ec=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=TdAOZ8rdNJlcwZHU0fw1PWYFvYcmMcxzgF0GzyC5BY6xk0FNpoT8lV2p1VtwdzanIUSIhT/WG8IGfImKmBq7xExPk3TmamKqHtH+Fkp2jnuyB4p60GnERbvJGJYDFfMjQZMDaVDUaT0CBSB131HUEIfGdnfSMq7YeNZVMiRPBK4= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=Pie7eVnQ; arc=none smtp.client-ip=209.85.128.65 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="Pie7eVnQ" Received: by mail-wm1-f65.google.com with SMTP id 5b1f17b1804b1-43cfe574976so5640855e9.1; Sat, 15 Mar 2025 21:05:52 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1742097951; x=1742702751; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=Fh+KgT9rgROAc0Y/x+RNT9BL+c7q6+gqaABwWSSeoyU=; b=Pie7eVnQFhzNoNfUUN3r9rH/0kdu/cO5NjkwQN88icdGQMn+VrNA6/Ot7d1U7Lfuo0 50ZC/4DpWuRi6K/0SknNrbd02C7RSxlaAQVqdChI+4dBccT48/AI5OF/ydgveNKSnTQS qak85lYMDSgirqnjsUmq15agYlfSUbb/hxCLUKRRDqU2Kw1AwZT165WsX0W8LknCVFKw 
From: Kumar Kartikeya Dwivedi To: bpf@vger.kernel.org, linux-kernel@vger.kernel.org Cc: Barret Rhoden , Linus Torvalds , Peter Zijlstra , Will Deacon , Waiman Long , Alexei Starovoitov , Andrii Nakryiko , Daniel Borkmann , Martin KaFai Lau , Eduard Zingerman , "Paul E.
McKenney" , Tejun Heo , Josh Don , Dohyun Kim , linux-arm-kernel@lists.infradead.org, kkd@meta.com, kernel-team@meta.com Subject: [PATCH bpf-next v4 06/25] rqspinlock: Drop PV and virtualization support Date: Sat, 15 Mar 2025 21:05:22 -0700 Message-ID: <20250316040541.108729-7-memxor@gmail.com> X-Mailer: git-send-email 2.47.1 In-Reply-To: <20250316040541.108729-1-memxor@gmail.com> References: <20250316040541.108729-1-memxor@gmail.com> Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Developer-Signature: v=1; a=openpgp-sha256; l=6827; h=from:subject; bh=UZ8naMS1hT+7LGCLfEmfT5uEbBX3f9Jyevr2RnHN8ec=; b=owEBbQKS/ZANAwAIAUzgyIZIvxHKAcsmYgBn1k3cNxB8Y7ML8wYp3OrW2rUAHgS5cApFt293sVR+ FUNcGiOJAjMEAAEIAB0WIQRLvip+Buz51YI8YRFM4MiGSL8RygUCZ9ZN3AAKCRBM4MiGSL8Ryo0MD/ 97ReRi3PswH9DGAYd8NdunrsJZxOckjHjAmgCpfUiaRZykVSnDzlH/qc51eaWr0NnIeisWZ8UIFUkI jzIM3cz8kFKlXhwFkC5OIr8zKUy1nLd1KZ2VvWvTGtfiNK7FcKf4UnaPtm2h7FR7ymOEqkfRwa6Sv4 KDpMy2fYSfBwlS2zzE8lvtID6NrSQMO0nG9hZI5Vlq1gZCjr6wPcg1I4W8Yg2A3HVdq9TJesAs459g +kngV/eR+MZhuO1EKoqAhoyCPZCd04dcd4H0/wCcPMCioozsjnZ0Pa/DN9InNXE1fB2owf7rCR5D5e IDZglzG1xyPXgfH8+VJFxlCX5hKTz84v/Ew/3CT6rx8xeARZY2r2Jf9awUQd+pYCqBL8Ugzs+zgQI8 AuUcemjQLAR/CPa+I8ZnSbCaZMFh10dPUjejN/Iy/kmM0UrW++NXO3QeJwX9dDQBlV/O4GVAYCmH0V QmKi1sVhk4Ai7I32j5292bI9eoweCYwn57ZCmo/xRT/DXKexPr/MiFRPmqpaEWk4UY9ls3MvPYS9mZ Nj1/QhALuONxN9ys6+jE6eMSXgQ6Bly0zRkRx4vhY/FbDWv43MLGyjayc7Mf/dHyABAf66kMw1z4wM PNcxJIsebXTOhOAMPJvATfcKgOhdcWe0Hs5dZLIc37/5kP9BpbmTuPiGZdvg== X-Developer-Key: i=memxor@gmail.com; a=openpgp; fpr=4BBE2A7E06ECF9D5823C61114CE0C88648BF11CA X-Patchwork-Delegate: bpf@iogearbox.net Changes to rqspinlock in subsequent commits will be algorithmic modifications, which won't remain in agreement with the implementations of paravirt spin lock and virt_spin_lock support. These future changes include measures for terminating waiting loops in slow path after a certain point. While using a fair lock like qspinlock directly inside virtual machines leads to suboptimal performance under certain conditions, we cannot use the existing virtualization support before we make it resilient as well. Therefore, drop it for now. Note that we need to drop qspinlock_stat.h, as it's only relevant in case of CONFIG_PARAVIRT_SPINLOCKS=y, but we need to keep lock_events.h in the includes, which was indirectly pulled in before. Reviewed-by: Barret Rhoden Signed-off-by: Kumar Kartikeya Dwivedi --- kernel/bpf/rqspinlock.c | 91 +---------------------------------------- 1 file changed, 1 insertion(+), 90 deletions(-) diff --git a/kernel/bpf/rqspinlock.c b/kernel/bpf/rqspinlock.c index 93e31633c2aa..c2646cffc59e 100644 --- a/kernel/bpf/rqspinlock.c +++ b/kernel/bpf/rqspinlock.c @@ -11,8 +11,6 @@ * Peter Zijlstra */ -#ifndef _GEN_PV_LOCK_SLOWPATH - #include #include #include @@ -29,7 +27,7 @@ * Include queued spinlock definitions and statistics code */ #include "../locking/qspinlock.h" -#include "../locking/qspinlock_stat.h" +#include "../locking/lock_events.h" /* * The basic principle of a queue-based spinlock can best be understood @@ -75,38 +73,9 @@ * contexts: task, softirq, hardirq, nmi. * * Exactly fits one 64-byte cacheline on a 64-bit architecture. - * - * PV doubles the storage and uses the second cacheline for PV state. */ static DEFINE_PER_CPU_ALIGNED(struct qnode, rqnodes[_Q_MAX_NODES]); -/* - * Generate the native code for resilient_queued_spin_unlock_slowpath(); provide NOPs - * for all the PV callbacks. 
- */ - -static __always_inline void __pv_init_node(struct mcs_spinlock *node) { } -static __always_inline void __pv_wait_node(struct mcs_spinlock *node, - struct mcs_spinlock *prev) { } -static __always_inline void __pv_kick_node(struct qspinlock *lock, - struct mcs_spinlock *node) { } -static __always_inline u32 __pv_wait_head_or_lock(struct qspinlock *lock, - struct mcs_spinlock *node) - { return 0; } - -#define pv_enabled() false - -#define pv_init_node __pv_init_node -#define pv_wait_node __pv_wait_node -#define pv_kick_node __pv_kick_node -#define pv_wait_head_or_lock __pv_wait_head_or_lock - -#ifdef CONFIG_PARAVIRT_SPINLOCKS -#define resilient_queued_spin_lock_slowpath native_resilient_queued_spin_lock_slowpath -#endif - -#endif /* _GEN_PV_LOCK_SLOWPATH */ - /** * resilient_queued_spin_lock_slowpath - acquire the queued spinlock * @lock: Pointer to queued spinlock structure @@ -136,12 +105,6 @@ void __lockfunc resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val) BUILD_BUG_ON(CONFIG_NR_CPUS >= (1U << _Q_TAIL_CPU_BITS)); - if (pv_enabled()) - goto pv_queue; - - if (virt_spin_lock(lock)) - return; - /* * Wait for in-progress pending->locked hand-overs with a bounded * number of spins so that we guarantee forward progress. @@ -212,7 +175,6 @@ void __lockfunc resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val) */ queue: lockevent_inc(lock_slowpath); -pv_queue: node = this_cpu_ptr(&rqnodes[0].mcs); idx = node->count++; tail = encode_tail(smp_processor_id(), idx); @@ -251,7 +213,6 @@ void __lockfunc resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val) node->locked = 0; node->next = NULL; - pv_init_node(node); /* * We touched a (possibly) cold cacheline in the per-cpu queue node; @@ -288,7 +249,6 @@ void __lockfunc resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val) /* Link @node into the waitqueue. */ WRITE_ONCE(prev->next, node); - pv_wait_node(node, prev); arch_mcs_spin_lock_contended(&node->locked); /* @@ -312,23 +272,9 @@ void __lockfunc resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val) * store-release that clears the locked bit and create lock * sequentiality; this is because the set_locked() function below * does not imply a full barrier. - * - * The PV pv_wait_head_or_lock function, if active, will acquire - * the lock and return a non-zero value. So we have to skip the - * atomic_cond_read_acquire() call. As the next PV queue head hasn't - * been designated yet, there is no way for the locked value to become - * _Q_SLOW_VAL. So both the set_locked() and the - * atomic_cmpxchg_relaxed() calls will be safe. - * - * If PV isn't active, 0 will be returned instead. - * */ - if ((val = pv_wait_head_or_lock(lock, node))) - goto locked; - val = atomic_cond_read_acquire(&lock->val, !(VAL & _Q_LOCKED_PENDING_MASK)); -locked: /* * claim the lock: * @@ -341,11 +287,6 @@ void __lockfunc resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val) */ /* - * In the PV case we might already have _Q_LOCKED_VAL set, because - * of lock stealing; therefore we must also allow: - * - * n,0,1 -> 0,0,1 - * * Note: at this point: (val & _Q_PENDING_MASK) == 0, because of the * above wait condition, therefore any concurrent setting of * PENDING will make the uncontended transition fail. 
@@ -369,7 +310,6 @@ void __lockfunc resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val) next = smp_cond_load_relaxed(&node->next, (VAL)); arch_mcs_spin_unlock_contended(&next->locked); - pv_kick_node(lock, next); release: trace_contention_end(lock, 0); @@ -380,32 +320,3 @@ void __lockfunc resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val) __this_cpu_dec(rqnodes[0].mcs.count); } EXPORT_SYMBOL_GPL(resilient_queued_spin_lock_slowpath); - -/* - * Generate the paravirt code for resilient_queued_spin_unlock_slowpath(). - */ -#if !defined(_GEN_PV_LOCK_SLOWPATH) && defined(CONFIG_PARAVIRT_SPINLOCKS) -#define _GEN_PV_LOCK_SLOWPATH - -#undef pv_enabled -#define pv_enabled() true - -#undef pv_init_node -#undef pv_wait_node -#undef pv_kick_node -#undef pv_wait_head_or_lock - -#undef resilient_queued_spin_lock_slowpath -#define resilient_queued_spin_lock_slowpath __pv_resilient_queued_spin_lock_slowpath - -#include "../locking/qspinlock_paravirt.h" -#include "rqspinlock.c" - -bool nopvspin; -static __init int parse_nopvspin(char *arg) -{ - nopvspin = true; - return 0; -} -early_param("nopvspin", parse_nopvspin); -#endif From patchwork Sun Mar 16 04:05:23 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Kumar Kartikeya Dwivedi X-Patchwork-Id: 14018279 X-Patchwork-Delegate: bpf@iogearbox.net Received: from mail-wm1-f67.google.com (mail-wm1-f67.google.com [209.85.128.67]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 322091714B4; Sun, 16 Mar 2025 04:05:53 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.128.67 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1742097956; cv=none; b=Ms8pp4rHiFpIJqWjqRK/c37uJ1YWzy3OwU3GBjTm6wPQaEsRd7p4ECcGr/HWohBXGtL/CcwzuRjL0MBKK7Lp1mo7s2uv8EjJF85iyIgvd0yMhGb5Al97fV+2dHMKlrTPZP6ce+eOGAaKUgJ+IrZPeHRB6SCGuy5toPHu4diA0Vk= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1742097956; c=relaxed/simple; bh=y9/H+/hm8Whgobd7BHik2UQJXH7cEJPayXzZNcl5svk=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=T3rlmNqDpBzcCcl8P8lqCgARrT4h0IRbR3/cO3+YhTgKm59a7kRjKU575hPditk6oKGL2x7FN3sRGWDDB77gjbA5FxbaVRv7mRcfZPbygLwzqSqyvLeN7MnsfdmJSHfR3dK7rh1hYrczsznQ4Y9QCpb59rR3u0B3H4NFsXh/kkA= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=GeTLCboX; arc=none smtp.client-ip=209.85.128.67 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="GeTLCboX" Received: by mail-wm1-f67.google.com with SMTP id 5b1f17b1804b1-43cfe63c592so11225925e9.2; Sat, 15 Mar 2025 21:05:53 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1742097952; x=1742702752; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=9VRGS9P+jyMhEgNXbehIAgq9LwV+0ThEZLQ6VCSjK78=; 
From: Kumar Kartikeya Dwivedi To: bpf@vger.kernel.org, linux-kernel@vger.kernel.org Cc: Barret Rhoden , Linus Torvalds , Peter Zijlstra , Will Deacon , Waiman Long , Alexei Starovoitov , Andrii Nakryiko , Daniel Borkmann , Martin KaFai Lau , Eduard Zingerman , "Paul E.
McKenney" , Tejun Heo , Josh Don , Dohyun Kim , linux-arm-kernel@lists.infradead.org, kkd@meta.com, kernel-team@meta.com Subject: [PATCH bpf-next v4 07/25] rqspinlock: Add support for timeouts Date: Sat, 15 Mar 2025 21:05:23 -0700 Message-ID: <20250316040541.108729-8-memxor@gmail.com> X-Mailer: git-send-email 2.47.1 In-Reply-To: <20250316040541.108729-1-memxor@gmail.com> References: <20250316040541.108729-1-memxor@gmail.com> Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Developer-Signature: v=1; a=openpgp-sha256; l=4618; h=from:subject; bh=y9/H+/hm8Whgobd7BHik2UQJXH7cEJPayXzZNcl5svk=; b=owEBbQKS/ZANAwAIAUzgyIZIvxHKAcsmYgBn1k3c971BRmRrbtZ27c2Isb9sJ77Is1kCbCdZsV6t 1TF9L6+JAjMEAAEIAB0WIQRLvip+Buz51YI8YRFM4MiGSL8RygUCZ9ZN3AAKCRBM4MiGSL8RysI2EA CWS2QokbV6D30J2uxyFyVfcIaONPfZhICR7q5XYFdtmvsNs0zVjHot9yAnnznptXBAiSS+cpG5Aawg z2RsgVOs+pD0nZva8ZR6I0u3fDlBkTQkxzWXNevfnkDzWf6uUiNYOYliNQaW5nWk7Xw7ZhjvCiKRPX 5A+tArW1n6UuDrD81t1KOunXMvupCUoGslJanF+8gDOt4ww2O9a2OMUAEYlDam8nqkz9cM0CzCd7lj QMx+mD23eRdYJPFBaZ0DwuFWjSBsFEv1inLqSt9M2aRZyPyhN91AjHkP7RVP+M9mho4L+KwU5R0cXs ze0x9jqRuyREct7IZSep7fJflgyK3LtyZ98G5SfXEQFt45kdMIOrFHPRuBnQm4E6sCs4aJuA0iCmAf dZlv2k2m5XIg3dN0guZFl2s1WPD8cB9W0MpgJTmOTnOl4cdR5tCYbuQwVZ8AuFNtt9E5raXpwdig8A 3cVwVvesOvJ3R3L0nMCpCVvkK2xGt3HgFD3qTNJiFpVZ1DYfR37iPcxY/fvZQlkrC//giZDQ8bcvxJ GzzbGGJVJzdeQQcjyz/aBTh+kag7lpu9O1tyMEIIWMI7H0OfiLPjxnjWKbSoAZGBcrjRyr2gPN67vS 42Su7CQ/jdGZQj58ck+gkVIG4s5JTH4rC9is1NwbNmozjCDyE+65OdLS++Gg== X-Developer-Key: i=memxor@gmail.com; a=openpgp; fpr=4BBE2A7E06ECF9D5823C61114CE0C88648BF11CA X-Patchwork-Delegate: bpf@iogearbox.net Introduce policy macro RES_CHECK_TIMEOUT which can be used to detect when the timeout has expired for the slow path to return an error. It depends on being passed two variables initialized to 0: ts, ret. The 'ts' parameter is of type rqspinlock_timeout. This macro resolves to the (ret) expression so that it can be used in statements like smp_cond_load_acquire to break the waiting loop condition. The 'spin' member is used to amortize the cost of checking time by dispatching to the implementation every 64k iterations. The 'timeout_end' member is used to keep track of the timestamp that denotes the end of the waiting period. The 'ret' parameter denotes the status of the timeout, and can be checked in the slow path to detect timeouts after waiting loops. The 'duration' member is used to store the timeout duration for each waiting loop. The default timeout value defined in the header (RES_DEF_TIMEOUT) is 0.25 seconds. This macro will be used as a condition for waiting loops in the slow path. Since each waiting loop applies a fresh timeout using the same rqspinlock_timeout, we add a new RES_RESET_TIMEOUT as well to ensure the values can be easily reinitialized to the default state. 
Reviewed-by: Barret Rhoden Signed-off-by: Kumar Kartikeya Dwivedi --- include/asm-generic/rqspinlock.h | 6 +++++ kernel/bpf/rqspinlock.c | 45 ++++++++++++++++++++++++++++++++ 2 files changed, 51 insertions(+) diff --git a/include/asm-generic/rqspinlock.h b/include/asm-generic/rqspinlock.h index 22f8094d0550..5dd4dd8aee69 100644 --- a/include/asm-generic/rqspinlock.h +++ b/include/asm-generic/rqspinlock.h @@ -10,10 +10,16 @@ #define __ASM_GENERIC_RQSPINLOCK_H #include +#include struct qspinlock; typedef struct qspinlock rqspinlock_t; extern void resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val); +/* + * Default timeout for waiting loops is 0.25 seconds + */ +#define RES_DEF_TIMEOUT (NSEC_PER_SEC / 4) + #endif /* __ASM_GENERIC_RQSPINLOCK_H */ diff --git a/kernel/bpf/rqspinlock.c b/kernel/bpf/rqspinlock.c index c2646cffc59e..0d8964b4d44a 100644 --- a/kernel/bpf/rqspinlock.c +++ b/kernel/bpf/rqspinlock.c @@ -6,9 +6,11 @@ * (C) Copyright 2013-2014,2018 Red Hat, Inc. * (C) Copyright 2015 Intel Corp. * (C) Copyright 2015 Hewlett-Packard Enterprise Development LP + * (C) Copyright 2024-2025 Meta Platforms, Inc. and affiliates. * * Authors: Waiman Long * Peter Zijlstra + * Kumar Kartikeya Dwivedi */ #include @@ -22,6 +24,7 @@ #include #include #include +#include /* * Include queued spinlock definitions and statistics code @@ -68,6 +71,45 @@ #include "../locking/mcs_spinlock.h" +struct rqspinlock_timeout { + u64 timeout_end; + u64 duration; + u16 spin; +}; + +static noinline int check_timeout(struct rqspinlock_timeout *ts) +{ + u64 time = ktime_get_mono_fast_ns(); + + if (!ts->timeout_end) { + ts->timeout_end = time + ts->duration; + return 0; + } + + if (time > ts->timeout_end) + return -ETIMEDOUT; + + return 0; +} + +#define RES_CHECK_TIMEOUT(ts, ret) \ + ({ \ + if (!(ts).spin++) \ + (ret) = check_timeout(&(ts)); \ + (ret); \ + }) + +/* + * Initialize the 'spin' member. + */ +#define RES_INIT_TIMEOUT(ts) ({ (ts).spin = 1; }) + +/* + * We only need to reset 'timeout_end', 'spin' will just wrap around as necessary. + * Duration is defined for each spin attempt, so set it here. + */ +#define RES_RESET_TIMEOUT(ts, _duration) ({ (ts).timeout_end = 0; (ts).duration = _duration; }) + /* * Per-CPU queue node structures; we can never have more than 4 nested * contexts: task, softirq, hardirq, nmi. @@ -100,11 +142,14 @@ static DEFINE_PER_CPU_ALIGNED(struct qnode, rqnodes[_Q_MAX_NODES]); void __lockfunc resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val) { struct mcs_spinlock *prev, *next, *node; + struct rqspinlock_timeout ts; u32 old, tail; int idx; BUILD_BUG_ON(CONFIG_NR_CPUS >= (1U << _Q_TAIL_CPU_BITS)); + RES_INIT_TIMEOUT(ts); + /* * Wait for in-progress pending->locked hand-overs with a bounded * number of spins so that we guarantee forward progress. 
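To make the intended usage concrete, here is a minimal sketch of a bounded waiting loop built on these helpers (an illustration only, not part of the patch; it mirrors how later patches in this series wire the macros into smp_cond_load_acquire-style loops, once the slow path has been converted to return an error code):

	struct rqspinlock_timeout ts;
	int ret = 0;

	RES_INIT_TIMEOUT(ts);
	/* Apply a fresh 0.25 s budget for this waiting loop. */
	RES_RESET_TIMEOUT(ts, RES_DEF_TIMEOUT);
	/* Spin until the locked byte clears or the timeout check trips. */
	smp_cond_load_acquire(&lock->locked, !VAL || RES_CHECK_TIMEOUT(ts, ret));
	if (ret)
		return ret; /* -ETIMEDOUT: the acquisition attempt is abandoned */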
From patchwork Sun Mar 16 04:05:24 2025 X-Patchwork-Submitter: Kumar Kartikeya Dwivedi X-Patchwork-Id: 14018281
From: Kumar Kartikeya Dwivedi To: bpf@vger.kernel.org, linux-kernel@vger.kernel.org Cc: Ankur Arora , Linus Torvalds , Peter Zijlstra , Will Deacon , Waiman Long , Alexei Starovoitov , Andrii Nakryiko , Daniel Borkmann , Martin KaFai Lau , Eduard Zingerman , "Paul E. McKenney" , Tejun Heo , Barret Rhoden , Josh Don , Dohyun Kim , linux-arm-kernel@lists.infradead.org, kkd@meta.com, kernel-team@meta.com Subject: [PATCH bpf-next v4 08/25] rqspinlock: Hardcode cond_acquire loops for arm64 Date: Sat, 15 Mar 2025 21:05:24 -0700 Message-ID: <20250316040541.108729-9-memxor@gmail.com> In-Reply-To: <20250316040541.108729-1-memxor@gmail.com> References: <20250316040541.108729-1-memxor@gmail.com> Currently, for rqspinlock usage, the implementations of smp_cond_load_acquire (and thus atomic_cond_read_acquire) are susceptible to stalls on arm64, because they do not guarantee that the conditional expression will be repeatedly evaluated if the address being loaded from is not written to by other CPUs. When event-stream support is absent (the event stream unblocks stuck WFE-based loops every ~100us), we may end up stuck forever. This causes a problem for us, as we need to repeatedly invoke RES_CHECK_TIMEOUT in the spin loop to break out when the timeout expires.
Let us import the smp_cond_load_acquire_timewait implementation Ankur is proposing in [0], and then fallback to it once it is merged. While we rely on the implementation to amortize the cost of sampling check_timeout for us, it will not happen when event stream support is unavailable. This is not the common case, and it would be difficult to fit our logic in the time_expr_ns >= time_limit_ns comparison, hence just let it be. [0]: https://lore.kernel.org/lkml/20250203214911.898276-1-ankur.a.arora@oracle.com Cc: Ankur Arora Signed-off-by: Kumar Kartikeya Dwivedi --- arch/arm64/include/asm/rqspinlock.h | 93 +++++++++++++++++++++++++++++ kernel/bpf/rqspinlock.c | 15 +++++ 2 files changed, 108 insertions(+) create mode 100644 arch/arm64/include/asm/rqspinlock.h diff --git a/arch/arm64/include/asm/rqspinlock.h b/arch/arm64/include/asm/rqspinlock.h new file mode 100644 index 000000000000..5b80785324b6 --- /dev/null +++ b/arch/arm64/include/asm/rqspinlock.h @@ -0,0 +1,93 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef _ASM_RQSPINLOCK_H +#define _ASM_RQSPINLOCK_H + +#include + +/* + * Hardcode res_smp_cond_load_acquire implementations for arm64 to a custom + * version based on [0]. In rqspinlock code, our conditional expression involves + * checking the value _and_ additionally a timeout. However, on arm64, the + * WFE-based implementation may never spin again if no stores occur to the + * locked byte in the lock word. As such, we may be stuck forever if + * event-stream based unblocking is not available on the platform for WFE spin + * loops (arch_timer_evtstrm_available). + * + * Once support for smp_cond_load_acquire_timewait [0] lands, we can drop this + * copy-paste. + * + * While we rely on the implementation to amortize the cost of sampling + * cond_expr for us, it will not happen when event stream support is + * unavailable, time_expr check is amortized. This is not the common case, and + * it would be difficult to fit our logic in the time_expr_ns >= time_limit_ns + * comparison, hence just let it be. In case of event-stream, the loop is woken + * up at microsecond granularity. 
+ * + * [0]: https://lore.kernel.org/lkml/20250203214911.898276-1-ankur.a.arora@oracle.com + */ + +#ifndef smp_cond_load_acquire_timewait + +#define smp_cond_time_check_count 200 + +#define __smp_cond_load_relaxed_spinwait(ptr, cond_expr, time_expr_ns, \ + time_limit_ns) ({ \ + typeof(ptr) __PTR = (ptr); \ + __unqual_scalar_typeof(*ptr) VAL; \ + unsigned int __count = 0; \ + for (;;) { \ + VAL = READ_ONCE(*__PTR); \ + if (cond_expr) \ + break; \ + cpu_relax(); \ + if (__count++ < smp_cond_time_check_count) \ + continue; \ + if ((time_expr_ns) >= (time_limit_ns)) \ + break; \ + __count = 0; \ + } \ + (typeof(*ptr))VAL; \ +}) + +#define __smp_cond_load_acquire_timewait(ptr, cond_expr, \ + time_expr_ns, time_limit_ns) \ +({ \ + typeof(ptr) __PTR = (ptr); \ + __unqual_scalar_typeof(*ptr) VAL; \ + for (;;) { \ + VAL = smp_load_acquire(__PTR); \ + if (cond_expr) \ + break; \ + __cmpwait_relaxed(__PTR, VAL); \ + if ((time_expr_ns) >= (time_limit_ns)) \ + break; \ + } \ + (typeof(*ptr))VAL; \ +}) + +#define smp_cond_load_acquire_timewait(ptr, cond_expr, \ + time_expr_ns, time_limit_ns) \ +({ \ + __unqual_scalar_typeof(*ptr) _val; \ + int __wfe = arch_timer_evtstrm_available(); \ + \ + if (likely(__wfe)) { \ + _val = __smp_cond_load_acquire_timewait(ptr, cond_expr, \ + time_expr_ns, \ + time_limit_ns); \ + } else { \ + _val = __smp_cond_load_relaxed_spinwait(ptr, cond_expr, \ + time_expr_ns, \ + time_limit_ns); \ + smp_acquire__after_ctrl_dep(); \ + } \ + (typeof(*ptr))_val; \ +}) + +#endif + +#define res_smp_cond_load_acquire_timewait(v, c) smp_cond_load_acquire_timewait(v, c, 0, 1) + +#include + +#endif /* _ASM_RQSPINLOCK_H */ diff --git a/kernel/bpf/rqspinlock.c b/kernel/bpf/rqspinlock.c index 0d8964b4d44a..d429b923b58f 100644 --- a/kernel/bpf/rqspinlock.c +++ b/kernel/bpf/rqspinlock.c @@ -92,12 +92,21 @@ static noinline int check_timeout(struct rqspinlock_timeout *ts) return 0; } +/* + * Do not amortize with spins when res_smp_cond_load_acquire is defined, + * as the macro does internal amortization for us. + */ +#ifndef res_smp_cond_load_acquire #define RES_CHECK_TIMEOUT(ts, ret) \ ({ \ if (!(ts).spin++) \ (ret) = check_timeout(&(ts)); \ (ret); \ }) +#else +#define RES_CHECK_TIMEOUT(ts, ret, mask) \ + ({ (ret) = check_timeout(&(ts)); }) +#endif /* * Initialize the 'spin' member. 
@@ -118,6 +127,12 @@ static noinline int check_timeout(struct rqspinlock_timeout *ts) */ static DEFINE_PER_CPU_ALIGNED(struct qnode, rqnodes[_Q_MAX_NODES]); +#ifndef res_smp_cond_load_acquire +#define res_smp_cond_load_acquire(v, c) smp_cond_load_acquire(v, c) +#endif + +#define res_atomic_cond_read_acquire(v, c) res_smp_cond_load_acquire(&(v)->counter, (c)) + /** * resilient_queued_spin_lock_slowpath - acquire the queued spinlock * @lock: Pointer to queued spinlock structure From patchwork Sun Mar 16 04:05:25 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Kumar Kartikeya Dwivedi X-Patchwork-Id: 14018282 X-Patchwork-Delegate: bpf@iogearbox.net Received: from mail-wm1-f68.google.com (mail-wm1-f68.google.com [209.85.128.68]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id CBAEE188734; Sun, 16 Mar 2025 04:05:56 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.128.68 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1742097958; cv=none; b=h4VMht2pProLtiD60TkM9RDkXO9EvDFQkMYPqp2PQUr6YY0sgaG0prDXsflarJumWbWQ5VKQi+XaCgxZZo/1kfdVDS6xGVGAg6vJ/0BPyfsMwmJxvia5Zbq5yiNuwP1aIbk3zzQWaTdcxe1NsR01QQhUT4Yzo8e3qwr6bn1s83o= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1742097958; c=relaxed/simple; bh=akh5yhXfrEG7tkbyObFzFj7wBccI2uw5HAmi9Ua2XD8=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=GTazneyGhAhRlDZVE4Tg5nz8SQ4j9CZSxRyCUu64Vl+2p3oaRgvjjJD1jj7Jx1p8QhHdBMQhQNo6vxwCV/rtaypqnIauUi7rZFn0n4UDdRJHHIjMV6m8l06JyNWpgEs3qpkQWFjFQ9DeS1fBvkge+5oKBREKsJ2sNJvbQooyojE= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=Vw5yRFKh; arc=none smtp.client-ip=209.85.128.68 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="Vw5yRFKh" Received: by mail-wm1-f68.google.com with SMTP id 5b1f17b1804b1-4394a823036so9985645e9.0; Sat, 15 Mar 2025 21:05:56 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1742097955; x=1742702755; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=DZSlxIaXESbScuJYNNT7oHLcn/yU5dVEPTh1HXSGjKY=; b=Vw5yRFKhKi4Ns1GT3sI1uUIX9p+aMtEXIASkFHlkb7anms20qR+cbk+jtldG7Y8laY PwG1rbiatiS7sK0TBsArqGX3YQ95Apu1E9gs+axOF6HantmWcFCkJi5Hx5yNpdDzf/Uu rO56+EtnnDwl6ObUbCrvE+2mGRsfKAJpY+iCC7S5uRpYMCvCIIe6qyCIf0Ambnjd44b6 PLcnaV9cPYh5PbJCPp/21KLvApI4mfQ8d83AmU5zTVnlYQpGJe38qIuJ7U/wejDnVl9G maiVSCg8ODZZiNC+PwmM/w46bZmemZ8S9/7+fdNBiRyCFJTPeH9uKTw6VzZX+VwcFUrO MgmA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1742097955; x=1742702755; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=DZSlxIaXESbScuJYNNT7oHLcn/yU5dVEPTh1HXSGjKY=; 
From: Kumar Kartikeya Dwivedi To: bpf@vger.kernel.org, linux-kernel@vger.kernel.org Cc: Barret Rhoden , Linus Torvalds , Peter Zijlstra , Will Deacon , Waiman Long , Alexei Starovoitov , Andrii Nakryiko , Daniel Borkmann , Martin KaFai Lau , Eduard Zingerman , "Paul E. McKenney" , Tejun Heo , Josh Don , Dohyun Kim , linux-arm-kernel@lists.infradead.org, kkd@meta.com, kernel-team@meta.com Subject: [PATCH bpf-next v4 09/25] rqspinlock: Protect pending bit owners from stalls Date: Sat, 15 Mar 2025 21:05:25 -0700 Message-ID: <20250316040541.108729-10-memxor@gmail.com> In-Reply-To: <20250316040541.108729-1-memxor@gmail.com> References: <20250316040541.108729-1-memxor@gmail.com> The pending bit is used to avoid queueing in case the lock is uncontended, and has demonstrated benefits for the two-contender scenario, especially on x86.
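As a reading aid for the (queue tail, pending bit, lock value) triples used throughout these comments, the sketch below decodes a 32-bit lock word into that triple; it assumes the standard qspinlock word layout for NR_CPUS < 16K and is not part of the patch:

	/* Illustration only: locked byte in bits 0-7, pending byte in bits 8-15,
	 * tail (CPU + per-CPU node index) in bits 16-31.
	 */
	#define EX_LOCKED_MASK	0x000000ffU
	#define EX_PENDING_MASK	0x0000ff00U
	#define EX_TAIL_MASK	0xffff0000U

	static void decode_lock_word(u32 val)
	{
		pr_info("tail=%#x pending=%u locked=%u\n",
			(val & EX_TAIL_MASK) >> 16,
			!!(val & EX_PENDING_MASK),
			val & EX_LOCKED_MASK);
	}

	/* For example, 0x00000001 decodes to 0,0,1 (uncontended, locked), and
	 * 0x00000101 decodes to 0,1,1 (locked with a pending waiter).
	 */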
In case the pending bit is acquired and we wait for the locked bit to disappear, we may get stuck due to the lock owner not making progress. Hence, this waiting loop must be protected with a timeout check. To perform a graceful recovery once we decide to abort our lock acquisition attempt in this case, we must unset the pending bit since we own it. All waiters undoing their changes and exiting gracefully allows the lock word to be restored to the unlocked state once all participants (owner, waiters) have been recovered, and the lock remains usable. Hence, set the pending bit back to zero before returning to the caller. Introduce a lockevent (rqspinlock_lock_timeout) to capture timeout event statistics. Reviewed-by: Barret Rhoden Signed-off-by: Kumar Kartikeya Dwivedi --- include/asm-generic/rqspinlock.h | 2 +- kernel/bpf/rqspinlock.c | 32 ++++++++++++++++++++++++++----- kernel/locking/lock_events_list.h | 5 +++++ 3 files changed, 33 insertions(+), 6 deletions(-) diff --git a/include/asm-generic/rqspinlock.h b/include/asm-generic/rqspinlock.h index 5dd4dd8aee69..9bd11cb7acd6 100644 --- a/include/asm-generic/rqspinlock.h +++ b/include/asm-generic/rqspinlock.h @@ -15,7 +15,7 @@ struct qspinlock; typedef struct qspinlock rqspinlock_t; -extern void resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val); +extern int resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val); /* * Default timeout for waiting loops is 0.25 seconds diff --git a/kernel/bpf/rqspinlock.c b/kernel/bpf/rqspinlock.c index d429b923b58f..262294cfd36f 100644 --- a/kernel/bpf/rqspinlock.c +++ b/kernel/bpf/rqspinlock.c @@ -138,6 +138,10 @@ static DEFINE_PER_CPU_ALIGNED(struct qnode, rqnodes[_Q_MAX_NODES]); * @lock: Pointer to queued spinlock structure * @val: Current value of the queued spinlock 32-bit word * + * Return: + * * 0 - Lock was acquired successfully. + * * -ETIMEDOUT - Lock acquisition failed because of timeout. + * * (queue tail, pending bit, lock value) * * fast : slow : unlock @@ -154,12 +158,12 @@ static DEFINE_PER_CPU_ALIGNED(struct qnode, rqnodes[_Q_MAX_NODES]); * contended : (*,x,y) +--> (*,0,0) ---> (*,0,1) -' : * queue : ^--' : */ -void __lockfunc resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val) +int __lockfunc resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val) { struct mcs_spinlock *prev, *next, *node; struct rqspinlock_timeout ts; + int idx, ret = 0; u32 old, tail; - int idx; BUILD_BUG_ON(CONFIG_NR_CPUS >= (1U << _Q_TAIL_CPU_BITS)); @@ -217,8 +221,25 @@ void __lockfunc resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val) * clear_pending_set_locked() implementations imply full * barriers. */ - if (val & _Q_LOCKED_MASK) - smp_cond_load_acquire(&lock->locked, !VAL); + if (val & _Q_LOCKED_MASK) { + RES_RESET_TIMEOUT(ts, RES_DEF_TIMEOUT); + res_smp_cond_load_acquire(&lock->locked, !VAL || RES_CHECK_TIMEOUT(ts, ret)); + } + + if (ret) { + /* + * We waited for the locked bit to go back to 0, as the pending + * waiter, but timed out. We need to clear the pending bit since + * we own it. Once a stuck owner has been recovered, the lock + * must be restored to a valid state, hence removing the pending + * bit is necessary. + * + * *,1,* -> *,0,* + */ + clear_pending(lock); + lockevent_inc(rqspinlock_lock_timeout); + return ret; + } /* * take ownership and clear the pending bit. 
@@ -227,7 +248,7 @@ void __lockfunc resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val) */ clear_pending_set_locked(lock); lockevent_inc(lock_pending); - return; + return 0; /* * End of pending bit optimistic spinning and beginning of MCS @@ -378,5 +399,6 @@ void __lockfunc resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val) * release the node */ __this_cpu_dec(rqnodes[0].mcs.count); + return 0; } EXPORT_SYMBOL_GPL(resilient_queued_spin_lock_slowpath); diff --git a/kernel/locking/lock_events_list.h b/kernel/locking/lock_events_list.h index 97fb6f3f840a..c5286249994d 100644 --- a/kernel/locking/lock_events_list.h +++ b/kernel/locking/lock_events_list.h @@ -49,6 +49,11 @@ LOCK_EVENT(lock_use_node4) /* # of locking ops that use 4th percpu node */ LOCK_EVENT(lock_no_node) /* # of locking ops w/o using percpu node */ #endif /* CONFIG_QUEUED_SPINLOCKS */ +/* + * Locking events for Resilient Queued Spin Lock + */ +LOCK_EVENT(rqspinlock_lock_timeout) /* # of locking ops that timeout */ + /* * Locking events for rwsem */ From patchwork Sun Mar 16 04:05:26 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Kumar Kartikeya Dwivedi X-Patchwork-Id: 14018283 X-Patchwork-Delegate: bpf@iogearbox.net Received: from mail-wm1-f65.google.com (mail-wm1-f65.google.com [209.85.128.65]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 2F8F718B492; Sun, 16 Mar 2025 04:05:57 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.128.65 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1742097960; cv=none; b=RKVmg6ZB1JSWcNojYvEF3gzHVvQo7hnMvbLhanmICm5+C5n7JzW1VjNhT+y8FrA/So/kvlYroGjzCR+VJTCCckz6CRweHlIF3O/lQKg68LzNX6bf9ouwkIAJlfIR/B1lD1bA9RuntfEJtPmnhgtofMdffjnsGPno1tRDEVFjOg4= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1742097960; c=relaxed/simple; bh=vVjQGLLHhTQCMBetlOFIdU+ojPl1vLkRKHm30dVzCfs=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=pZXJbbx42dMxZFQ1zM3WH9SqUDy6mON+sH6Lmg+uGoqqmHb0K7UZmeOrXp9cZjhxnP98yMW6KDjiCRYONg+XzXTXShA/xMGLIE9uh0osq161JSfuOlD9oc00sf8poIByjLBn7UjZa+Pwuc8wdhT1px7v64f3UOhwQmtKKz0upts= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=nk6GSHlv; arc=none smtp.client-ip=209.85.128.65 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="nk6GSHlv" Received: by mail-wm1-f65.google.com with SMTP id 5b1f17b1804b1-43cfe63c592so11226175e9.2; Sat, 15 Mar 2025 21:05:57 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1742097956; x=1742702756; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=P9oRTQ+VV5E61dUbGRZRpYYl6d2ZXF0CN+Od4Rtwxwc=; b=nk6GSHlvKbWwVx1zEuQA5zZ2u8ephBBJwdkVl1UWkmR9DTRaOMOL1jnW14OdLVLc0g 391p5IwwNd8EvvgJ8PVsihAynq+C5potCepvnS0KqmWPueTOpwfWBP/GHJaCrp8y/rko 
From: Kumar Kartikeya Dwivedi To: bpf@vger.kernel.org, linux-kernel@vger.kernel.org Cc: Barret Rhoden , Linus Torvalds , Peter Zijlstra , Will Deacon , Waiman Long , Alexei Starovoitov , Andrii Nakryiko , Daniel Borkmann , Martin KaFai Lau , Eduard Zingerman , "Paul E.
McKenney" , Tejun Heo , Josh Don , Dohyun Kim , linux-arm-kernel@lists.infradead.org, kkd@meta.com, kernel-team@meta.com Subject: [PATCH bpf-next v4 10/25] rqspinlock: Protect waiters in queue from stalls Date: Sat, 15 Mar 2025 21:05:26 -0700 Message-ID: <20250316040541.108729-11-memxor@gmail.com> X-Mailer: git-send-email 2.47.1 In-Reply-To: <20250316040541.108729-1-memxor@gmail.com> References: <20250316040541.108729-1-memxor@gmail.com> Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Developer-Signature: v=1; a=openpgp-sha256; l=8535; h=from:subject; bh=vVjQGLLHhTQCMBetlOFIdU+ojPl1vLkRKHm30dVzCfs=; b=owEBbQKS/ZANAwAIAUzgyIZIvxHKAcsmYgBn1k3cSLlo4tmtO0ItN2MrwskyKIwLLs6w+ebaLBXo 7AIxPteJAjMEAAEIAB0WIQRLvip+Buz51YI8YRFM4MiGSL8RygUCZ9ZN3AAKCRBM4MiGSL8RysmSD/ 4gAxpbVRAMKnrlkkFfN4Ga5MkaT9kFKIB75NzYnTksK7CH8hhGDYv+JLi+RdymiKCyY9nzzCusLCvH DgKiy8dOd7jd4kK2asNPQv83NeMlK0Y7ez2xIW0aheacqs2xHVRzHNpXJrhlXMFk9AOjed7lRqaZkk MHYjtNrFJYw2buxWBArDplbtplJ6ZFnH/R4X9150luydwS58JO6v1dpAXhDRtZug46Mf3bEhF2OxqC jOYtEwXKFE02GDoCLR+Ux1Tq9HAULxytmj+cG4B/5lN3ArOwmML81960DEOmDRtiBuM13th+wrnsBv j5ht8UeNorxeqAqlS0DKTVL5GwgKQ0J/gMt567PrzjsdttH9cQ2U+GS1RvleETvs2fRov1kvHQlCBh zWbay++IImUzfsbrUbJsWtbQz5zd7WoeMVXwycELimgnu6pMDDfOmXnPyktSfkP+Y2afRyImMjuGmz onfCAORf8i939MF0Z78DKXqnm5ooMaXPCja7iWalvBWDsRQVJXMA4eNMfbKC582zqFpuWWwa5V8NWm VhoIH2++de0gRcjcmh/Wg9q2PogVDyiUpdPW0390l2OIviAlmCmDbOMeMMvmSAOonJmYnfqgmklT82 erZE+jj8vv/93kMS/3ldvy0UA/t4FMfRnggOei1u5ghurEjEm+jmBceT6Qrg== X-Developer-Key: i=memxor@gmail.com; a=openpgp; fpr=4BBE2A7E06ECF9D5823C61114CE0C88648BF11CA X-Patchwork-Delegate: bpf@iogearbox.net Implement the wait queue cleanup algorithm for rqspinlock. There are three forms of waiters in the original queued spin lock algorithm. The first is the waiter which acquires the pending bit and spins on the lock word without forming a wait queue. The second is the head waiter that is the first waiter heading the wait queue. The third form is of all the non-head waiters queued behind the head, waiting to be signalled through their MCS node to overtake the responsibility of the head. In this commit, we are concerned with the second and third kind. First, we augment the waiting loop of the head of the wait queue with a timeout. When this timeout happens, all waiters part of the wait queue will abort their lock acquisition attempts. This happens in three steps. First, the head breaks out of its loop waiting for pending and locked bits to turn to 0, and non-head waiters break out of their MCS node spin (more on that later). Next, every waiter (head or non-head) attempts to check whether they are also the tail waiter, in such a case they attempt to zero out the tail word and allow a new queue to be built up for this lock. If they succeed, they have no one to signal next in the queue to stop spinning. Otherwise, they signal the MCS node of the next waiter to break out of its spin and try resetting the tail word back to 0. This goes on until the tail waiter is found. In case of races, the new tail will be responsible for performing the same task, as the old tail will then fail to reset the tail word and wait for its next pointer to be updated before it signals the new tail to do the same. We terminate the whole wait queue because of two main reasons. Firstly, we eschew per-waiter timeouts with one applied at the head of the wait queue. 
This allows everyone to break out faster once we've seen the owner / pending waiter not responding for the timeout duration from the head. Secondly, it avoids complicated synchronization, because when not leaving in FIFO order, prev's next pointer needs to be fixed up etc. Lastly, all of these waiters release the rqnode and return to the caller. This patch underscores the point that rqspinlock's timeout does not apply to each waiter individually, and cannot be relied upon as an upper bound. It is possible for the rqspinlock waiters to return early from a failed lock acquisition attempt as soon as stalls are detected. The head waiter cannot directly WRITE_ONCE the tail to zero, as it may race with a concurrent xchg and a non-head waiter linking its MCS node to the head's MCS node through 'prev->next' assignment. One notable thing is that we must use RES_DEF_TIMEOUT * 2 as our maximum duration for the waiting loop (for the wait queue head), since we may have both the owner and pending bit waiter ahead of us, and in the worst case, need to span their maximum permitted critical section lengths. Reviewed-by: Barret Rhoden Signed-off-by: Kumar Kartikeya Dwivedi --- kernel/bpf/rqspinlock.c | 55 ++++++++++++++++++++++++++++++++++++++--- kernel/bpf/rqspinlock.h | 48 +++++++++++++++++++++++++++++++++++ 2 files changed, 100 insertions(+), 3 deletions(-) create mode 100644 kernel/bpf/rqspinlock.h diff --git a/kernel/bpf/rqspinlock.c b/kernel/bpf/rqspinlock.c index 262294cfd36f..65c2b41d8937 100644 --- a/kernel/bpf/rqspinlock.c +++ b/kernel/bpf/rqspinlock.c @@ -77,6 +77,8 @@ struct rqspinlock_timeout { u16 spin; }; +#define RES_TIMEOUT_VAL 2 + static noinline int check_timeout(struct rqspinlock_timeout *ts) { u64 time = ktime_get_mono_fast_ns(); @@ -325,12 +327,18 @@ int __lockfunc resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val) * head of the waitqueue. */ if (old & _Q_TAIL_MASK) { + int val; + prev = decode_tail(old, rqnodes); /* Link @node into the waitqueue. */ WRITE_ONCE(prev->next, node); - arch_mcs_spin_lock_contended(&node->locked); + val = arch_mcs_spin_lock_contended(&node->locked); + if (val == RES_TIMEOUT_VAL) { + ret = -EDEADLK; + goto waitq_timeout; + } /* * While waiting for the MCS lock, the next pointer may have @@ -353,8 +361,49 @@ int __lockfunc resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val) * store-release that clears the locked bit and create lock * sequentiality; this is because the set_locked() function below * does not imply a full barrier. + * + * We use RES_DEF_TIMEOUT * 2 as the duration, as RES_DEF_TIMEOUT is + * meant to span maximum allowed time per critical section, and we may + * have both the owner of the lock and the pending bit waiter ahead of + * us. */ - val = atomic_cond_read_acquire(&lock->val, !(VAL & _Q_LOCKED_PENDING_MASK)); + RES_RESET_TIMEOUT(ts, RES_DEF_TIMEOUT * 2); + val = res_atomic_cond_read_acquire(&lock->val, !(VAL & _Q_LOCKED_PENDING_MASK) || + RES_CHECK_TIMEOUT(ts, ret)); + +waitq_timeout: + if (ret) { + /* + * If the tail is still pointing to us, then we are the final waiter, + * and are responsible for resetting the tail back to 0. Otherwise, if + * the cmpxchg operation fails, we signal the next waiter to take exit + * and try the same. For a waiter with tail node 'n': + * + * n,*,* -> 0,*,* + * + * When performing cmpxchg for the whole word (NR_CPUS > 16k), it is + * possible locked/pending bits keep changing and we see failures even + * when we remain the head of wait queue. 
However, eventually, + * pending bit owner will unset the pending bit, and new waiters + * will queue behind us. This will leave the lock owner in + * charge, and it will eventually either set locked bit to 0, or + * leave it as 1, allowing us to make progress. + * + * We terminate the whole wait queue for two reasons. Firstly, + * we eschew per-waiter timeouts with one applied at the head of + * the wait queue. This allows everyone to break out faster + * once we've seen the owner / pending waiter not responding for + * the timeout duration from the head. Secondly, it avoids + * complicated synchronization, because when not leaving in FIFO + * order, prev's next pointer needs to be fixed up etc. + */ + if (!try_cmpxchg_tail(lock, tail, 0)) { + next = smp_cond_load_relaxed(&node->next, VAL); + WRITE_ONCE(next->locked, RES_TIMEOUT_VAL); + } + lockevent_inc(rqspinlock_lock_timeout); + goto release; + } /* * claim the lock: @@ -399,6 +448,6 @@ int __lockfunc resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val) * release the node */ __this_cpu_dec(rqnodes[0].mcs.count); - return 0; + return ret; } EXPORT_SYMBOL_GPL(resilient_queued_spin_lock_slowpath); diff --git a/kernel/bpf/rqspinlock.h b/kernel/bpf/rqspinlock.h new file mode 100644 index 000000000000..5d8cb1b1aab4 --- /dev/null +++ b/kernel/bpf/rqspinlock.h @@ -0,0 +1,48 @@ +/* SPDX-License-Identifier: GPL-2.0-or-later */ +/* + * Resilient Queued Spin Lock defines + * + * (C) Copyright 2024-2025 Meta Platforms, Inc. and affiliates. + * + * Authors: Kumar Kartikeya Dwivedi + */ +#ifndef __LINUX_RQSPINLOCK_H +#define __LINUX_RQSPINLOCK_H + +#include "../locking/qspinlock.h" + +/* + * try_cmpxchg_tail - Return result of cmpxchg of tail word with a new value + * @lock: Pointer to queued spinlock structure + * @tail: The tail to compare against + * @new_tail: The new queue tail code word + * Return: Bool to indicate whether the cmpxchg operation succeeded + * + * This is used by the head of the wait queue to clean up the queue. + * Provides relaxed ordering, since observers only rely on initialized + * state of the node which was made visible through the xchg_tail operation, + * i.e. through the smp_wmb preceding xchg_tail. + * + * We avoid using 16-bit cmpxchg, which is not available on all architectures. + */ +static __always_inline bool try_cmpxchg_tail(struct qspinlock *lock, u32 tail, u32 new_tail) +{ + u32 old, new; + + old = atomic_read(&lock->val); + do { + /* + * Is the tail part we compare to already stale? Fail. + */ + if ((old & _Q_TAIL_MASK) != tail) + return false; + /* + * Encode latest locked/pending state for new tail. 
+ */ + new = (old & _Q_LOCKED_PENDING_MASK) | new_tail; + } while (!atomic_try_cmpxchg_relaxed(&lock->val, &old, new)); + + return true; +} + +#endif /* __LINUX_RQSPINLOCK_H */ From patchwork Sun Mar 16 04:05:27 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Kumar Kartikeya Dwivedi X-Patchwork-Id: 14018284 X-Patchwork-Delegate: bpf@iogearbox.net Received: from mail-wm1-f68.google.com (mail-wm1-f68.google.com [209.85.128.68]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 8DD9914F102; Sun, 16 Mar 2025 04:05:59 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.128.68 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1742097961; cv=none; b=apwHBw49QHUXe6rw4cYXrtNmpRTLEBHfFR1XZAWVGDr9n0i1ZrPDCS2IyrGbJ3QXk43vILEohK6yqewP5Hxp3hYDWBG7xsR54rlP2y3Tmx46LfnywfKh8Oncl5HKQv4HNGJWpbL2X7zV2OE2VtwFXZBepuhq2nOVdbb8R3NXnYc= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1742097961; c=relaxed/simple; bh=6ga9NaPnm2qscV5e908A787vKJcwTZL0L+jNyYJdaTA=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=h0y9xXGNV/r8+VDhuh6jJiDrgvOIQPWs2vI/Za1hGKQUiYGG6MjBOb/Aeu980rGNuPpSJXgNTUHJwVceswJJebnXgnR6I2uR5NfqwoIo49ZlrX/sYEvjoTgw+R5jjCccru6/FlkDkViPm1AhVfXVZJFUcr689bXGrb5pqxyOd2w= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=JNllPUhV; arc=none smtp.client-ip=209.85.128.68 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="JNllPUhV" Received: by mail-wm1-f68.google.com with SMTP id 5b1f17b1804b1-43690d4605dso6607265e9.0; Sat, 15 Mar 2025 21:05:59 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1742097957; x=1742702757; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=HX6HuMB01wWAUhOciNQt+DcJpa4mtBn0KQKhBNdUTd8=; b=JNllPUhVLWELcCTtna3ahdsmRnct2xRVLLy0D/CUxAzKu4MDQVOQKWNGNk5jCzef37 qhvOXUm53Aqcd2/pE+5xFzeNGAeq5nXj02XN49m/ooMbuJOQnmEDNOx/JIqlAwDdDbIr Z5rxSG6yWI+ziFHTnsHOe2pnedopDiqED1JOXhTADEuSoXsFlHkElmg30UfJ0/eCippA YmwNuj4DVWG6DC2N1e2ts+txu8AJuJcvOQF3gmeUpBcJ6qwAdYxfydV9IZoqz7o6mDMd 6MlCK47BWKDec5dLvPJuMyBv24F4rsnTRjQCwUmkGhqrEC1qCA7n7suv9rQGpMyKR4Ax Dffw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1742097957; x=1742702757; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=HX6HuMB01wWAUhOciNQt+DcJpa4mtBn0KQKhBNdUTd8=; b=HmEnzM9pE4VRooKCAvFnkd5jDW8ZnRigidWQ0e/VjPoXBMBntt2/09wzCoAKctap/T I+i+/YV0pTgLdJic2t3rFLO+9CrzW9KzeHhaAbuEh1D84XyNKODO4U30H16IesIQ0Xct MRKaVdEqJiT9/b5XvM8srTW3yvATeVryuwZgIN3LFFOlmyVTTwL8z6XfmeH9IEHIb0QS QIsF8yqq5sFbFletxX+cpIUwIv3wHEjVOt69+8ffZHkX7CiV09M1nwKZJOSI62NKQFVM uxfuUcNfS3b4wj9iWjfS/Gh8S8kDz9kDFlcqPGji1W2sxJMEE4zzS8YOSVpb8eqQyROE Uz9Q== 
X-Forwarded-Encrypted: i=1; AJvYcCUXIeOkFQ/ssJ95kOjuF7iWqyz+hPmjXi/wyX65ES1DljDeKlX7FodRz3Q71BlSCygf+EP3SbCcl1P7YR4=@vger.kernel.org X-Gm-Message-State: AOJu0YyDbeG9xAGRDYhmchTqDYimNcmNJ2g8gqSBLqpqAsWDofPn1pXk r2v/8dUW5URu7avZXwQ37JKh4TmatM/7Q7mzSukxIlGdOwZS/hePZWobZVc3uQU= X-Gm-Gg: ASbGncu86M3ovaX8ssRmg1DpYRBJrVubZG7uaIdZpX6uzdZ850TytjMKUrTpgXe9Uis ju0p65KAw2jRTMWdD1AzhbyG++1DsHzB5z7+7QK9ULJFcI8WsQG7TNwV2PwoK72kRen25xMy5nZ 9ABv7GoVftibRoagUHPq0v9kR39zm6EzDKYPIlKMN+ZwGaX5XZLrU933uffmaJMduyisydScPgg 687W+UU3c/8Ac+eajMsqhOQzRr02y/y9a4sJyIvMg1C8RvxkuztXuq7F4qtOKwAnYAl6Q6QPNQl /rCzlLfe1p67Z/EN5g77ZFH5P5F1rjCo0Z6uZ66NhVcDbA== X-Google-Smtp-Source: AGHT+IGdjZLS03B/i/oyj0FVCH6ak251OqqszbVDTcp6yx3FeG6rdnMDu6XNNYRan/DL0Mz7Gi3+Xw== X-Received: by 2002:a05:600c:1548:b0:43c:e2dd:98f3 with SMTP id 5b1f17b1804b1-43d1ecff3d0mr77112145e9.21.1742097957130; Sat, 15 Mar 2025 21:05:57 -0700 (PDT) Received: from localhost ([2a03:2880:31ff:4c::]) by smtp.gmail.com with ESMTPSA id 5b1f17b1804b1-43d200fad59sm67783415e9.26.2025.03.15.21.05.56 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sat, 15 Mar 2025 21:05:56 -0700 (PDT) From: Kumar Kartikeya Dwivedi To: bpf@vger.kernel.org, linux-kernel@vger.kernel.org Cc: Barret Rhoden , Linus Torvalds , Peter Zijlstra , Will Deacon , Waiman Long , Alexei Starovoitov , Andrii Nakryiko , Daniel Borkmann , Martin KaFai Lau , Eduard Zingerman , "Paul E. McKenney" , Tejun Heo , Josh Don , Dohyun Kim , linux-arm-kernel@lists.infradead.org, kkd@meta.com, kernel-team@meta.com Subject: [PATCH bpf-next v4 11/25] rqspinlock: Protect waiters in trylock fallback from stalls Date: Sat, 15 Mar 2025 21:05:27 -0700 Message-ID: <20250316040541.108729-12-memxor@gmail.com> X-Mailer: git-send-email 2.47.1 In-Reply-To: <20250316040541.108729-1-memxor@gmail.com> References: <20250316040541.108729-1-memxor@gmail.com> Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Developer-Signature: v=1; a=openpgp-sha256; l=1825; h=from:subject; bh=6ga9NaPnm2qscV5e908A787vKJcwTZL0L+jNyYJdaTA=; b=owEBbQKS/ZANAwAIAUzgyIZIvxHKAcsmYgBn1k3cd+lc2mT7eAYTtSCeru7PXMbtZTLl1rVKe2Na zqq61WeJAjMEAAEIAB0WIQRLvip+Buz51YI8YRFM4MiGSL8RygUCZ9ZN3AAKCRBM4MiGSL8RymCDEA C4K937bgxXiBFwWg7wVsj1Ouwcy1m1SvtfIZLVr3xtAqcbiYKXGG0i5pXm3XBUM2mgmI9CropTshbQ dQWs7kFH1UBO0zy35y69RMSbrW+XYvOIUv+szj8w1OnAXfbt5cxjZY2oWdwBfSiPYvDI8qqWAJq92x Eu9XxhjsiU3DKxEIhABz7oovXElnhcqiaJX0bMqx1PJqHQEUx+v8FQX4j/K/bNuVyKsahTVF0QLweA VdrXkytJDmq29EvwZiID/KwMdFSAlK8imRvbwx38oDpj5kOvmKd2YPf+9yf+Q1X6A2Om6Z42ulZ+ju 44r5dNM2UD3qahjawRGUazaB+18OIHX1yD++NqUAjsQFympNzvBFjjeeC8ffwxmNJkaXt7zT43YAqL /uOkIADmjS54logpdlm10XVwYzPrtEOTELnZknQ8zGIG8h4on+rvZfgi++0tV7gs78T0qEEknIsIYB 0uXO7Y1CGP+IevSyeQ3EJvHV2IYztylGYEHyA9GKqpmo5nBFslUK7B3Y9WsiuQJFs3DzzTDWNjIOsc 8Y37rSMW+XsAoDPAbL+h6fkt5rkhyDfpkpnfY7TEZ9vp3KcnIvAS4dh/o691vcK4ZXIZx+LGjvb7WN VxSTT/cmkbB6g757jZk6EG/qXsjkzmmz6wFs7p7bSxl0HNH1488vWapQ0xWw== X-Developer-Key: i=memxor@gmail.com; a=openpgp; fpr=4BBE2A7E06ECF9D5823C61114CE0C88648BF11CA X-Patchwork-Delegate: bpf@iogearbox.net When we run out of maximum rqnodes, the original queued spin lock slow path falls back to a try lock. In such a case, we are again susceptible to stalls in case the lock owner fails to make progress. We use the timeout as a fallback to break out of this loop and return to the caller. This is a fallback for an extreme edge case, when on the same CPU we run out of all 4 qnodes. When could this happen? 
We are in the slow path in task context and get interrupted by an IRQ, which while in the slow path gets interrupted by an NMI, which while in the slow path gets interrupted by another nested NMI, which also enters the slow path. All of these interruptions happen after node->count++. We use RES_DEF_TIMEOUT as our spinning duration, but in the case of this fallback, no fairness is guaranteed, so the duration may be too small for contended cases, as the waiting time is not bounded. Since this is an extreme corner case, let's just prefer timing out instead of attempting to spin for longer. Reviewed-by: Barret Rhoden Signed-off-by: Kumar Kartikeya Dwivedi --- kernel/bpf/rqspinlock.c | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/kernel/bpf/rqspinlock.c b/kernel/bpf/rqspinlock.c index 65c2b41d8937..361d452f027c 100644 --- a/kernel/bpf/rqspinlock.c +++ b/kernel/bpf/rqspinlock.c @@ -275,8 +275,14 @@ int __lockfunc resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val) */ if (unlikely(idx >= _Q_MAX_NODES)) { lockevent_inc(lock_no_node); - while (!queued_spin_trylock(lock)) + RES_RESET_TIMEOUT(ts, RES_DEF_TIMEOUT); + while (!queued_spin_trylock(lock)) { + if (RES_CHECK_TIMEOUT(ts, ret)) { + lockevent_inc(rqspinlock_lock_timeout); + break; + } cpu_relax(); + } goto release; } From patchwork Sun Mar 16 04:05:28 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Kumar Kartikeya Dwivedi X-Patchwork-Id: 14018285 X-Patchwork-Delegate: bpf@iogearbox.net Received: from mail-wm1-f67.google.com (mail-wm1-f67.google.com [209.85.128.67]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 1E14618FDC6; Sun, 16 Mar 2025 04:06:00 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.128.67 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1742097963; cv=none; b=E4kXIcLQsFZWwEprvZETuDa4/MfMlQoCdcWgNnwnsW5xralsuvAyfm9N0HD0yg870t575f7Sr0KexOyo/fQUpZjd7jcvqznZhM+uSM9rXz6W7Lu6T7MH1DQFJz+UByAw+8KrYszqo9XvIf0jnUqp+snUI4rhg0B9fkPyQj13nCo= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1742097963; c=relaxed/simple; bh=R8rK3hAtLu0/qV+H7D4vZfaR/OW4OI0d017EzOpkC74=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=KcDmIBhtnRvRno2wC7G613kA2S94TlY4zQv2cvnBZpicI5mMZbHlCGGyM1LUBgIFC/WBzvLl8bZS3IijQ4iuPmgZvYG4ASVoeYy634wiv6dLqZL+uu1H7GQafKGGhZuYQ7YPGVnJo0t4ytmf/rztcULJtUT3SKhNSYbLsh6nDDI= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=PyCJ34Al; arc=none smtp.client-ip=209.85.128.67 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="PyCJ34Al" Received: by mail-wm1-f67.google.com with SMTP id 5b1f17b1804b1-43d04ea9d9aso3923265e9.3; Sat, 15 Mar 2025 21:06:00 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1742097959; x=1742702759; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to
:message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=Trt2PZaoDvW1qF5TKGphGKbeL9d09FYulI4YnHalG2E=; b=PyCJ34AlW5TiK4gBQL1GWBt1zi60xEniOCiepyskACjan6ePRoBoTYsSxn6bOkOvhN 8SrSdTSKg5gQZATLCuAf4pMs8E4dQ+evFPUx1DZmYNAbi0a5GvHn+mRhFo4oeKJ8eoXr PJdOL8zFcAjpW64vRbcya9EL4S3/eDGXzDKp3Jc6QShvdolsUsPnqL3rkAvB3OzLT2/J V55J+4VRVQ1tVAqx1s7txAdTKugxuBJO/S1GOdrT68Rv8SUKayV22SH/jeGfwxEvUqGI AdCrvOFaFQxsCzHRZUn/OicOuNHiyzfhqMralm6XYgLhIwSIHg22QZVvmiE+dH0cdutI OGdw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1742097959; x=1742702759; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=Trt2PZaoDvW1qF5TKGphGKbeL9d09FYulI4YnHalG2E=; b=TpReH+yKpCj2i8A+5idXXQClbUyBvxPtfV9oy9sGfkretvBGktKoQgupH33BAa20tB c/RClu+0QDRlOWH/TIWVJUadZd+s/QvNyvGZb9uWbPbZUJqAY0fnzm8LjSkSfWR9XhCL qiMKZgwrhws9RcjlqRpSaHwcjMEp1yGSaCpHA7A/UzJRP478H4Xr9RA7PGOb5mk5Cuyh TEOKe+yb4lOgU8z3N2X9czgm4BWJ/7cmQbzg2WVN2rLNlUxxFwPgNLFzWLhgnW86fsDH sgz10h5al/zQ9co6HYs2jkUN4iMGma0YzHihpwviOgnwzLxQ+7rDDWnuklNmjfU1VhQO h2HQ== X-Forwarded-Encrypted: i=1; AJvYcCX9oZSR6Qfyx/+MvtCCYLXl1GUwG744ASs5bBtrRSa92LPqx9BV5C70nY//N89IwGJ9B7u1DWyTAzQCl4Y=@vger.kernel.org X-Gm-Message-State: AOJu0YyP91T1FI8Md+eGFXBCSX8pAAGMhM5mouNu2XnSAf98GNTwwUAj Y5WNHDsqnTkkBNNZomfWrWP4QvMGCh6HXuHV2XtsLcxd5bJiXWa6DpkFVSgAY6A= X-Gm-Gg: ASbGncssYmDGMUzhL98nHABpJDkaRM8YGn+6yZ5ST7n5UJjjWramMCJoFMDkg/vzcM7 p+dZlt3qX5fqAgD48AoF1L93jVC0B4xABS79/B63snFvmV5po0KDoWM4dlseTC3jZznDwaSbC5A uDEaaytavz4pX+/ZEk/T6usAcKkmnJugCcLDsCY55TBehQED+8L5BkfJU6L4BegqB4bMIA67JKn MCV+/lqCvPvkG6HsgpCirEbjP73cNXFZnrDIkuw4ZpC5pfb04VxrFldEAtBIplyNhqKhzoNufH6 kd9QI9A/yzUGl+KgTYBn0AY2QUq7FQP+j5U= X-Google-Smtp-Source: AGHT+IELCRC7Npj53fsWLLPGTTMW2S8mtuRNJw4+tQrj7Ej7EcmIuAdgGibINSTvjMReh3aiCtM8KA== X-Received: by 2002:a5d:6da1:0:b0:391:4389:f363 with SMTP id ffacd0b85a97d-3971ee44e17mr9518660f8f.21.1742097958853; Sat, 15 Mar 2025 21:05:58 -0700 (PDT) Received: from localhost ([2a03:2880:31ff:72::]) by smtp.gmail.com with ESMTPSA id ffacd0b85a97d-395c8975b90sm11082741f8f.53.2025.03.15.21.05.57 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sat, 15 Mar 2025 21:05:57 -0700 (PDT) From: Kumar Kartikeya Dwivedi To: bpf@vger.kernel.org, linux-kernel@vger.kernel.org Cc: Linus Torvalds , Peter Zijlstra , Will Deacon , Waiman Long , Alexei Starovoitov , Andrii Nakryiko , Daniel Borkmann , Martin KaFai Lau , Eduard Zingerman , "Paul E. 
McKenney" , Tejun Heo , Barret Rhoden , Josh Don , Dohyun Kim , linux-arm-kernel@lists.infradead.org, kkd@meta.com, kernel-team@meta.com Subject: [PATCH bpf-next v4 12/25] rqspinlock: Add deadlock detection and recovery Date: Sat, 15 Mar 2025 21:05:28 -0700 Message-ID: <20250316040541.108729-13-memxor@gmail.com> X-Mailer: git-send-email 2.47.1 In-Reply-To: <20250316040541.108729-1-memxor@gmail.com> References: <20250316040541.108729-1-memxor@gmail.com> Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Developer-Signature: v=1; a=openpgp-sha256; l=16807; h=from:subject; bh=R8rK3hAtLu0/qV+H7D4vZfaR/OW4OI0d017EzOpkC74=; b=owEBbQKS/ZANAwAIAUzgyIZIvxHKAcsmYgBn1k3d3cpOPE+JD/6A6QOyXue338yygczvobcBs9p0 t9lKfEyJAjMEAAEIAB0WIQRLvip+Buz51YI8YRFM4MiGSL8RygUCZ9ZN3QAKCRBM4MiGSL8RyulYD/ oCAFAcDJwUGqxq0HokrhAu3NOXVZPUQzrD1Fd/bAV3pkN73meLLVeovkYVI0ve7Kjx2/hcMJyYvJbr jkpwrffoXO8Xr768OKNR4mBc6jrytY5czg/+e0BKRXct/n5IFosZ8+tHgjaY9j5RcQcq7eRwUQnxnn Q7S84XghkBroVcQtWuJM98v/KONhHgJCpGMQAAUU32SNWJkc28V4HIWt9hoQsoIi6CJTX8Uc88a/Qf FVOQlNaBXGiYKMb9K3kOjA41VLmhe565kuFeGvXUnJi+Cn9xac01lMqZRoHRIVKeHgz/mo0i+F7abp eofFVG0gbpmw0Gb+GlDHtD6UztvzXQW6TUTZqpfs9TXc5ti4+Nv12bATZP9jhNTd3dmKxw+z7JScl4 BMgrqxtE/WrA7jZVALBqwJ2kPbQolbz8sbNHH5z0796JP71TrPx7mfKFi2Us1I0Nv+hUkuwarjUzIt VJhlvGVJXd9BRHbCHCSIq+sh7i5y3bcjHmfFhim9zewk0L8xWupkFUohlI78gsDmgMlQl7f9w8VVza 8hOQDu9DZMx4wjyXZL5LKr4XOZiXlowtWpcR++9jgQoihCpQXlPZJouhUpu2Gt9rOQdgIMp/8Ws4U2 ep59HLE49pV/ul4ULzQIO5O665ynyxzgE/f/3H6pSPpftXxAxajnOTA5uYRw== X-Developer-Key: i=memxor@gmail.com; a=openpgp; fpr=4BBE2A7E06ECF9D5823C61114CE0C88648BF11CA X-Patchwork-Delegate: bpf@iogearbox.net While the timeout logic provides guarantees for the waiter's forward progress, the time until a stalling waiter unblocks can still be long. The default timeout of 1/4 sec can be excessively long for some use cases. Additionally, custom timeouts may exacerbate recovery time. Introduce logic to detect common cases of deadlocks and perform quicker recovery. This is done by dividing the time from entry into the locking slow path until the timeout into intervals of 1 ms. Then, after each interval elapses, deadlock detection is performed, while also polling the lock word to ensure we can quickly break out of the detection logic and proceed with lock acquisition. A 'held_locks' table is maintained per-CPU where the entry at the bottom denotes a lock being waited for or already taken. Entries coming before it denote locks that are already held. The current CPU's table can thus be looked at to detect AA deadlocks. The tables from other CPUs can be looked at to discover ABBA situations. Finally, when a matching entry for the lock being taken on the current CPU is found on some other CPU, a deadlock situation is detected. This function can take a long time, therefore the lock word is constantly polled in each loop iteration to ensure we can preempt detection and proceed with lock acquisition, using the is_lock_released check. We set 'spin' member of rqspinlock_timeout struct to 0 to trigger deadlock checks immediately to perform faster recovery. Note: Extending lock word size by 4 bytes to record owner CPU can allow faster detection for ABBA. It is typically the owner which participates in a ABBA situation. However, to keep compatibility with existing lock words in the kernel (struct qspinlock), and given deadlocks are a rare event triggered by bugs, we choose to favor compatibility over faster detection. 
The release_held_lock_entry function requires an smp_wmb, while the release store on unlock will provide the necessary ordering for us. Add comments to document the subtleties of why this is correct. It is possible for stores to be reordered still, but in the context of the deadlock detection algorithm, a release barrier is sufficient and needn't be stronger for unlock's case. Signed-off-by: Kumar Kartikeya Dwivedi --- include/asm-generic/rqspinlock.h | 100 +++++++++++++++++ kernel/bpf/rqspinlock.c | 187 ++++++++++++++++++++++++++++--- 2 files changed, 273 insertions(+), 14 deletions(-) diff --git a/include/asm-generic/rqspinlock.h b/include/asm-generic/rqspinlock.h index 9bd11cb7acd6..34c3dcb4299e 100644 --- a/include/asm-generic/rqspinlock.h +++ b/include/asm-generic/rqspinlock.h @@ -11,6 +11,7 @@ #include #include +#include struct qspinlock; typedef struct qspinlock rqspinlock_t; @@ -22,4 +23,103 @@ extern int resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val); */ #define RES_DEF_TIMEOUT (NSEC_PER_SEC / 4) +/* + * Choose 31 as it makes rqspinlock_held cacheline-aligned. + */ +#define RES_NR_HELD 31 + +struct rqspinlock_held { + int cnt; + void *locks[RES_NR_HELD]; +}; + +DECLARE_PER_CPU_ALIGNED(struct rqspinlock_held, rqspinlock_held_locks); + +static __always_inline void grab_held_lock_entry(void *lock) +{ + int cnt = this_cpu_inc_return(rqspinlock_held_locks.cnt); + + if (unlikely(cnt > RES_NR_HELD)) { + /* Still keep the inc so we decrement later. */ + return; + } + + /* + * Implied compiler barrier in per-CPU operations; otherwise we can have + * the compiler reorder inc with write to table, allowing interrupts to + * overwrite and erase our write to the table (as on interrupt exit it + * will be reset to NULL). + * + * It is fine for cnt inc to be reordered wrt remote readers though, + * they won't observe our entry until the cnt update is visible, that's + * all. + */ + this_cpu_write(rqspinlock_held_locks.locks[cnt - 1], lock); +} + +/* + * We simply don't support out-of-order unlocks, and keep the logic simple here. + * The verifier prevents BPF programs from unlocking out-of-order, and the same + * holds for in-kernel users. + * + * It is possible to run into misdetection scenarios of AA deadlocks on the same + * CPU, and missed ABBA deadlocks on remote CPUs if this function pops entries + * out of order (due to lock A, lock B, unlock A, unlock B) pattern. The correct + * logic to preserve right entries in the table would be to walk the array of + * held locks and swap and clear out-of-order entries, but that's too + * complicated and we don't have a compelling use case for out of order unlocking. + */ +static __always_inline void release_held_lock_entry(void) +{ + struct rqspinlock_held *rqh = this_cpu_ptr(&rqspinlock_held_locks); + + if (unlikely(rqh->cnt > RES_NR_HELD)) + goto dec; + WRITE_ONCE(rqh->locks[rqh->cnt - 1], NULL); +dec: + /* + * Reordering of clearing above with inc and its write in + * grab_held_lock_entry that came before us (in same acquisition + * attempt) is ok, we either see a valid entry or NULL when it's + * visible. + * + * But this helper is invoked when we unwind upon failing to acquire the + * lock. Unlike the unlock path which constitutes a release store after + * we clear the entry, we need to emit a write barrier here. 
Otherwise, + * we may have a situation as follows: + * + * for lock B + * release_held_lock_entry + * + * try_cmpxchg_acquire for lock A + * grab_held_lock_entry + * + * Lack of any ordering means reordering may occur such that dec, inc + * are done before entry is overwritten. This permits a remote lock + * holder of lock B (which this CPU failed to acquire) to now observe it + * as being attempted on this CPU, and may lead to misdetection (if this + * CPU holds a lock it is attempting to acquire, leading to false ABBA + * diagnosis). + * + * In case of unlock, we will always do a release on the lock word after + * releasing the entry, ensuring that other CPUs cannot hold the lock + * (and make conclusions about deadlocks) until the entry has been + * cleared on the local CPU, preventing any anomalies. Reordering is + * still possible there, but a remote CPU cannot observe a lock in our + * table which it is already holding, since visibility entails our + * release store for the said lock has not retired. + * + * In theory we don't have a problem if the dec and WRITE_ONCE above get + * reordered with each other, we either notice an empty NULL entry on + * top (if dec succeeds WRITE_ONCE), or a potentially stale entry which + * cannot be observed (if dec precedes WRITE_ONCE). + * + * Emit the write barrier _before_ the dec, this permits dec-inc + * reordering but that is harmless as we'd have new entry set to NULL + * already, i.e. they cannot precede the NULL store above. + */ + smp_wmb(); + this_cpu_dec(rqspinlock_held_locks.cnt); +} + #endif /* __ASM_GENERIC_RQSPINLOCK_H */ diff --git a/kernel/bpf/rqspinlock.c b/kernel/bpf/rqspinlock.c index 361d452f027c..bddbcc47d38f 100644 --- a/kernel/bpf/rqspinlock.c +++ b/kernel/bpf/rqspinlock.c @@ -31,6 +31,7 @@ */ #include "../locking/qspinlock.h" #include "../locking/lock_events.h" +#include "rqspinlock.h" /* * The basic principle of a queue-based spinlock can best be understood @@ -74,16 +75,147 @@ struct rqspinlock_timeout { u64 timeout_end; u64 duration; + u64 cur; u16 spin; }; #define RES_TIMEOUT_VAL 2 -static noinline int check_timeout(struct rqspinlock_timeout *ts) +DEFINE_PER_CPU_ALIGNED(struct rqspinlock_held, rqspinlock_held_locks); +EXPORT_SYMBOL_GPL(rqspinlock_held_locks); + +static bool is_lock_released(rqspinlock_t *lock, u32 mask, struct rqspinlock_timeout *ts) +{ + if (!(atomic_read_acquire(&lock->val) & (mask))) + return true; + return false; +} + +static noinline int check_deadlock_AA(rqspinlock_t *lock, u32 mask, + struct rqspinlock_timeout *ts) +{ + struct rqspinlock_held *rqh = this_cpu_ptr(&rqspinlock_held_locks); + int cnt = min(RES_NR_HELD, rqh->cnt); + + /* + * Return an error if we hold the lock we are attempting to acquire. + * We'll iterate over max 32 locks; no need to do is_lock_released. + */ + for (int i = 0; i < cnt - 1; i++) { + if (rqh->locks[i] == lock) + return -EDEADLK; + } + return 0; +} + +/* + * This focuses on the most common case of ABBA deadlocks (or ABBA involving + * more locks, which reduce to ABBA). This is not exhaustive, and we rely on + * timeouts as the final line of defense. + */ +static noinline int check_deadlock_ABBA(rqspinlock_t *lock, u32 mask, + struct rqspinlock_timeout *ts) +{ + struct rqspinlock_held *rqh = this_cpu_ptr(&rqspinlock_held_locks); + int rqh_cnt = min(RES_NR_HELD, rqh->cnt); + void *remote_lock; + int cpu; + + /* + * Find the CPU holding the lock that we want to acquire. 
If there is a + * deadlock scenario, we will read a stable set on the remote CPU and + * find the target. This would be a constant time operation instead of + * O(NR_CPUS) if we could determine the owning CPU from a lock value, but + * that requires increasing the size of the lock word. + */ + for_each_possible_cpu(cpu) { + struct rqspinlock_held *rqh_cpu = per_cpu_ptr(&rqspinlock_held_locks, cpu); + int real_cnt = READ_ONCE(rqh_cpu->cnt); + int cnt = min(RES_NR_HELD, real_cnt); + + /* + * Let's ensure to break out of this loop if the lock is available for + * us to potentially acquire. + */ + if (is_lock_released(lock, mask, ts)) + return 0; + + /* + * Skip ourselves, and CPUs whose count is less than 2, as they need at + * least one held lock and one acquisition attempt (reflected as top + * most entry) to participate in an ABBA deadlock. + * + * If cnt is more than RES_NR_HELD, it means the current lock being + * acquired won't appear in the table, and other locks in the table are + * already held, so we can't determine ABBA. + */ + if (cpu == smp_processor_id() || real_cnt < 2 || real_cnt > RES_NR_HELD) + continue; + + /* + * Obtain the entry at the top, this corresponds to the lock the + * remote CPU is attempting to acquire in a deadlock situation, + * and would be one of the locks we hold on the current CPU. + */ + remote_lock = READ_ONCE(rqh_cpu->locks[cnt - 1]); + /* + * If it is NULL, we've raced and cannot determine a deadlock + * conclusively, skip this CPU. + */ + if (!remote_lock) + continue; + /* + * Find if the lock we're attempting to acquire is held by this CPU. + * Don't consider the topmost entry, as that must be the latest lock + * being held or acquired. For a deadlock, the target CPU must also + * attempt to acquire a lock we hold, so for this search only 'cnt - 1' + * entries are important. + */ + for (int i = 0; i < cnt - 1; i++) { + if (READ_ONCE(rqh_cpu->locks[i]) != lock) + continue; + /* + * We found our lock as held on the remote CPU. Is the + * acquisition attempt on the remote CPU for a lock held + * by us? If so, we have a deadlock situation, and need + * to recover. + */ + for (int i = 0; i < rqh_cnt - 1; i++) { + if (rqh->locks[i] == remote_lock) + return -EDEADLK; + } + /* + * Inconclusive; retry again later. + */ + return 0; + } + } + return 0; +} + +static noinline int check_deadlock(rqspinlock_t *lock, u32 mask, + struct rqspinlock_timeout *ts) +{ + int ret; + + ret = check_deadlock_AA(lock, mask, ts); + if (ret) + return ret; + ret = check_deadlock_ABBA(lock, mask, ts); + if (ret) + return ret; + + return 0; +} + +static noinline int check_timeout(rqspinlock_t *lock, u32 mask, + struct rqspinlock_timeout *ts) { u64 time = ktime_get_mono_fast_ns(); + u64 prev = ts->cur; if (!ts->timeout_end) { + ts->cur = time; ts->timeout_end = time + ts->duration; return 0; } @@ -91,6 +223,15 @@ static noinline int check_timeout(struct rqspinlock_timeout *ts) if (time > ts->timeout_end) return -ETIMEDOUT; + /* + * A millisecond interval passed from last time? Trigger deadlock + * checks. + */ + if (prev + NSEC_PER_MSEC < time) { + ts->cur = time; + return check_deadlock(lock, mask, ts); + } + return 0; } @@ -99,21 +240,22 @@ static noinline int check_timeout(struct rqspinlock_timeout *ts) * as the macro does internal amortization for us. 
*/ #ifndef res_smp_cond_load_acquire -#define RES_CHECK_TIMEOUT(ts, ret) \ - ({ \ - if (!(ts).spin++) \ - (ret) = check_timeout(&(ts)); \ - (ret); \ +#define RES_CHECK_TIMEOUT(ts, ret, mask) \ + ({ \ + if (!(ts).spin++) \ + (ret) = check_timeout((lock), (mask), &(ts)); \ + (ret); \ }) #else -#define RES_CHECK_TIMEOUT(ts, ret, mask) \ +#define RES_CHECK_TIMEOUT(ts, ret, mask) \ ({ (ret) = check_timeout(&(ts)); }) #endif /* * Initialize the 'spin' member. + * Set spin member to 0 to trigger AA/ABBA checks immediately. */ -#define RES_INIT_TIMEOUT(ts) ({ (ts).spin = 1; }) +#define RES_INIT_TIMEOUT(ts) ({ (ts).spin = 0; }) /* * We only need to reset 'timeout_end', 'spin' will just wrap around as necessary. @@ -142,6 +284,7 @@ static DEFINE_PER_CPU_ALIGNED(struct qnode, rqnodes[_Q_MAX_NODES]); * * Return: * * 0 - Lock was acquired successfully. + * * -EDEADLK - Lock acquisition failed because of AA/ABBA deadlock. * * -ETIMEDOUT - Lock acquisition failed because of timeout. * * (queue tail, pending bit, lock value) @@ -212,6 +355,11 @@ int __lockfunc resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val) goto queue; } + /* + * Grab an entry in the held locks array, to enable deadlock detection. + */ + grab_held_lock_entry(lock); + /* * We're pending, wait for the owner to go away. * @@ -225,7 +373,7 @@ int __lockfunc resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val) */ if (val & _Q_LOCKED_MASK) { RES_RESET_TIMEOUT(ts, RES_DEF_TIMEOUT); - res_smp_cond_load_acquire(&lock->locked, !VAL || RES_CHECK_TIMEOUT(ts, ret)); + res_smp_cond_load_acquire(&lock->locked, !VAL || RES_CHECK_TIMEOUT(ts, ret, _Q_LOCKED_MASK)); } if (ret) { @@ -240,7 +388,7 @@ int __lockfunc resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val) */ clear_pending(lock); lockevent_inc(rqspinlock_lock_timeout); - return ret; + goto err_release_entry; } /* @@ -258,6 +406,11 @@ int __lockfunc resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val) */ queue: lockevent_inc(lock_slowpath); + /* + * Grab deadlock detection entry for the queue path. 
+ */ + grab_held_lock_entry(lock); + node = this_cpu_ptr(&rqnodes[0].mcs); idx = node->count++; tail = encode_tail(smp_processor_id(), idx); @@ -277,9 +430,9 @@ int __lockfunc resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val) lockevent_inc(lock_no_node); RES_RESET_TIMEOUT(ts, RES_DEF_TIMEOUT); while (!queued_spin_trylock(lock)) { - if (RES_CHECK_TIMEOUT(ts, ret)) { + if (RES_CHECK_TIMEOUT(ts, ret, ~0u)) { lockevent_inc(rqspinlock_lock_timeout); - break; + goto err_release_node; } cpu_relax(); } @@ -375,7 +528,7 @@ int __lockfunc resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val) */ RES_RESET_TIMEOUT(ts, RES_DEF_TIMEOUT * 2); val = res_atomic_cond_read_acquire(&lock->val, !(VAL & _Q_LOCKED_PENDING_MASK) || - RES_CHECK_TIMEOUT(ts, ret)); + RES_CHECK_TIMEOUT(ts, ret, _Q_LOCKED_PENDING_MASK)); waitq_timeout: if (ret) { @@ -408,7 +561,7 @@ int __lockfunc resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val) WRITE_ONCE(next->locked, RES_TIMEOUT_VAL); } lockevent_inc(rqspinlock_lock_timeout); - goto release; + goto err_release_node; } /* @@ -455,5 +608,11 @@ int __lockfunc resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val) */ __this_cpu_dec(rqnodes[0].mcs.count); return ret; +err_release_node: + trace_contention_end(lock, ret); + __this_cpu_dec(rqnodes[0].mcs.count); +err_release_entry: + release_held_lock_entry(); + return ret; } EXPORT_SYMBOL_GPL(resilient_queued_spin_lock_slowpath); From patchwork Sun Mar 16 04:05:29 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Kumar Kartikeya Dwivedi X-Patchwork-Id: 14018286 X-Patchwork-Delegate: bpf@iogearbox.net Received: from mail-wr1-f65.google.com (mail-wr1-f65.google.com [209.85.221.65]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 9A86F1922DD; Sun, 16 Mar 2025 04:06:02 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.221.65 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1742097964; cv=none; b=rmKsSciqp9ORnl0FjhBu6+Dufe5YU612YZQuv26PxTNusMF3UVl9cC0Eq5YKvcvoDUxw4ZBRsQjR76i8pOeuZUK04GV4h5KlFq2QLoZC/IZEiBwyD3ZMFI4B/yVc0OCPozPXdN4mjCm69a+ThLd758uCPzWwGYgrhqA4dzCxV5w= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1742097964; c=relaxed/simple; bh=6EUILfgiZohkAPsyvtNzcRk40CzXPZuYHAK+ukHSlzU=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=QuPjbZQLRpJXi4k9sJ4/hUyNV146WfcY11QsxK2Fs8OAsu14wHNmZQcqVq3Lhn6gQ7QprpCyAaANTNPkIfhrn+rfNQBV3b/T47QZDnaMnUSXPtVGar0lIwE40+FBoXg4TX6jIVYhL4T0fkZBAlM34jmpJ3l0bngyVpufGs9MjMg= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=TULKFKkz; arc=none smtp.client-ip=209.85.221.65 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="TULKFKkz" Received: by mail-wr1-f65.google.com with SMTP id ffacd0b85a97d-3913fdd003bso1633099f8f.1; Sat, 15 Mar 2025 21:06:02 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; 
t=1742097960; x=1742702760; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=ClBpfryF0l26LJucqaea+hnSd7y5tppZSgQBloKDt3A=; b=TULKFKkzejGgC+4zTarO0YgDxwTdK4PPPpAuNOsz4zNq1j1J1828R1yI3Dge+Cbk3l b35BpaRJi0HOrVb2CP029m2EujiwaQ9rrqHemhKSNoAzmzVij/0iHZnxzXCGqCdbaYuj z/fgoPa6f2+u8atkNJSIt50Zmlx4lEjW9sMbgltsmqr3heldJzImmkxgnddJ5ELBBrcR zjAkW1KJXhYH95VaSUoxuwGHGJp/9GX7agwPTzURfD0yRsZNvtdu9Mgi3vJqMAQIj2Y5 F28wnuIL2cVz+9ZUEqJgQw3EM/P9YtcSP2K70AphfACR8FFxH7SWBnl4rrBOn/WskwHS dBgA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1742097960; x=1742702760; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=ClBpfryF0l26LJucqaea+hnSd7y5tppZSgQBloKDt3A=; b=eCL8Z4rJLOhhT7tPs5kPw7VxqLsyVIAiaalh0TY5+Roymddy4518AFMSP36u9/L1fZ p+z2fMSgvJVtQbrZ9x9zD0iuMQueoADgxXtDRIChudrWhJ+Mwb+WwmoKLsBSzk9j1IqL n6qZPXZoyyotPbyjmJtdoCE1zrgMiTj1G4846BmsVwubVJpwPdkTI6J4WVoANrTJThJq mHc+Lzghw9YP/850oF5mha0QQBMTWsnsuiMg7Igf6JmCk97LNwq5JXM+AOkKYXFzJdgf sbIAjwGhaQVey00qaFiDG2wSg6rgiS0R9cw7FUcGWiLCbeFLCqxZxoYnaqVdfm4wcnya 66rg== X-Forwarded-Encrypted: i=1; AJvYcCX8r2hfBCE5EJ62EjVcRwTbIOz+EWf4mHsIuROfmBslEQ5YcYj6cPrdieIfJd3t8PJK/KB8JC88u+nSxB0=@vger.kernel.org X-Gm-Message-State: AOJu0YzcbhHc5X9zz8inS5GyOnskh2SrPkHN74gMoujyYVCcUS1UFthj uIkv73BhGbhWoR+IVFsoea5Oxdgcc9l9KoS9mVE8Yz5f0gr3PWY+NsqkpN/mYlo= X-Gm-Gg: ASbGncuy3SRjIFfXe/YdVk+PkzwTgxbP1L2j3sa7cNitSLGCmslFYWVDEVG4Ex3oXCB yzACpWtRzQzXiDdWvx3oyNnNv9KNdnuWD2h3rescYBKxaP3faRCipRZQF8kKcLiwKFRc8pzFCpm hMW2eL9GCe5uwOsUnzKS958NTZodBpZeQ618+w54SL9CAZgzZQK2iUdQ8cWvi4RToZodCWvtnXi ZmU6ID42dZ6Ieo2E14cwGU82/s724chwAKjKMeDJY1uhmwPENir6kuFM9LQetfMMH5bWievtSbv SvR4qtmn5QfUqUp2HS8YhWqIyow34Isdp1g= X-Google-Smtp-Source: AGHT+IH/e6sr7Q8NtaxNCMMeZqQ+z9X8uI7I1YSl5ZwD64++FRmjeUrFOkqAexRXD/7ze8hpT1OmbA== X-Received: by 2002:a05:6000:1562:b0:38d:dc03:a3d6 with SMTP id ffacd0b85a97d-395b70b7668mr13114610f8f.4.1742097960004; Sat, 15 Mar 2025 21:06:00 -0700 (PDT) Received: from localhost ([2a03:2880:31ff:4f::]) by smtp.gmail.com with ESMTPSA id 5b1f17b1804b1-43d1fdda30esm68369765e9.5.2025.03.15.21.05.59 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sat, 15 Mar 2025 21:05:59 -0700 (PDT) From: Kumar Kartikeya Dwivedi To: bpf@vger.kernel.org, linux-kernel@vger.kernel.org Cc: Linus Torvalds , Peter Zijlstra , Will Deacon , Waiman Long , Alexei Starovoitov , Andrii Nakryiko , Daniel Borkmann , Martin KaFai Lau , Eduard Zingerman , "Paul E. 
McKenney" , Tejun Heo , Barret Rhoden , Josh Don , Dohyun Kim , linux-arm-kernel@lists.infradead.org, kkd@meta.com, kernel-team@meta.com Subject: [PATCH bpf-next v4 13/25] rqspinlock: Add a test-and-set fallback Date: Sat, 15 Mar 2025 21:05:29 -0700 Message-ID: <20250316040541.108729-14-memxor@gmail.com> X-Mailer: git-send-email 2.47.1 In-Reply-To: <20250316040541.108729-1-memxor@gmail.com> References: <20250316040541.108729-1-memxor@gmail.com> Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Developer-Signature: v=1; a=openpgp-sha256; l=4142; h=from:subject; bh=6EUILfgiZohkAPsyvtNzcRk40CzXPZuYHAK+ukHSlzU=; b=owEBbQKS/ZANAwAIAUzgyIZIvxHKAcsmYgBn1k3dWZVH4cGT4jygyYaRMgYBKoSMntnk4oFzqRKP +TJ3LmiJAjMEAAEIAB0WIQRLvip+Buz51YI8YRFM4MiGSL8RygUCZ9ZN3QAKCRBM4MiGSL8RyhmDEA C1zcUyxSUagfAmjyxuovyTwYJTtWqoL5BpkcoMeatgpoXvSFS4tL7BP+3f57t/dG51OVYvBhT2RXN3 nhHuP1IuXSpAdZG+q3swIHEVfIED9SS3PCn0geldDUhkAzxMkpbZqWPwJhh4IcqZB+IWBsN7BZYtW+ eiq/JKntCpyChduNnwrkkDRMWiws3F5N+JllmYp8XC54/J48uYmiFyJaYYEUR7ONy+IL/f21dHKQrT 8xCYRQblvr2tXiLe/rj9uJrq5h9J1j+Qw16WsNZ8WDBUWx52xoX1xV1UdOS/8vQX9HCEOJ8KDbo/db spk5jmSQFHBWJjYK39tihe0CXejyi/J+FpiRzMRNopnL44+efFbw10U5IgaVjMEP9LkozIVmnowmrI jCZ5a0OfO+eZf6xAeHlknuB9N5Zl0SNEZhWh3ayRtPeF/OpHs9sZrE+CaTsjnHnh2g8vIQt/VzzuU9 bJyTo/zGSOiYSFmfYQPN3JfpgR58GeJTqb15WlpNy21ivZWR9dptXHGTECX7xeEoJ7ns+MuNnYAC9e miR4qDevVacWqmiWfTKfc8CbfAmZI//+1NKlJgfhOomrfHwGNvSEeKxTnMs0pANCDx/wKI/kHLuWqC rR72s0G1l2FIBNETNQNhbftHp/sZqx1oEyYMlCsrRgBNyzGuhAHxNLnCg7Ww== X-Developer-Key: i=memxor@gmail.com; a=openpgp; fpr=4BBE2A7E06ECF9D5823C61114CE0C88648BF11CA X-Patchwork-Delegate: bpf@iogearbox.net Include a test-and-set fallback when queued spinlock support is not available. Introduce a rqspinlock type to act as a fallback when qspinlock support is absent. Include ifdef guards to ensure the slow path in this file is only compiled when CONFIG_QUEUED_SPINLOCKS=y. Subsequent patches will add further logic to ensure fallback to the test-and-set implementation when queued spinlock support is unavailable on an architecture. Unlike other waiting loops in rqspinlock code, the one for test-and-set has no theoretical upper bound under contention, therefore we need a longer timeout than usual. Bump it up to a second in this case. 
Signed-off-by: Kumar Kartikeya Dwivedi --- include/asm-generic/rqspinlock.h | 17 ++++++++++++ kernel/bpf/rqspinlock.c | 46 ++++++++++++++++++++++++++++++-- 2 files changed, 61 insertions(+), 2 deletions(-) diff --git a/include/asm-generic/rqspinlock.h b/include/asm-generic/rqspinlock.h index 34c3dcb4299e..12f72c4a97cd 100644 --- a/include/asm-generic/rqspinlock.h +++ b/include/asm-generic/rqspinlock.h @@ -12,11 +12,28 @@ #include #include #include +#ifdef CONFIG_QUEUED_SPINLOCKS +#include +#endif + +struct rqspinlock { + union { + atomic_t val; + u32 locked; + }; +}; struct qspinlock; +#ifdef CONFIG_QUEUED_SPINLOCKS typedef struct qspinlock rqspinlock_t; +#else +typedef struct rqspinlock rqspinlock_t; +#endif +extern int resilient_tas_spin_lock(rqspinlock_t *lock); +#ifdef CONFIG_QUEUED_SPINLOCKS extern int resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val); +#endif /* * Default timeout for waiting loops is 0.25 seconds diff --git a/kernel/bpf/rqspinlock.c b/kernel/bpf/rqspinlock.c index bddbcc47d38f..714dfab5caa8 100644 --- a/kernel/bpf/rqspinlock.c +++ b/kernel/bpf/rqspinlock.c @@ -21,7 +21,9 @@ #include #include #include +#ifdef CONFIG_QUEUED_SPINLOCKS #include +#endif #include #include #include @@ -29,9 +31,12 @@ /* * Include queued spinlock definitions and statistics code */ +#ifdef CONFIG_QUEUED_SPINLOCKS #include "../locking/qspinlock.h" #include "../locking/lock_events.h" #include "rqspinlock.h" +#include "../locking/mcs_spinlock.h" +#endif /* * The basic principle of a queue-based spinlock can best be understood @@ -70,8 +75,6 @@ * */ -#include "../locking/mcs_spinlock.h" - struct rqspinlock_timeout { u64 timeout_end; u64 duration; @@ -263,6 +266,43 @@ static noinline int check_timeout(rqspinlock_t *lock, u32 mask, */ #define RES_RESET_TIMEOUT(ts, _duration) ({ (ts).timeout_end = 0; (ts).duration = _duration; }) +/* + * Provide a test-and-set fallback for cases when queued spin lock support is + * absent from the architecture. + */ +int __lockfunc resilient_tas_spin_lock(rqspinlock_t *lock) +{ + struct rqspinlock_timeout ts; + int val, ret = 0; + + RES_INIT_TIMEOUT(ts); + grab_held_lock_entry(lock); + + /* + * Since the waiting loop's time is dependent on the amount of + * contention, a short timeout unlike rqspinlock waiting loops + * isn't enough. Choose a second as the timeout value. + */ + RES_RESET_TIMEOUT(ts, NSEC_PER_SEC); +retry: + val = atomic_read(&lock->val); + + if (val || !atomic_try_cmpxchg(&lock->val, &val, 1)) { + if (RES_CHECK_TIMEOUT(ts, ret, ~0u)) + goto out; + cpu_relax(); + goto retry; + } + + return 0; +out: + release_held_lock_entry(); + return ret; +} +EXPORT_SYMBOL_GPL(resilient_tas_spin_lock); + +#ifdef CONFIG_QUEUED_SPINLOCKS + /* * Per-CPU queue node structures; we can never have more than 4 nested * contexts: task, softirq, hardirq, nmi. 
@@ -616,3 +656,5 @@ int __lockfunc resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val) return ret; } EXPORT_SYMBOL_GPL(resilient_queued_spin_lock_slowpath); + +#endif /* CONFIG_QUEUED_SPINLOCKS */ From patchwork Sun Mar 16 04:05:30 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Kumar Kartikeya Dwivedi X-Patchwork-Id: 14018287 X-Patchwork-Delegate: bpf@iogearbox.net Received: from mail-wm1-f66.google.com (mail-wm1-f66.google.com [209.85.128.66]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id D84BE192B9D; Sun, 16 Mar 2025 04:06:03 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.128.66 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1742097965; cv=none; b=NvXwkWXG1eMt8N1A3Nq3oC4EJrf21ColVRmUm3AiINzZOJ9uc86T3baPoLYPl/7qGyHESi7XP2mHtil1VVRIX+013zQuhykSoN62+6hpIdlkmLe4X7L/T7s87x1HLrIdbqcu92n9QhgFmUZcG6i06WN5xrms8cU1RYCP5UiFur0= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1742097965; c=relaxed/simple; bh=9lJX2DtyRTnUplpFVsYE5ziAxwiiosjb1D9cDEXkG+8=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=OkeO0s9cBw+SNFP7xKwqVDoyeJGi/1T7z141kPyHYNwiYRdfRe/LK505vtaL0rh5y24wJAltmU5THSMtlBtJ3XaERE1YBtJFT1Pnu4ZuzVdyasCSuRUWwS6KwY2uZ6L8GxniMb59vskItF6w/cjqV2WaE/8KOQkn0QJGWVJwR4I= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=fDnpEO6k; arc=none smtp.client-ip=209.85.128.66 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="fDnpEO6k" Received: by mail-wm1-f66.google.com with SMTP id 5b1f17b1804b1-43cf58eea0fso4711655e9.0; Sat, 15 Mar 2025 21:06:03 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1742097962; x=1742702762; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=2SJ4rKlnRTNLC7EmgTg9GusoyyGplCB9SKvBAwvhMho=; b=fDnpEO6k1iJmQ7l/pNxutdoDQE8I+y2ZIbxP0kYNVcIC79XmeKm1CnUWRItPSk5K1z oMXr173LZDTFaDgFbaz+r9rQ2WQc1gzXqFJ+Kv+nK8fLFh49u1xTc9S1IaT2XMN24BZg kwgJNg+oEOZKFbbh9/wjvsL1AS6rnwq/I8JmR2sxvoYIt3XuQRL83zJ71XHbVdZPmmlL R3bUmzrMBthOYHoX8ZcaW18bM31+Edxwv6fVwUHtBfAFTqSnzAArio9XIsCXrdvdMSUK nQJjlFkaeI9Y6jggnFnh7MtqN1NisLqOJemUGcxjP+Xx5GtTdrPoenQIz5iFqwO6Sq17 jPkg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1742097962; x=1742702762; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=2SJ4rKlnRTNLC7EmgTg9GusoyyGplCB9SKvBAwvhMho=; b=seDKnX9ZJp1lOibyEsuY29OoPxw3J+84PqQatzKlrss89Hx/01sdE/NSlJXHhFmbvb sVAbB76hdgxwJb0aN4LGUzfK9slzTSWhVzrBzZZ6TQ9eJmcDu7PQPxGEOY06XYpt4PVP y1d7gjD9AtoEFLkvQPlNU8Kxh2qThcOeh/TXWJOE5QKgIrNOICtDyuGWIk0hGea2zB9+ BY7YCUFdLqb4q/DhtlgtexxldgG7OE7B3a40Uvi8TtaFf7wJ9trLLJ6QNgRo67aLPuFS 
y2F3gDH2aIu/mOkfdH3u6gbAdNlgJ6wzhIOOtCamyWSdr5aDZ4zuHYbao5bxNQPejwHC 1uUg== X-Forwarded-Encrypted: i=1; AJvYcCWLMGSGv3V7OzVGc0V1K62Rh/1yMimJhIwWhvOMeV919+nj3UwlCnAgftlR1QLO0pqbrrWmeFgH0yBpWjo=@vger.kernel.org X-Gm-Message-State: AOJu0YzuY0AQo2xS0NGeYIZb+pKH3JRzMtZu/bn1bSpvLu+oAI1xCm+s Qz4g3hzHKbUCpC9HOobp1ID7xRzrLY+nfE5eh24T69N/AEJiuTHVW/1XRlNupBo= X-Gm-Gg: ASbGnct2xgV85g5BPVzBsm5FsiZhuRdtlD93L4dqXAybxQRcDJ3qjO0mkY8kieaNQ1c BoD40BKYoJlKLLYF47sZy7kheQlCXV1+VXaS8kGnl3IaAyoiiCL40HcYHqHW7h/3OS3nNYND0OP 3V0ypHP6ytra5WdiO+9LSq+Wx8tf+kEhb7xWb31gDLgb2MEknKQTp9m3/a84QuWIJLkkjFKMG79 pgBHpLMi4M0hSDu/rBbVMRfCe7s5UB+rvfqBuTvU1nklYtbx7D82pDmZ/c82RXvNrR+37x6WwP8 dr/DV0jBfQ//q+iGoYsZ2ri9jtSvFQ006+8= X-Google-Smtp-Source: AGHT+IEG8+7fAz+0OnMkkxzP6+jI+ColvkT10MRmYk8Gri8AF6ZxFi7vPhA4DMmTFfNK0LsRQARGzA== X-Received: by 2002:a05:600c:1d1a:b0:43b:cc3c:60bc with SMTP id 5b1f17b1804b1-43d1ec87be2mr100575065e9.15.1742097961450; Sat, 15 Mar 2025 21:06:01 -0700 (PDT) Received: from localhost ([2a03:2880:31ff:70::]) by smtp.gmail.com with ESMTPSA id 5b1f17b1804b1-43d1fe609dasm66578265e9.28.2025.03.15.21.06.00 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sat, 15 Mar 2025 21:06:00 -0700 (PDT) From: Kumar Kartikeya Dwivedi To: bpf@vger.kernel.org, linux-kernel@vger.kernel.org Cc: Linus Torvalds , Peter Zijlstra , Will Deacon , Waiman Long , Alexei Starovoitov , Andrii Nakryiko , Daniel Borkmann , Martin KaFai Lau , Eduard Zingerman , "Paul E. McKenney" , Tejun Heo , Barret Rhoden , Josh Don , Dohyun Kim , linux-arm-kernel@lists.infradead.org, kkd@meta.com, kernel-team@meta.com Subject: [PATCH bpf-next v4 14/25] rqspinlock: Add basic support for CONFIG_PARAVIRT Date: Sat, 15 Mar 2025 21:05:30 -0700 Message-ID: <20250316040541.108729-15-memxor@gmail.com> X-Mailer: git-send-email 2.47.1 In-Reply-To: <20250316040541.108729-1-memxor@gmail.com> References: <20250316040541.108729-1-memxor@gmail.com> Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Developer-Signature: v=1; a=openpgp-sha256; l=3261; h=from:subject; bh=9lJX2DtyRTnUplpFVsYE5ziAxwiiosjb1D9cDEXkG+8=; b=owEBbQKS/ZANAwAIAUzgyIZIvxHKAcsmYgBn1k3dEsMhP+74YqrqyvnIh88VKcNnAxL0cvJ1gkJa n7xGLWyJAjMEAAEIAB0WIQRLvip+Buz51YI8YRFM4MiGSL8RygUCZ9ZN3QAKCRBM4MiGSL8Rynp7D/ 9jwO0LqsAd6i0H1M1SNcFjzNyEgzms+NwhZBtBuTdTvqAT3wwJj/F4EEK9EJQFykdau+00yKY/pgkR PEVdRrxlAgs2crbDzjqJYh+q9G8WNGP+ThpQ+Zt+aw/TVeYukHpS1pR9iEmD9srnDE2dwebiAGPgMM vTFWGqOETpp80HUL8s2G0XCRaH0zfVXr66xapYIetDVzQF3xUHsS3DDcFORGeinrUfxhoMZBRNkRpt xHq2rF0Oll2LPziNq1W4U5lK75qNdpx3oRkhj+J0GxrLo2VYmf1jZjs1Pu6Tjls7sPAuCKGc1sNfpS IDLbiiBhtgpaN8QlAzXc1TliiueghupnCo+aVENDvLEHG4QQhpeyGROKQPIJ0/cABmhk8YeMQ6eg1t NBNIZblT+x0GbPMC1SZ6cVNewgGcNju7EedsTRHf/SrbrIuDhXY0XZ2aZOzt8gKa4kk1obC3tFP+GE k1ddu1M9BmjHiqEhAX6NpIf1U2Rwsg3tljPLO06eQOH/TWhlonNlZYWV4s35e2UG0C3gGcj5Uk7p1E h5WyV089i3EAdv/Bmul3PVl64TmvbJSqqDNMSyPFFUS3n3IjFeEhF6GX8NEeNGPNzpYJAgQvnGibRB k7tvZ6agwzjq0rg4VW9szOQG84Ei1XQvpZiRUTuhGSeEo0cdLmuN9TNcntsA== X-Developer-Key: i=memxor@gmail.com; a=openpgp; fpr=4BBE2A7E06ECF9D5823C61114CE0C88648BF11CA X-Patchwork-Delegate: bpf@iogearbox.net We ripped out PV and virtualization related bits from rqspinlock in an earlier commit, however, a fair lock performs poorly within a virtual machine when the lock holder is preempted. As such, retain the virt_spin_lock fallback to test and set lock, but with timeout and deadlock detection. We can do this by simply depending on the resilient_tas_spin_lock implementation from the previous patch. 
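Other architectures could opt into the same behaviour by supplying the two overrides before pulling in the generic header. The following is a hedged sketch that mirrors the shape of the x86 header added below; the arch name and the foo_is_virtualized() predicate are invented for illustration:

/* Hypothetical arch/foo/include/asm/rqspinlock.h (illustrative only). */
#ifndef _ASM_FOO_RQSPINLOCK_H
#define _ASM_FOO_RQSPINLOCK_H

#ifdef CONFIG_QUEUED_SPINLOCKS
typedef struct qspinlock rqspinlock_t;
#else
typedef struct rqspinlock rqspinlock_t;
#endif
extern int resilient_tas_spin_lock(rqspinlock_t *lock);

#define resilient_virt_spin_lock_enabled resilient_virt_spin_lock_enabled
static __always_inline bool resilient_virt_spin_lock_enabled(void)
{
	return foo_is_virtualized();	/* invented arch-specific predicate */
}

#define resilient_virt_spin_lock resilient_virt_spin_lock
static inline int resilient_virt_spin_lock(rqspinlock_t *lock)
{
	return resilient_tas_spin_lock(lock);	/* TAS fallback with timeout and deadlock checks */
}

#include <asm-generic/rqspinlock.h>

#endif /* _ASM_FOO_RQSPINLOCK_H */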
We don't integrate support for CONFIG_PARAVIRT_SPINLOCKS yet, as that requires more involved algorithmic changes and introduces more complexity. It can be done when the need arises in the future. Signed-off-by: Kumar Kartikeya Dwivedi --- arch/x86/include/asm/rqspinlock.h | 33 +++++++++++++++++++++++++++++++ include/asm-generic/rqspinlock.h | 14 +++++++++++++ kernel/bpf/rqspinlock.c | 3 +++ 3 files changed, 50 insertions(+) create mode 100644 arch/x86/include/asm/rqspinlock.h diff --git a/arch/x86/include/asm/rqspinlock.h b/arch/x86/include/asm/rqspinlock.h new file mode 100644 index 000000000000..24a885449ee6 --- /dev/null +++ b/arch/x86/include/asm/rqspinlock.h @@ -0,0 +1,33 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef _ASM_X86_RQSPINLOCK_H +#define _ASM_X86_RQSPINLOCK_H + +#include + +#ifdef CONFIG_PARAVIRT +DECLARE_STATIC_KEY_FALSE(virt_spin_lock_key); + +#define resilient_virt_spin_lock_enabled resilient_virt_spin_lock_enabled +static __always_inline bool resilient_virt_spin_lock_enabled(void) +{ + return static_branch_likely(&virt_spin_lock_key); +} + +#ifdef CONFIG_QUEUED_SPINLOCKS +typedef struct qspinlock rqspinlock_t; +#else +typedef struct rqspinlock rqspinlock_t; +#endif +extern int resilient_tas_spin_lock(rqspinlock_t *lock); + +#define resilient_virt_spin_lock resilient_virt_spin_lock +static inline int resilient_virt_spin_lock(rqspinlock_t *lock) +{ + return resilient_tas_spin_lock(lock); +} + +#endif /* CONFIG_PARAVIRT */ + +#include + +#endif /* _ASM_X86_RQSPINLOCK_H */ diff --git a/include/asm-generic/rqspinlock.h b/include/asm-generic/rqspinlock.h index 12f72c4a97cd..a837c6b6abd9 100644 --- a/include/asm-generic/rqspinlock.h +++ b/include/asm-generic/rqspinlock.h @@ -35,6 +35,20 @@ extern int resilient_tas_spin_lock(rqspinlock_t *lock); extern int resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val); #endif +#ifndef resilient_virt_spin_lock_enabled +static __always_inline bool resilient_virt_spin_lock_enabled(void) +{ + return false; +} +#endif + +#ifndef resilient_virt_spin_lock +static __always_inline int resilient_virt_spin_lock(rqspinlock_t *lock) +{ + return 0; +} +#endif + /* * Default timeout for waiting loops is 0.25 seconds */ diff --git a/kernel/bpf/rqspinlock.c b/kernel/bpf/rqspinlock.c index 714dfab5caa8..ed21ee010063 100644 --- a/kernel/bpf/rqspinlock.c +++ b/kernel/bpf/rqspinlock.c @@ -352,6 +352,9 @@ int __lockfunc resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val) BUILD_BUG_ON(CONFIG_NR_CPUS >= (1U << _Q_TAIL_CPU_BITS)); + if (resilient_virt_spin_lock_enabled()) + return resilient_virt_spin_lock(lock); + RES_INIT_TIMEOUT(ts); /* From patchwork Sun Mar 16 04:05:31 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Kumar Kartikeya Dwivedi X-Patchwork-Id: 14018288 X-Patchwork-Delegate: bpf@iogearbox.net Received: from mail-wr1-f65.google.com (mail-wr1-f65.google.com [209.85.221.65]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id D1058194A66; Sun, 16 Mar 2025 04:06:04 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.221.65 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1742097966; cv=none; b=CXQLVtGLUe+W8o1JnrK8pjP5EskwMMQG4BsVJabldJFtBYgKFqqcdS9CwfmJ3KnxoVZyqUmNbjt/vquuoNe3DtESdgUd/COi9d2YOlVj7G9tOC5bPcPo0oHsK+mlHt0pmQCB42l2mFwpLr9e4QMr2xhWK+fCJp1pWu+oYR9/ta8= ARC-Message-Signature: 
i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1742097966; c=relaxed/simple; bh=mHarWyQwszN5GN3f/z3j87BhPIW1TR13hflSsKwjB3k=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=ZXR7GpQ2unuq5ub3JI4WpXV8r/e6pQVRA8yXSOFT2b53BAHf7Z/2G8E+k9WBoqdoEcPijpy7P0DsJ+jGnKRs8CgbCYlsJureqqhUcpDHgeRRU4x3I8NpJjGIgrEl+ewRaDmAZSkhtNgrxxpBSO4c1hN5pc8RVhE7xmfZLTl6nQc= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=IgBOUI2s; arc=none smtp.client-ip=209.85.221.65 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="IgBOUI2s" Received: by mail-wr1-f65.google.com with SMTP id ffacd0b85a97d-3914bc3e01aso2176578f8f.2; Sat, 15 Mar 2025 21:06:04 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1742097963; x=1742702763; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=gUCAEDGBHb6/0D9rNwLR8XLz0vdrsmnhtv+wLj43xrw=; b=IgBOUI2sm3nzSyqViFa3zqyDPPZKARw4lrj0rLamrGhYlCjBrC2EaWCWz/NylsfasD 8Xi0F5PDjl9MeOmU/4AGNLqzG1kHnLXiNutnWbOboOHuJsuclBHKlwSFhfV4wxSGxwsa 2bmlkZ6eWOxEA62gZq3MzmyTtVB6yHOIljxOsG/ZNphloHlowmuVUfGgl2D0/f4bhHBb SxEiTF292fqb4pJ2jLFpTqEOBWRaYvsDGfhq2rLm1iE5TSzAcbp5b1lzuEDpG+angLYq 6EzsMtlgUWvjd7H6a3elx3TC+n9EoWa+ksHTTZhR3k2bnrSRVVJjpTaD9nfCsZtU6EHg ngQQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1742097963; x=1742702763; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=gUCAEDGBHb6/0D9rNwLR8XLz0vdrsmnhtv+wLj43xrw=; b=WgnH5aJ1H68kfTcmkXm9epA7JcNnMR49QH2ENidpvB6+l8VabcHOe8tt518Iv5IkX0 y26I0/1dvhsQ1321p9tavQx1yx+I4CDF/4WZcCFexafRI44/25dbmDIfOxBlceNzfYWb SWEkXL9dFifjWSwomHpIcJreW74rJWUWKiz6jvAi04EhVbRNuOrUz22i8NvBACRCJu2p opbyHPZAH4EDSBQGWg8xV4M4Xk+8DmZVP8DENuZeS2TnWhUvss6SZjKt/fF6A4ktHWYh M42ehX9NaAKmNFm3E1rX+sCa62iG3ZT1/dgXU4qfPxFJWE0Llzz0H0kpSOZBDKPo6fNr WaQw== X-Forwarded-Encrypted: i=1; AJvYcCW3Q4Vuej743ponsWg7LNqHiYHvGfMI596oO73Q9Vq479d0PeVjZQWzbK1ZMTw6qYWGcJSmbfvGoVxmWH0=@vger.kernel.org X-Gm-Message-State: AOJu0YwDGhL3s1Cwoe7L4NIR4gNzJQ+qQVay+OTTy052Yl0e8CKguCYD B3Or7jBz+kvuPD5aPaPBTwLRsO1D4/lmsJQpHOeKPcLUwqhI1viv0nTK0dLKPjs= X-Gm-Gg: ASbGncsd0GSIbDJYu3b6fKxT/mYauQbw7pYUqMEPYLiSzJ1UVM+OplBB8LKPBM7F33b xtEdj1hphSXaB1awXDy0mxD5X3Bi8u31rZxlPX2ZRrbJcZa8WuJpsOPPemd9l4U3m5mi+Xii/ii LZd+pkdCS2apB323nXTN29daKJQw6muqWCTptV8J3pyN8vkhcsXW/iwyCXIgm3ZjBGZCMfe4Y8e Iw+lOMUnlM5KvIBySZ4p/3Fiteh9vYAJvR/9apLkN4boFSg/Oqzr2fflzVsQR8UvBqyiSWwMoTu 4VFK0L9cacOJPSWvQQpZaAhbsnU/6ZjTPgk= X-Google-Smtp-Source: AGHT+IH1QUcBVLxG3+dHXNVouxh8ZnGD/jGRFx3laH54pFhC0UFA/apB3TrLa3WvvXdMcVjhfYlapw== X-Received: by 2002:a05:6000:1789:b0:391:29f:4f87 with SMTP id ffacd0b85a97d-3971fadef12mr8962187f8f.49.1742097962702; Sat, 15 Mar 2025 21:06:02 -0700 (PDT) Received: from localhost ([2a03:2880:31ff:74::]) by smtp.gmail.com with ESMTPSA id ffacd0b85a97d-395c7df344dsm11217081f8f.10.2025.03.15.21.06.02 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sat, 15 Mar 
From: Kumar Kartikeya Dwivedi
To: bpf@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: Linus Torvalds, Peter Zijlstra, Will Deacon, Waiman Long, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann, Martin KaFai Lau, Eduard Zingerman, "Paul E. McKenney", Tejun Heo, Barret Rhoden, Josh Don, Dohyun Kim, linux-arm-kernel@lists.infradead.org, kkd@meta.com, kernel-team@meta.com
Subject: [PATCH bpf-next v4 15/25] rqspinlock: Add helper to print a splat on timeout or deadlock
Date: Sat, 15 Mar 2025 21:05:31 -0700
Message-ID: <20250316040541.108729-16-memxor@gmail.com>
In-Reply-To: <20250316040541.108729-1-memxor@gmail.com>
References: <20250316040541.108729-1-memxor@gmail.com>

Whenever a timeout or a deadlock occurs, we want to print a message to the dmesg console, including the CPU where the event occurred, the list of locks in the held locks table, and the stack trace of the caller, which allows determining where exactly in the slow path the waiter timed out or detected a deadlock. Splats are limited to at most one per CPU during machine uptime, and a lock is acquired to ensure that no interleaving occurs when a concurrent set of CPUs conflict, enter a deadlock situation, and start printing data. Later patches will use this to inspect the return value of the rqspinlock API and then report a violation if necessary.
Signed-off-by: Kumar Kartikeya Dwivedi
---
 kernel/bpf/rqspinlock.c | 29 +++++++++++++++++++++++++++++
 1 file changed, 29 insertions(+)

diff --git a/kernel/bpf/rqspinlock.c b/kernel/bpf/rqspinlock.c
index ed21ee010063..ad0fc35c647e 100644
--- a/kernel/bpf/rqspinlock.c
+++ b/kernel/bpf/rqspinlock.c
@@ -196,6 +196,35 @@ static noinline int check_deadlock_ABBA(rqspinlock_t *lock, u32 mask,
 	return 0;
 }
 
+static DEFINE_PER_CPU(int, report_nest_cnt);
+static DEFINE_PER_CPU(bool, report_flag);
+static arch_spinlock_t report_lock;
+
+static void rqspinlock_report_violation(const char *s, void *lock)
+{
+	struct rqspinlock_held *rqh = this_cpu_ptr(&rqspinlock_held_locks);
+
+	if (this_cpu_inc_return(report_nest_cnt) != 1) {
+		this_cpu_dec(report_nest_cnt);
+		return;
+	}
+	if (this_cpu_read(report_flag))
+		goto end;
+	this_cpu_write(report_flag, true);
+	arch_spin_lock(&report_lock);
+
+	pr_err("CPU %d: %s", smp_processor_id(), s);
+	pr_info("Held locks: %d\n", rqh->cnt + 1);
+	pr_info("Held lock[%2d] = 0x%px\n", 0, lock);
+	for (int i = 0; i < min(RES_NR_HELD, rqh->cnt); i++)
+		pr_info("Held lock[%2d] = 0x%px\n", i + 1, rqh->locks[i]);
+	dump_stack();
+
+	arch_spin_unlock(&report_lock);
+end:
+	this_cpu_dec(report_nest_cnt);
+}
+
 static noinline int check_deadlock(rqspinlock_t *lock, u32 mask,
 				   struct rqspinlock_timeout *ts)
 {
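For orientation, here is a minimal sketch (not part of the patch) of how a slow path could feed a detected violation into the helper added above before propagating the error; the wrapper name check_and_report() and the message string are illustrative assumptions, while rqspinlock_report_violation() and check_deadlock() come from the diff. A timeout path would call the helper the same way before returning -ETIMEDOUT.

	/* Hypothetical call site, for illustration only. */
	static noinline int check_and_report(rqspinlock_t *lock, u32 mask,
					     struct rqspinlock_timeout *ts)
	{
		int ret = check_deadlock(lock, mask, ts);

		if (ret == -EDEADLK)
			rqspinlock_report_violation("rqspinlock: AA/ABBA deadlock detected\n", lock);
		return ret;	/* caller unwinds and returns the error to its user */
	}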
McKenney" , Tejun Heo , Barret Rhoden , Josh Don , Dohyun Kim , linux-arm-kernel@lists.infradead.org, kkd@meta.com, kernel-team@meta.com Subject: [PATCH bpf-next v4 16/25] rqspinlock: Add macros for rqspinlock usage Date: Sat, 15 Mar 2025 21:05:32 -0700 Message-ID: <20250316040541.108729-17-memxor@gmail.com> X-Mailer: git-send-email 2.47.1 In-Reply-To: <20250316040541.108729-1-memxor@gmail.com> References: <20250316040541.108729-1-memxor@gmail.com> Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Developer-Signature: v=1; a=openpgp-sha256; l=4185; h=from:subject; bh=Yp6VTxKtpBDQhUgfKGSHMWzJmZu3deRDpy/hLhkjXfk=; b=owEBbQKS/ZANAwAIAUzgyIZIvxHKAcsmYgBn1k3d7ne1febSzud04+mu0CKRHqx+H8Jq+gCO2ZsR MN8cyy6JAjMEAAEIAB0WIQRLvip+Buz51YI8YRFM4MiGSL8RygUCZ9ZN3QAKCRBM4MiGSL8RykcmD/ 9gTc/RWOMn5cBBm9fdR7z3Whng8H/b1v47vd75XWWl34N0yLT/YWBxTTQ+sPVZzGGuo6B9xsO7eKc9 SZp8I4okxcB2aM74N8m344h+WeCA3ZDdOAN1PStTNACyJB8JKCSwhxf5/MD4VnGmAAvRd7JcgDsl3T lkua13O2hBIxLJEBPb7jFRav4gDcQhaWZZY8+I8wIj4D2LsynzlcQxPMMg/IOGoojbmx1IZpibvTSS cOab748UkW42TGSTq4cdoSqNEyQkllc5IG3CoplNecSOYw91jN/9/nfpRQLlkeNk4MMkIpW1wkerkH stNypWjI3+zMxxxWOpnatrs1AhcbL9xdZy/yI6HeaBpRLCWE7EP2tyRtxMLNP0xZTzDUbhelU5s2xS ISufWNdpb9hhDr2p7Az4+QCAW0/IGguxT1YSSJ3VIvP0PSGYdPGn7h5UGMmLAsKRteUQPiIinHVfjq jswG4p2GalFrgmjtSAlgCOgfnnBgdFSBZ8KrUhAEDyoSkn3oC7EfkzE4RurRDNaK7Dla53h3swFtN6 ebHJiW8K6xEarRowvHg21BJJhRkrGgUz7TcTos8teOYQ7GScBu5AHx4/hAgO+A+G8EAYCC6p9bLmWT uhQX3Azu8Y8rim3+JoGB/tvJ9vAcTAUH3AyLU5MWGMfUmvqRvuAqdg2eMQjQ== X-Developer-Key: i=memxor@gmail.com; a=openpgp; fpr=4BBE2A7E06ECF9D5823C61114CE0C88648BF11CA X-Patchwork-Delegate: bpf@iogearbox.net Introduce helper macros that wrap around the rqspinlock slow path and provide an interface analogous to the raw_spin_lock API. Note that in case of error conditions, preemption and IRQ disabling is automatically unrolled before returning the error back to the caller. Ensure that in absence of CONFIG_QUEUED_SPINLOCKS support, we fallback to the test-and-set implementation. Add some comments describing the subtle memory ordering logic during unlock, and why it's safe. Signed-off-by: Kumar Kartikeya Dwivedi --- include/asm-generic/rqspinlock.h | 87 ++++++++++++++++++++++++++++++++ 1 file changed, 87 insertions(+) diff --git a/include/asm-generic/rqspinlock.h b/include/asm-generic/rqspinlock.h index a837c6b6abd9..23abd0b8d0f9 100644 --- a/include/asm-generic/rqspinlock.h +++ b/include/asm-generic/rqspinlock.h @@ -153,4 +153,91 @@ static __always_inline void release_held_lock_entry(void) this_cpu_dec(rqspinlock_held_locks.cnt); } +#ifdef CONFIG_QUEUED_SPINLOCKS + +/** + * res_spin_lock - acquire a queued spinlock + * @lock: Pointer to queued spinlock structure + * + * Return: + * * 0 - Lock was acquired successfully. + * * -EDEADLK - Lock acquisition failed because of AA/ABBA deadlock. + * * -ETIMEDOUT - Lock acquisition failed because of timeout. 
+ */ +static __always_inline int res_spin_lock(rqspinlock_t *lock) +{ + int val = 0; + + if (likely(atomic_try_cmpxchg_acquire(&lock->val, &val, _Q_LOCKED_VAL))) { + grab_held_lock_entry(lock); + return 0; + } + return resilient_queued_spin_lock_slowpath(lock, val); +} + +#else + +#define res_spin_lock(lock) resilient_tas_spin_lock(lock) + +#endif /* CONFIG_QUEUED_SPINLOCKS */ + +static __always_inline void res_spin_unlock(rqspinlock_t *lock) +{ + struct rqspinlock_held *rqh = this_cpu_ptr(&rqspinlock_held_locks); + + if (unlikely(rqh->cnt > RES_NR_HELD)) + goto unlock; + WRITE_ONCE(rqh->locks[rqh->cnt - 1], NULL); +unlock: + /* + * Release barrier, ensures correct ordering. See release_held_lock_entry + * for details. Perform release store instead of queued_spin_unlock, + * since we use this function for test-and-set fallback as well. When we + * have CONFIG_QUEUED_SPINLOCKS=n, we clear the full 4-byte lockword. + * + * Like release_held_lock_entry, we can do the release before the dec. + * We simply care about not seeing the 'lock' in our table from a remote + * CPU once the lock has been released, which doesn't rely on the dec. + * + * Unlike smp_wmb(), release is not a two way fence, hence it is + * possible for a inc to move up and reorder with our clearing of the + * entry. This isn't a problem however, as for a misdiagnosis of ABBA, + * the remote CPU needs to hold this lock, which won't be released until + * the store below is done, which would ensure the entry is overwritten + * to NULL, etc. + */ + smp_store_release(&lock->locked, 0); + this_cpu_dec(rqspinlock_held_locks.cnt); +} + +#ifdef CONFIG_QUEUED_SPINLOCKS +#define raw_res_spin_lock_init(lock) ({ *(lock) = (rqspinlock_t)__ARCH_SPIN_LOCK_UNLOCKED; }) +#else +#define raw_res_spin_lock_init(lock) ({ *(lock) = (rqspinlock_t){0}; }) +#endif + +#define raw_res_spin_lock(lock) \ + ({ \ + int __ret; \ + preempt_disable(); \ + __ret = res_spin_lock(lock); \ + if (__ret) \ + preempt_enable(); \ + __ret; \ + }) + +#define raw_res_spin_unlock(lock) ({ res_spin_unlock(lock); preempt_enable(); }) + +#define raw_res_spin_lock_irqsave(lock, flags) \ + ({ \ + int __ret; \ + local_irq_save(flags); \ + __ret = raw_res_spin_lock(lock); \ + if (__ret) \ + local_irq_restore(flags); \ + __ret; \ + }) + +#define raw_res_spin_unlock_irqrestore(lock, flags) ({ raw_res_spin_unlock(lock); local_irq_restore(flags); }) + #endif /* __ASM_GENERIC_RQSPINLOCK_H */ From patchwork Sun Mar 16 04:05:33 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Kumar Kartikeya Dwivedi X-Patchwork-Id: 14018291 X-Patchwork-Delegate: bpf@iogearbox.net Received: from mail-wm1-f68.google.com (mail-wm1-f68.google.com [209.85.128.68]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 4DE9419ABD1; Sun, 16 Mar 2025 04:06:07 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.128.68 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1742097970; cv=none; b=SOwoXLUqmwJnTRuxo2rIXdeiuHviyE8QQ5e00X8Y5yuyHS9MDSguRXq6612s9I0BX8I6r3SBejd8SS5kqkZtr2xrLuOL8xlqtiTzjN8bJN/f202gtgT10v7ug88JYV5s2iwVub3/kRD0ym/qNuUyxYFJwCcvgmlnIlj9fu5rRaU= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1742097970; c=relaxed/simple; bh=TAwmD3mE9IWzqTEdxgzO+SkBgGc9QW4lOFTMCWQX6XQ=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: 
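A short usage sketch of the macros above (not from the patch; the lock and function names are made up for illustration): unlike raw_spin_lock, acquisition can now fail with -EDEADLK or -ETIMEDOUT, so every caller has to check the return value, and on failure the macros have already restored preemption and IRQ state. The hashtab and percpu_freelist conversions later in this series follow exactly this pattern.

	static rqspinlock_t demo_lock;	/* raw_res_spin_lock_init(&demo_lock) at setup time */

	static int demo_update(void)
	{
		unsigned long flags;
		int ret;

		ret = raw_res_spin_lock_irqsave(&demo_lock, flags);
		if (ret)
			return ret;	/* -EDEADLK or -ETIMEDOUT, IRQs already restored */
		/* ... critical section ... */
		raw_res_spin_unlock_irqrestore(&demo_lock, flags);
		return 0;
	}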
From patchwork Sun Mar 16 04:05:33 2025
From: Kumar Kartikeya Dwivedi
To: bpf@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: Linus Torvalds, Peter Zijlstra, Will Deacon, Waiman Long, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann, Martin KaFai Lau, Eduard Zingerman, "Paul E. McKenney", Tejun Heo, Barret Rhoden, Josh Don, Dohyun Kim, linux-arm-kernel@lists.infradead.org, kkd@meta.com, kernel-team@meta.com
Subject: [PATCH bpf-next v4 17/25] rqspinlock: Add entry to Makefile, MAINTAINERS
Date: Sat, 15 Mar 2025 21:05:33 -0700
Message-ID: <20250316040541.108729-18-memxor@gmail.com>
In-Reply-To: <20250316040541.108729-1-memxor@gmail.com>
References: <20250316040541.108729-1-memxor@gmail.com>

Ensure that the rqspinlock code is only built when the BPF subsystem is compiled in. Depending on queued spinlock support, we may or may not end up building the queued spinlock slowpath, and instead fall back to the test-and-set implementation. Also add entries to the MAINTAINERS file.
Signed-off-by: Kumar Kartikeya Dwivedi
---
 MAINTAINERS                | 2 ++
 include/asm-generic/Kbuild | 1 +
 kernel/bpf/Makefile        | 2 +-
 3 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 3864d473f52f..c545cd149cd1 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -4297,6 +4297,8 @@ F: include/uapi/linux/filter.h
 F: kernel/bpf/
 F: kernel/trace/bpf_trace.c
 F: lib/buildid.c
+F: arch/*/include/asm/rqspinlock.h
+F: include/asm-generic/rqspinlock.h
 F: lib/test_bpf.c
 F: net/bpf/
 F: net/core/filter.c
diff --git a/include/asm-generic/Kbuild b/include/asm-generic/Kbuild
index 1b43c3a77012..8675b7b4ad23 100644
--- a/include/asm-generic/Kbuild
+++ b/include/asm-generic/Kbuild
@@ -45,6 +45,7 @@ mandatory-y += pci.h
 mandatory-y += percpu.h
 mandatory-y += pgalloc.h
 mandatory-y += preempt.h
+mandatory-y += rqspinlock.h
 mandatory-y += runtime-const.h
 mandatory-y += rwonce.h
 mandatory-y += sections.h
diff --git a/kernel/bpf/Makefile b/kernel/bpf/Makefile
index 410028633621..70502f038b92 100644
--- a/kernel/bpf/Makefile
+++ b/kernel/bpf/Makefile
@@ -14,7 +14,7 @@ obj-$(CONFIG_BPF_SYSCALL) += bpf_local_storage.o bpf_task_storage.o
 obj-${CONFIG_BPF_LSM} += bpf_inode_storage.o
 obj-$(CONFIG_BPF_SYSCALL) += disasm.o mprog.o
 obj-$(CONFIG_BPF_JIT) += trampoline.o
-obj-$(CONFIG_BPF_SYSCALL) += btf.o memalloc.o
+obj-$(CONFIG_BPF_SYSCALL) += btf.o memalloc.o rqspinlock.o
 ifeq ($(CONFIG_MMU)$(CONFIG_64BIT),yy)
 obj-$(CONFIG_BPF_SYSCALL) += arena.o range_tree.o
 endif
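A rough sketch of the build structure this implies (an assumption about how kernel/bpf/rqspinlock.c is laid out, not a quote of it): the object is always built once CONFIG_BPF_SYSCALL is enabled, the test-and-set fallback is always present, and the queued slowpath is compiled only when the architecture provides queued spinlocks.

	/* Illustrative skeleton of kernel/bpf/rqspinlock.c, not the real file. */
	#include <asm/rqspinlock.h>	/* assumed include; generated via the Kbuild entry above */

	/* Test-and-set fallback: always available. */
	int resilient_tas_spin_lock(rqspinlock_t *lock)
	{
		/* ... timeout- and deadlock-checked test-and-set loop ... */
		return 0;
	}

	#ifdef CONFIG_QUEUED_SPINLOCKS
	/* Queued slowpath: only built when qspinlocks are available. */
	int resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val)
	{
		/* ... MCS queueing with timeout and deadlock checks ... */
		return 0;
	}
	#endif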
McKenney" , Tejun Heo , Barret Rhoden , Josh Don , Dohyun Kim , linux-arm-kernel@lists.infradead.org, kkd@meta.com, kernel-team@meta.com Subject: [PATCH bpf-next v4 18/25] rqspinlock: Add locktorture support Date: Sat, 15 Mar 2025 21:05:34 -0700 Message-ID: <20250316040541.108729-19-memxor@gmail.com> X-Mailer: git-send-email 2.47.1 In-Reply-To: <20250316040541.108729-1-memxor@gmail.com> References: <20250316040541.108729-1-memxor@gmail.com> Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Developer-Signature: v=1; a=openpgp-sha256; l=2633; h=from:subject; bh=SBe9SrR1gqB5bS55WnFfwdHWzM4LavhATA7RzB99U+Y=; b=owEBbQKS/ZANAwAIAUzgyIZIvxHKAcsmYgBn1k3eNpMeA1g8bEkNbE5LoMw6rWgMD0iqfFTREsw3 1ViDmC2JAjMEAAEIAB0WIQRLvip+Buz51YI8YRFM4MiGSL8RygUCZ9ZN3gAKCRBM4MiGSL8RyuqtD/ 9/IGQPEqqIYM45Wbzz/zxdRnXzdviyqlrvI07Exouh0vJd+riQkKUn0fvtxOmYGiW0hf+KSPes3ypJ tSGUhUEHoy4KZ6aHeBpDd+Atw6aeLia+nCh/xXga/cb8an5pVO+3oHinEiNot2dfTsznbw2rgqLI/o 18KlBPsGuCz9o9fNDbrrYW89iCAR62qZm/ELBGZ5tUXVLKJTJU5+/FUG2EM+D6yyeWllgXPFCL9B// ySaQANZ8Bhy9vbrVe4wW67L581XJr2ML9um5yEfmYDqkTanYBBHCU+e7kguhEs2Q0bjT9F9AO609C7 uL1NTzcDsH6c0Avd6a/ITdoNp1E4ZvVp6HQqo/pBk6PXgu1gBcudXCyPs+Igx8NV4FiaP31yQ5e01S 9UbxJSSzxs2+fgRnO/i6pqNrO8Mktz8buhxlr5r2gqkDOXqWeWZEDHrj32fD3WgY7j99ENTR9y98bp L0wcKbx1en8AJckwWYMKtSC5TbsAU1rBvKW1jJtjXsifakOYMtf+dASlhIY/fIoefIOQ98NgY3wJDS RwY9kcDNvp8LkYBgR0/kRIpMAkUqpoon7lKnqgV4ZsxKpi8p5BN3ZSvGMbHMrJs+4gaGIeSnBJFaNM 5Lj0WTWrsYOJXYBXA0iaogmJAwZ8UtN5+FNxg+SHFESqpNMmvjruoQOLu8vg== X-Developer-Key: i=memxor@gmail.com; a=openpgp; fpr=4BBE2A7E06ECF9D5823C61114CE0C88648BF11CA X-Patchwork-Delegate: bpf@iogearbox.net Introduce locktorture support for rqspinlock using the newly added macros as the first in-kernel user and consumer. Guard the code with CONFIG_BPF_SYSCALL ifdef since rqspinlock is not available otherwise. 
Signed-off-by: Kumar Kartikeya Dwivedi
---
 kernel/locking/locktorture.c | 57 ++++++++++++++++++++++++++++++++
 1 file changed, 57 insertions(+)

diff --git a/kernel/locking/locktorture.c b/kernel/locking/locktorture.c
index cc33470f4de9..ce0362f0a871 100644
--- a/kernel/locking/locktorture.c
+++ b/kernel/locking/locktorture.c
@@ -362,6 +362,60 @@ static struct lock_torture_ops raw_spin_lock_irq_ops = {
 	.name = "raw_spin_lock_irq"
 };
 
+#ifdef CONFIG_BPF_SYSCALL
+
+#include
+static rqspinlock_t rqspinlock;
+
+static int torture_raw_res_spin_write_lock(int tid __maybe_unused)
+{
+	raw_res_spin_lock(&rqspinlock);
+	return 0;
+}
+
+static void torture_raw_res_spin_write_unlock(int tid __maybe_unused)
+{
+	raw_res_spin_unlock(&rqspinlock);
+}
+
+static struct lock_torture_ops raw_res_spin_lock_ops = {
+	.writelock = torture_raw_res_spin_write_lock,
+	.write_delay = torture_spin_lock_write_delay,
+	.task_boost = torture_rt_boost,
+	.writeunlock = torture_raw_res_spin_write_unlock,
+	.readlock = NULL,
+	.read_delay = NULL,
+	.readunlock = NULL,
+	.name = "raw_res_spin_lock"
+};
+
+static int torture_raw_res_spin_write_lock_irq(int tid __maybe_unused)
+{
+	unsigned long flags;
+
+	raw_res_spin_lock_irqsave(&rqspinlock, flags);
+	cxt.cur_ops->flags = flags;
+	return 0;
+}
+
+static void torture_raw_res_spin_write_unlock_irq(int tid __maybe_unused)
+{
+	raw_res_spin_unlock_irqrestore(&rqspinlock, cxt.cur_ops->flags);
+}
+
+static struct lock_torture_ops raw_res_spin_lock_irq_ops = {
+	.writelock = torture_raw_res_spin_write_lock_irq,
+	.write_delay = torture_spin_lock_write_delay,
+	.task_boost = torture_rt_boost,
+	.writeunlock = torture_raw_res_spin_write_unlock_irq,
+	.readlock = NULL,
+	.read_delay = NULL,
+	.readunlock = NULL,
+	.name = "raw_res_spin_lock_irq"
+};
+
+#endif
+
 static DEFINE_RWLOCK(torture_rwlock);
 
 static int torture_rwlock_write_lock(int tid __maybe_unused)
@@ -1168,6 +1222,9 @@ static int __init lock_torture_init(void)
 		&lock_busted_ops,
 		&spin_lock_ops, &spin_lock_irq_ops,
 		&raw_spin_lock_ops, &raw_spin_lock_irq_ops,
+#ifdef CONFIG_BPF_SYSCALL
+		&raw_res_spin_lock_ops, &raw_res_spin_lock_irq_ops,
+#endif
 		&rw_lock_ops, &rw_lock_irq_ops,
 		&mutex_lock_ops,
 		&ww_mutex_lock_ops,
McKenney" , Tejun Heo , Barret Rhoden , Josh Don , Dohyun Kim , linux-arm-kernel@lists.infradead.org, kkd@meta.com, kernel-team@meta.com Subject: [PATCH bpf-next v4 19/25] bpf: Convert hashtab.c to rqspinlock Date: Sat, 15 Mar 2025 21:05:35 -0700 Message-ID: <20250316040541.108729-20-memxor@gmail.com> X-Mailer: git-send-email 2.47.1 In-Reply-To: <20250316040541.108729-1-memxor@gmail.com> References: <20250316040541.108729-1-memxor@gmail.com> Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Developer-Signature: v=1; a=openpgp-sha256; l=11131; h=from:subject; bh=VwAQAxwAow7TrrvxCboQzb6cYiUbz63qPyaMUR2Lneg=; b=owEBbQKS/ZANAwAIAUzgyIZIvxHKAcsmYgBn1k3eGs+bJjtEdeiJjw+Cd9PjBR7SOiQpae9Vzm/W Hk3IcQyJAjMEAAEIAB0WIQRLvip+Buz51YI8YRFM4MiGSL8RygUCZ9ZN3gAKCRBM4MiGSL8RymFdD/ 4guFTOgG2ezQtX/qNWMi70RSbmgFndzWKODlNByC2n50hmYhhX9QlUKP+tsvZw793omMVa4G7ivzR8 dj195Z7rzAQlMEA+Y2fHGnBQqQWjPQhXQk2bQ1Mvvs4/C/zBpkrWLeXIrs3l4L0VhKDvg+nTGxmJGQ 5pe6Dz+d15FajBJ5NTRVGVX6/BLShHrT3NpUFLx1iGqGO6ilzcz0SKrJXuIX1thW1XMvRbyacIxc8d c4rraPbRxMyYAuYdgHX6O2iJCeDXsVnlCJ9w4YSuWCUoiRrt8AF+73hKjEtO7yhk59ZCUsvojBLWQ6 7OaL3u0YLBcxSyG4OsaCu27eyiDMBSYTJQVfjnq63hJ00wz+6TReX4DDpYAn/KUjeQLxVBgjJ2HRYh JMe4RZjVlfiK51f7sGm9Oy5JzPDbkpQSBBg9DDeMWd+aOkMDTzERynTnkWImFKroy/83XrcICnFQDC uSy6RpJt69kFqYBIjZ3E6vOhQOsNPO2Yrf53s+utbaMKhRLyMdMWYrgSiP2r0CSyFlpEWI3ZWhW9WF mfskIl0ctWP39nXAdkW/yj8vNd3DeBVAAoWJp35wU3L1cBjMrTAyXac4M8lv5Ua+lc7vYEe85mI1ry qx+it9Hd826i/HpocK9iaHfGwuWdtF0b2JNeAuwKceiyeIVeKMW5ebSgsc3w== X-Developer-Key: i=memxor@gmail.com; a=openpgp; fpr=4BBE2A7E06ECF9D5823C61114CE0C88648BF11CA X-Patchwork-Delegate: bpf@iogearbox.net Convert hashtab.c from raw_spinlock to rqspinlock, and drop the hashed per-cpu counter crud from the code base which is no longer necessary. 
Closes: https://lore.kernel.org/bpf/675302fd.050a0220.2477f.0004.GAE@google.com Closes: https://lore.kernel.org/bpf/000000000000b3e63e061eed3f6b@google.com Signed-off-by: Kumar Kartikeya Dwivedi --- kernel/bpf/hashtab.c | 102 ++++++++++++++----------------------------- 1 file changed, 32 insertions(+), 70 deletions(-) diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c index 877298133fda..5a5adc66b8e2 100644 --- a/kernel/bpf/hashtab.c +++ b/kernel/bpf/hashtab.c @@ -16,6 +16,7 @@ #include "bpf_lru_list.h" #include "map_in_map.h" #include +#include #define HTAB_CREATE_FLAG_MASK \ (BPF_F_NO_PREALLOC | BPF_F_NO_COMMON_LRU | BPF_F_NUMA_NODE | \ @@ -78,7 +79,7 @@ */ struct bucket { struct hlist_nulls_head head; - raw_spinlock_t raw_lock; + rqspinlock_t raw_lock; }; #define HASHTAB_MAP_LOCK_COUNT 8 @@ -104,8 +105,6 @@ struct bpf_htab { u32 n_buckets; /* number of hash buckets */ u32 elem_size; /* size of each element in bytes */ u32 hashrnd; - struct lock_class_key lockdep_key; - int __percpu *map_locked[HASHTAB_MAP_LOCK_COUNT]; }; /* each htab element is struct htab_elem + key + value */ @@ -140,45 +139,26 @@ static void htab_init_buckets(struct bpf_htab *htab) for (i = 0; i < htab->n_buckets; i++) { INIT_HLIST_NULLS_HEAD(&htab->buckets[i].head, i); - raw_spin_lock_init(&htab->buckets[i].raw_lock); - lockdep_set_class(&htab->buckets[i].raw_lock, - &htab->lockdep_key); + raw_res_spin_lock_init(&htab->buckets[i].raw_lock); cond_resched(); } } -static inline int htab_lock_bucket(const struct bpf_htab *htab, - struct bucket *b, u32 hash, - unsigned long *pflags) +static inline int htab_lock_bucket(struct bucket *b, unsigned long *pflags) { unsigned long flags; + int ret; - hash = hash & min_t(u32, HASHTAB_MAP_LOCK_MASK, htab->n_buckets - 1); - - preempt_disable(); - local_irq_save(flags); - if (unlikely(__this_cpu_inc_return(*(htab->map_locked[hash])) != 1)) { - __this_cpu_dec(*(htab->map_locked[hash])); - local_irq_restore(flags); - preempt_enable(); - return -EBUSY; - } - - raw_spin_lock(&b->raw_lock); + ret = raw_res_spin_lock_irqsave(&b->raw_lock, flags); + if (ret) + return ret; *pflags = flags; - return 0; } -static inline void htab_unlock_bucket(const struct bpf_htab *htab, - struct bucket *b, u32 hash, - unsigned long flags) +static inline void htab_unlock_bucket(struct bucket *b, unsigned long flags) { - hash = hash & min_t(u32, HASHTAB_MAP_LOCK_MASK, htab->n_buckets - 1); - raw_spin_unlock(&b->raw_lock); - __this_cpu_dec(*(htab->map_locked[hash])); - local_irq_restore(flags); - preempt_enable(); + raw_res_spin_unlock_irqrestore(&b->raw_lock, flags); } static bool htab_lru_map_delete_node(void *arg, struct bpf_lru_node *node); @@ -483,14 +463,12 @@ static struct bpf_map *htab_map_alloc(union bpf_attr *attr) bool percpu_lru = (attr->map_flags & BPF_F_NO_COMMON_LRU); bool prealloc = !(attr->map_flags & BPF_F_NO_PREALLOC); struct bpf_htab *htab; - int err, i; + int err; htab = bpf_map_area_alloc(sizeof(*htab), NUMA_NO_NODE); if (!htab) return ERR_PTR(-ENOMEM); - lockdep_register_key(&htab->lockdep_key); - bpf_map_init_from_attr(&htab->map, attr); if (percpu_lru) { @@ -536,15 +514,6 @@ static struct bpf_map *htab_map_alloc(union bpf_attr *attr) if (!htab->buckets) goto free_elem_count; - for (i = 0; i < HASHTAB_MAP_LOCK_COUNT; i++) { - htab->map_locked[i] = bpf_map_alloc_percpu(&htab->map, - sizeof(int), - sizeof(int), - GFP_USER); - if (!htab->map_locked[i]) - goto free_map_locked; - } - if (htab->map.map_flags & BPF_F_ZERO_SEED) htab->hashrnd = 0; else @@ -607,15 +576,12 @@ static struct 
bpf_map *htab_map_alloc(union bpf_attr *attr) free_map_locked: if (htab->use_percpu_counter) percpu_counter_destroy(&htab->pcount); - for (i = 0; i < HASHTAB_MAP_LOCK_COUNT; i++) - free_percpu(htab->map_locked[i]); bpf_map_area_free(htab->buckets); bpf_mem_alloc_destroy(&htab->pcpu_ma); bpf_mem_alloc_destroy(&htab->ma); free_elem_count: bpf_map_free_elem_count(&htab->map); free_htab: - lockdep_unregister_key(&htab->lockdep_key); bpf_map_area_free(htab); return ERR_PTR(err); } @@ -820,7 +786,7 @@ static bool htab_lru_map_delete_node(void *arg, struct bpf_lru_node *node) b = __select_bucket(htab, tgt_l->hash); head = &b->head; - ret = htab_lock_bucket(htab, b, tgt_l->hash, &flags); + ret = htab_lock_bucket(b, &flags); if (ret) return false; @@ -831,7 +797,7 @@ static bool htab_lru_map_delete_node(void *arg, struct bpf_lru_node *node) break; } - htab_unlock_bucket(htab, b, tgt_l->hash, flags); + htab_unlock_bucket(b, flags); if (l == tgt_l) check_and_free_fields(htab, l); @@ -1150,7 +1116,7 @@ static long htab_map_update_elem(struct bpf_map *map, void *key, void *value, */ } - ret = htab_lock_bucket(htab, b, hash, &flags); + ret = htab_lock_bucket(b, &flags); if (ret) return ret; @@ -1201,7 +1167,7 @@ static long htab_map_update_elem(struct bpf_map *map, void *key, void *value, check_and_free_fields(htab, l_old); } } - htab_unlock_bucket(htab, b, hash, flags); + htab_unlock_bucket(b, flags); if (l_old) { if (old_map_ptr) map->ops->map_fd_put_ptr(map, old_map_ptr, true); @@ -1210,7 +1176,7 @@ static long htab_map_update_elem(struct bpf_map *map, void *key, void *value, } return 0; err: - htab_unlock_bucket(htab, b, hash, flags); + htab_unlock_bucket(b, flags); return ret; } @@ -1257,7 +1223,7 @@ static long htab_lru_map_update_elem(struct bpf_map *map, void *key, void *value copy_map_value(&htab->map, l_new->key + round_up(map->key_size, 8), value); - ret = htab_lock_bucket(htab, b, hash, &flags); + ret = htab_lock_bucket(b, &flags); if (ret) goto err_lock_bucket; @@ -1278,7 +1244,7 @@ static long htab_lru_map_update_elem(struct bpf_map *map, void *key, void *value ret = 0; err: - htab_unlock_bucket(htab, b, hash, flags); + htab_unlock_bucket(b, flags); err_lock_bucket: if (ret) @@ -1315,7 +1281,7 @@ static long __htab_percpu_map_update_elem(struct bpf_map *map, void *key, b = __select_bucket(htab, hash); head = &b->head; - ret = htab_lock_bucket(htab, b, hash, &flags); + ret = htab_lock_bucket(b, &flags); if (ret) return ret; @@ -1340,7 +1306,7 @@ static long __htab_percpu_map_update_elem(struct bpf_map *map, void *key, } ret = 0; err: - htab_unlock_bucket(htab, b, hash, flags); + htab_unlock_bucket(b, flags); return ret; } @@ -1381,7 +1347,7 @@ static long __htab_lru_percpu_map_update_elem(struct bpf_map *map, void *key, return -ENOMEM; } - ret = htab_lock_bucket(htab, b, hash, &flags); + ret = htab_lock_bucket(b, &flags); if (ret) goto err_lock_bucket; @@ -1405,7 +1371,7 @@ static long __htab_lru_percpu_map_update_elem(struct bpf_map *map, void *key, } ret = 0; err: - htab_unlock_bucket(htab, b, hash, flags); + htab_unlock_bucket(b, flags); err_lock_bucket: if (l_new) { bpf_map_dec_elem_count(&htab->map); @@ -1447,7 +1413,7 @@ static long htab_map_delete_elem(struct bpf_map *map, void *key) b = __select_bucket(htab, hash); head = &b->head; - ret = htab_lock_bucket(htab, b, hash, &flags); + ret = htab_lock_bucket(b, &flags); if (ret) return ret; @@ -1457,7 +1423,7 @@ static long htab_map_delete_elem(struct bpf_map *map, void *key) else ret = -ENOENT; - htab_unlock_bucket(htab, b, hash, 
flags); + htab_unlock_bucket(b, flags); if (l) free_htab_elem(htab, l); @@ -1483,7 +1449,7 @@ static long htab_lru_map_delete_elem(struct bpf_map *map, void *key) b = __select_bucket(htab, hash); head = &b->head; - ret = htab_lock_bucket(htab, b, hash, &flags); + ret = htab_lock_bucket(b, &flags); if (ret) return ret; @@ -1494,7 +1460,7 @@ static long htab_lru_map_delete_elem(struct bpf_map *map, void *key) else ret = -ENOENT; - htab_unlock_bucket(htab, b, hash, flags); + htab_unlock_bucket(b, flags); if (l) htab_lru_push_free(htab, l); return ret; @@ -1561,7 +1527,6 @@ static void htab_map_free_timers_and_wq(struct bpf_map *map) static void htab_map_free(struct bpf_map *map) { struct bpf_htab *htab = container_of(map, struct bpf_htab, map); - int i; /* bpf_free_used_maps() or close(map_fd) will trigger this map_free callback. * bpf_free_used_maps() is called after bpf prog is no longer executing. @@ -1586,9 +1551,6 @@ static void htab_map_free(struct bpf_map *map) bpf_mem_alloc_destroy(&htab->ma); if (htab->use_percpu_counter) percpu_counter_destroy(&htab->pcount); - for (i = 0; i < HASHTAB_MAP_LOCK_COUNT; i++) - free_percpu(htab->map_locked[i]); - lockdep_unregister_key(&htab->lockdep_key); bpf_map_area_free(htab); } @@ -1631,7 +1593,7 @@ static int __htab_map_lookup_and_delete_elem(struct bpf_map *map, void *key, b = __select_bucket(htab, hash); head = &b->head; - ret = htab_lock_bucket(htab, b, hash, &bflags); + ret = htab_lock_bucket(b, &bflags); if (ret) return ret; @@ -1668,7 +1630,7 @@ static int __htab_map_lookup_and_delete_elem(struct bpf_map *map, void *key, hlist_nulls_del_rcu(&l->hash_node); out_unlock: - htab_unlock_bucket(htab, b, hash, bflags); + htab_unlock_bucket(b, bflags); if (l) { if (is_lru_map) @@ -1790,7 +1752,7 @@ __htab_map_lookup_and_delete_batch(struct bpf_map *map, head = &b->head; /* do not grab the lock unless need it (bucket_cnt > 0). */ if (locked) { - ret = htab_lock_bucket(htab, b, batch, &flags); + ret = htab_lock_bucket(b, &flags); if (ret) { rcu_read_unlock(); bpf_enable_instrumentation(); @@ -1813,7 +1775,7 @@ __htab_map_lookup_and_delete_batch(struct bpf_map *map, /* Note that since bucket_cnt > 0 here, it is implicit * that the locked was grabbed, so release it. */ - htab_unlock_bucket(htab, b, batch, flags); + htab_unlock_bucket(b, flags); rcu_read_unlock(); bpf_enable_instrumentation(); goto after_loop; @@ -1824,7 +1786,7 @@ __htab_map_lookup_and_delete_batch(struct bpf_map *map, /* Note that since bucket_cnt > 0 here, it is implicit * that the locked was grabbed, so release it. 
*/ - htab_unlock_bucket(htab, b, batch, flags); + htab_unlock_bucket(b, flags); rcu_read_unlock(); bpf_enable_instrumentation(); kvfree(keys); @@ -1887,7 +1849,7 @@ __htab_map_lookup_and_delete_batch(struct bpf_map *map, dst_val += value_size; } - htab_unlock_bucket(htab, b, batch, flags); + htab_unlock_bucket(b, flags); locked = false; while (node_to_free) {
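Worth noting (a condensed restatement of what the diff above shows, not new code in the patch): the dropped map_locked per-CPU counters used to reject re-entrant bucket access with -EBUSY, while with rqspinlock the same situation is detected by the lock itself and surfaces as -EDEADLK (or -ETIMEDOUT) from raw_res_spin_lock_irqsave(). Every bucket-locking path therefore now takes the shape sketched below; the demo function name is invented for illustration.

	/* Condensed caller shape after the conversion (illustration only). */
	static long demo_htab_delete(struct bpf_htab *htab, struct bucket *b)
	{
		unsigned long flags;
		int ret;

		ret = htab_lock_bucket(b, &flags);
		if (ret)		/* -EDEADLK or -ETIMEDOUT instead of the old -EBUSY */
			return ret;
		/* ... unlink the element from b->head ... */
		htab_unlock_bucket(b, flags);
		return 0;
	}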
From patchwork Sun Mar 16 04:05:36 2025
From: Kumar Kartikeya Dwivedi
To: bpf@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: Linus Torvalds, Peter Zijlstra, Will Deacon, Waiman Long, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann, Martin KaFai Lau, Eduard Zingerman, "Paul E. McKenney", Tejun Heo, Barret Rhoden, Josh Don, Dohyun Kim, linux-arm-kernel@lists.infradead.org, kkd@meta.com, kernel-team@meta.com
Subject: [PATCH bpf-next v4 20/25] bpf: Convert percpu_freelist.c to rqspinlock
Date: Sat, 15 Mar 2025 21:05:36 -0700
Message-ID: <20250316040541.108729-21-memxor@gmail.com>
In-Reply-To: <20250316040541.108729-1-memxor@gmail.com>
References: <20250316040541.108729-1-memxor@gmail.com>

Convert the percpu_freelist.c code to use rqspinlock, and remove the extralist fallback and trylock-based acquisitions to avoid deadlocks. The key thing to note is the retained while (true) loop, which searches through other CPUs when we fail to push a node due to locking errors.
This retains the behavior of the old code, where it would keep trying until it would be able to successfully push the node back into the freelist of a CPU. Technically, we should start iteration for this loop from raw_smp_processor_id() + 1, but to avoid hitting the edge of nr_cpus, we skip execution in the loop body instead. Closes: https://lore.kernel.org/bpf/CAPPBnEa1_pZ6W24+WwtcNFvTUHTHO7KUmzEbOcMqxp+m2o15qQ@mail.gmail.com Closes: https://lore.kernel.org/bpf/CAPPBnEYm+9zduStsZaDnq93q1jPLqO-PiKX9jy0MuL8LCXmCrQ@mail.gmail.com Signed-off-by: Kumar Kartikeya Dwivedi --- kernel/bpf/percpu_freelist.c | 113 ++++++++--------------------------- kernel/bpf/percpu_freelist.h | 4 +- 2 files changed, 27 insertions(+), 90 deletions(-) diff --git a/kernel/bpf/percpu_freelist.c b/kernel/bpf/percpu_freelist.c index 034cf87b54e9..632762b57299 100644 --- a/kernel/bpf/percpu_freelist.c +++ b/kernel/bpf/percpu_freelist.c @@ -14,11 +14,9 @@ int pcpu_freelist_init(struct pcpu_freelist *s) for_each_possible_cpu(cpu) { struct pcpu_freelist_head *head = per_cpu_ptr(s->freelist, cpu); - raw_spin_lock_init(&head->lock); + raw_res_spin_lock_init(&head->lock); head->first = NULL; } - raw_spin_lock_init(&s->extralist.lock); - s->extralist.first = NULL; return 0; } @@ -34,58 +32,39 @@ static inline void pcpu_freelist_push_node(struct pcpu_freelist_head *head, WRITE_ONCE(head->first, node); } -static inline void ___pcpu_freelist_push(struct pcpu_freelist_head *head, +static inline bool ___pcpu_freelist_push(struct pcpu_freelist_head *head, struct pcpu_freelist_node *node) { - raw_spin_lock(&head->lock); - pcpu_freelist_push_node(head, node); - raw_spin_unlock(&head->lock); -} - -static inline bool pcpu_freelist_try_push_extra(struct pcpu_freelist *s, - struct pcpu_freelist_node *node) -{ - if (!raw_spin_trylock(&s->extralist.lock)) + if (raw_res_spin_lock(&head->lock)) return false; - - pcpu_freelist_push_node(&s->extralist, node); - raw_spin_unlock(&s->extralist.lock); + pcpu_freelist_push_node(head, node); + raw_res_spin_unlock(&head->lock); return true; } -static inline void ___pcpu_freelist_push_nmi(struct pcpu_freelist *s, - struct pcpu_freelist_node *node) +void __pcpu_freelist_push(struct pcpu_freelist *s, + struct pcpu_freelist_node *node) { - int cpu, orig_cpu; + struct pcpu_freelist_head *head; + int cpu; - orig_cpu = raw_smp_processor_id(); - while (1) { - for_each_cpu_wrap(cpu, cpu_possible_mask, orig_cpu) { - struct pcpu_freelist_head *head; + if (___pcpu_freelist_push(this_cpu_ptr(s->freelist), node)) + return; + while (true) { + for_each_cpu_wrap(cpu, cpu_possible_mask, raw_smp_processor_id()) { + if (cpu == raw_smp_processor_id()) + continue; head = per_cpu_ptr(s->freelist, cpu); - if (raw_spin_trylock(&head->lock)) { - pcpu_freelist_push_node(head, node); - raw_spin_unlock(&head->lock); - return; - } - } - - /* cannot lock any per cpu lock, try extralist */ - if (pcpu_freelist_try_push_extra(s, node)) + if (raw_res_spin_lock(&head->lock)) + continue; + pcpu_freelist_push_node(head, node); + raw_res_spin_unlock(&head->lock); return; + } } } -void __pcpu_freelist_push(struct pcpu_freelist *s, - struct pcpu_freelist_node *node) -{ - if (in_nmi()) - ___pcpu_freelist_push_nmi(s, node); - else - ___pcpu_freelist_push(this_cpu_ptr(s->freelist), node); -} - void pcpu_freelist_push(struct pcpu_freelist *s, struct pcpu_freelist_node *node) { @@ -120,71 +99,29 @@ void pcpu_freelist_populate(struct pcpu_freelist *s, void *buf, u32 elem_size, static struct pcpu_freelist_node *___pcpu_freelist_pop(struct 
pcpu_freelist *s) { + struct pcpu_freelist_node *node = NULL; struct pcpu_freelist_head *head; - struct pcpu_freelist_node *node; int cpu; for_each_cpu_wrap(cpu, cpu_possible_mask, raw_smp_processor_id()) { head = per_cpu_ptr(s->freelist, cpu); if (!READ_ONCE(head->first)) continue; - raw_spin_lock(&head->lock); + if (raw_res_spin_lock(&head->lock)) + continue; node = head->first; if (node) { WRITE_ONCE(head->first, node->next); - raw_spin_unlock(&head->lock); + raw_res_spin_unlock(&head->lock); return node; } - raw_spin_unlock(&head->lock); + raw_res_spin_unlock(&head->lock); } - - /* per cpu lists are all empty, try extralist */ - if (!READ_ONCE(s->extralist.first)) - return NULL; - raw_spin_lock(&s->extralist.lock); - node = s->extralist.first; - if (node) - WRITE_ONCE(s->extralist.first, node->next); - raw_spin_unlock(&s->extralist.lock); - return node; -} - -static struct pcpu_freelist_node * -___pcpu_freelist_pop_nmi(struct pcpu_freelist *s) -{ - struct pcpu_freelist_head *head; - struct pcpu_freelist_node *node; - int cpu; - - for_each_cpu_wrap(cpu, cpu_possible_mask, raw_smp_processor_id()) { - head = per_cpu_ptr(s->freelist, cpu); - if (!READ_ONCE(head->first)) - continue; - if (raw_spin_trylock(&head->lock)) { - node = head->first; - if (node) { - WRITE_ONCE(head->first, node->next); - raw_spin_unlock(&head->lock); - return node; - } - raw_spin_unlock(&head->lock); - } - } - - /* cannot pop from per cpu lists, try extralist */ - if (!READ_ONCE(s->extralist.first) || !raw_spin_trylock(&s->extralist.lock)) - return NULL; - node = s->extralist.first; - if (node) - WRITE_ONCE(s->extralist.first, node->next); - raw_spin_unlock(&s->extralist.lock); return node; } struct pcpu_freelist_node *__pcpu_freelist_pop(struct pcpu_freelist *s) { - if (in_nmi()) - return ___pcpu_freelist_pop_nmi(s); return ___pcpu_freelist_pop(s); } diff --git a/kernel/bpf/percpu_freelist.h b/kernel/bpf/percpu_freelist.h index 3c76553cfe57..914798b74967 100644 --- a/kernel/bpf/percpu_freelist.h +++ b/kernel/bpf/percpu_freelist.h @@ -5,15 +5,15 @@ #define __PERCPU_FREELIST_H__ #include #include +#include struct pcpu_freelist_head { struct pcpu_freelist_node *first; - raw_spinlock_t lock; + rqspinlock_t lock; }; struct pcpu_freelist { struct pcpu_freelist_head __percpu *freelist; - struct pcpu_freelist_head extralist; }; struct pcpu_freelist_node { From patchwork Sun Mar 16 04:05:37 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Kumar Kartikeya Dwivedi X-Patchwork-Id: 14018294 X-Patchwork-Delegate: bpf@iogearbox.net Received: from mail-wr1-f67.google.com (mail-wr1-f67.google.com [209.85.221.67]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 0DF01152196; Sun, 16 Mar 2025 04:06:11 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.221.67 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1742097973; cv=none; b=YndlOTeO83hhdf8OqqjtGHC52WlAL7B4CnqsHkNe4N3n9hWzRAjf0lqaLDIJTr9hBEqIsu/EShd10OG5vty6VdiVj8eAWVymaGIBrluPm1hqUpBPYHCfwCfmDe6z+1rtidh5UB5CdMl2z25ww2nLFJOk+MhJilnz3dD3grPneKY= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1742097973; c=relaxed/simple; bh=JAC4N7kXKK3pnjvfn/+uPUJ4VSIRwkWi681l7P20D6I=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; 
From: Kumar Kartikeya Dwivedi
To: bpf@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: Linus Torvalds, Peter Zijlstra, Will Deacon, Waiman Long, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann, Martin KaFai Lau, Eduard Zingerman, "Paul E. McKenney", Tejun Heo, Barret Rhoden, Josh Don, Dohyun Kim, linux-arm-kernel@lists.infradead.org, kkd@meta.com, kernel-team@meta.com
Subject: [PATCH bpf-next v4 21/25] bpf: Convert lpm_trie.c to rqspinlock
Date: Sat, 15 Mar 2025 21:05:37 -0700
Message-ID: <20250316040541.108729-22-memxor@gmail.com>
In-Reply-To: <20250316040541.108729-1-memxor@gmail.com>
References: <20250316040541.108729-1-memxor@gmail.com>

Convert all LPM trie usage of raw_spinlock to rqspinlock. Note that rcu_dereference_protected in trie_delete_elem is switched over to plain rcu_dereference: the RCU read lock should already be held from the BPF program side or the eBPF syscall path, and trie->lock is acquired just before the dereference. The commit history does not make clear why the protected variant was used, but the reasoning above holds, so switch over. The general shape of the conversion is sketched below.
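For illustration only (not part of the patch): a minimal sketch of the conversion pattern, assuming the rqspinlock helpers from include/asm-generic/rqspinlock.h; the struct and function names here are invented. The point is that, unlike raw_spin_lock_irqsave, the rqspinlock acquire can fail (on timeout or a detected deadlock), so the caller must check the return value and propagate the error instead of assuming the lock is held.

    /* Illustrative sketch only, not kernel code from this series. */
    struct example_obj {
            rqspinlock_t lock;      /* was raw_spinlock_t */
            int value;
    };

    static int example_update(struct example_obj *obj, int v)
    {
            unsigned long flags;
            int ret;

            ret = raw_res_spin_lock_irqsave(&obj->lock, flags);
            if (ret)                /* timeout or deadlock: bail out */
                    return ret;
            obj->value = v;         /* critical section */
            raw_res_spin_unlock_irqrestore(&obj->lock, flags);
            return 0;
    }

The lpm_trie changes below follow this shape: trie_update_elem and trie_delete_elem now return the lock acquisition error instead of spinning indefinitely.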
Closes: https://lore.kernel.org/lkml/000000000000adb08b061413919e@google.com Signed-off-by: Kumar Kartikeya Dwivedi --- kernel/bpf/lpm_trie.c | 25 ++++++++++++++----------- 1 file changed, 14 insertions(+), 11 deletions(-) diff --git a/kernel/bpf/lpm_trie.c b/kernel/bpf/lpm_trie.c index e8a772e64324..be66d7e520e0 100644 --- a/kernel/bpf/lpm_trie.c +++ b/kernel/bpf/lpm_trie.c @@ -15,6 +15,7 @@ #include #include #include +#include #include /* Intermediate node */ @@ -36,7 +37,7 @@ struct lpm_trie { size_t n_entries; size_t max_prefixlen; size_t data_size; - raw_spinlock_t lock; + rqspinlock_t lock; }; /* This trie implements a longest prefix match algorithm that can be used to @@ -342,7 +343,9 @@ static long trie_update_elem(struct bpf_map *map, if (!new_node) return -ENOMEM; - raw_spin_lock_irqsave(&trie->lock, irq_flags); + ret = raw_res_spin_lock_irqsave(&trie->lock, irq_flags); + if (ret) + goto out_free; new_node->prefixlen = key->prefixlen; RCU_INIT_POINTER(new_node->child[0], NULL); @@ -356,8 +359,7 @@ static long trie_update_elem(struct bpf_map *map, */ slot = &trie->root; - while ((node = rcu_dereference_protected(*slot, - lockdep_is_held(&trie->lock)))) { + while ((node = rcu_dereference(*slot))) { matchlen = longest_prefix_match(trie, node, key); if (node->prefixlen != matchlen || @@ -442,8 +444,8 @@ static long trie_update_elem(struct bpf_map *map, rcu_assign_pointer(*slot, im_node); out: - raw_spin_unlock_irqrestore(&trie->lock, irq_flags); - + raw_res_spin_unlock_irqrestore(&trie->lock, irq_flags); +out_free: if (ret) bpf_mem_cache_free(&trie->ma, new_node); bpf_mem_cache_free_rcu(&trie->ma, free_node); @@ -467,7 +469,9 @@ static long trie_delete_elem(struct bpf_map *map, void *_key) if (key->prefixlen > trie->max_prefixlen) return -EINVAL; - raw_spin_lock_irqsave(&trie->lock, irq_flags); + ret = raw_res_spin_lock_irqsave(&trie->lock, irq_flags); + if (ret) + return ret; /* Walk the tree looking for an exact key/length match and keeping * track of the path we traverse. 
We will need to know the node @@ -478,8 +482,7 @@ static long trie_delete_elem(struct bpf_map *map, void *_key) trim = &trie->root; trim2 = trim; parent = NULL; - while ((node = rcu_dereference_protected( - *trim, lockdep_is_held(&trie->lock)))) { + while ((node = rcu_dereference(*trim))) { matchlen = longest_prefix_match(trie, node, key); if (node->prefixlen != matchlen || @@ -543,7 +546,7 @@ static long trie_delete_elem(struct bpf_map *map, void *_key) free_node = node; out: - raw_spin_unlock_irqrestore(&trie->lock, irq_flags); + raw_res_spin_unlock_irqrestore(&trie->lock, irq_flags); bpf_mem_cache_free_rcu(&trie->ma, free_parent); bpf_mem_cache_free_rcu(&trie->ma, free_node); @@ -592,7 +595,7 @@ static struct bpf_map *trie_alloc(union bpf_attr *attr) offsetof(struct bpf_lpm_trie_key_u8, data); trie->max_prefixlen = trie->data_size * 8; - raw_spin_lock_init(&trie->lock); + raw_res_spin_lock_init(&trie->lock); /* Allocate intermediate and leaf nodes from the same allocator */ leaf_size = sizeof(struct lpm_trie_node) + trie->data_size + From patchwork Sun Mar 16 04:05:38 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Kumar Kartikeya Dwivedi X-Patchwork-Id: 14018295 X-Patchwork-Delegate: bpf@iogearbox.net Received: from mail-wm1-f66.google.com (mail-wm1-f66.google.com [209.85.128.66]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id D39A11A3155; Sun, 16 Mar 2025 04:06:13 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.128.66 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1742097975; cv=none; b=fN44B9cy4AZB3iD1qd76hiArV6Smjothd393j5j/q8SG48p7bxjnx1heqVQ4/QrMeFR6Ol6ENlitXx1MPtRWHyQ7/QipDA9GZwBaiiXPGSY/spT/sg9KqqdPmPqiOw1MnDKVP3KV+XeU4lCTpOxMyjXvqbyeJVo8Tk77sJ+gbls= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1742097975; c=relaxed/simple; bh=5BKl9hYgLyro6F1bzOCMfYjd/K5FyedLnNqGmCHan+A=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=OMnkx1xbne+3h8HzwebIUmV2g8mx+GSNcgVgK31AxHkG5Ydnzjy80HMT8SD24aj6XgtYASxa58E+GX4vL2uRDDB8uJgPAMKCSGtzCQbO2h9kUjlVZLEYtTmSCo4ctEFxTLc2AoOPVbwhH6wjwuPBp834aESNy+3EHj8AgMj+tLU= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=PI6DVTXn; arc=none smtp.client-ip=209.85.128.66 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="PI6DVTXn" Received: by mail-wm1-f66.google.com with SMTP id 5b1f17b1804b1-43cf680d351so5110285e9.0; Sat, 15 Mar 2025 21:06:13 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1742097971; x=1742702771; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=1VnxgBQ3rAJeGHFIGyB+ChpGIi/1tYmBGxWhiMsjK7I=; b=PI6DVTXni17Mtb76ppSFIVtCY//VZCA+sTwshIHKXkxfYzOq4MJyBxY1rS74toXHar 4gQdYSbKv1PvhbYwUjMHJFQVWEqld2aazM/vtL8iPAZlNT4UtYBYAZt0BFIqIrqUAlZ0 
From: Kumar Kartikeya Dwivedi
To: bpf@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: Linus Torvalds, Peter Zijlstra, Will Deacon, Waiman Long, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann, Martin KaFai Lau, Eduard Zingerman, "Paul E. McKenney", Tejun Heo, Barret Rhoden, Josh Don, Dohyun Kim, linux-arm-kernel@lists.infradead.org, kkd@meta.com, kernel-team@meta.com
Subject: [PATCH bpf-next v4 22/25] bpf: Introduce rqspinlock kfuncs
Date: Sat, 15 Mar 2025 21:05:38 -0700
Message-ID: <20250316040541.108729-23-memxor@gmail.com>
In-Reply-To: <20250316040541.108729-1-memxor@gmail.com>
References: <20250316040541.108729-1-memxor@gmail.com>

Introduce four new kfuncs: bpf_res_spin_lock and bpf_res_spin_unlock, along with their irqsave/irqrestore variants, which wrap the rqspinlock APIs. bpf_res_spin_lock returns a conditional result depending on whether the lock was acquired: NULL is returned when lock acquisition succeeds, non-NULL upon failure. The memory pointed to by the returned pointer upon failure can be dereferenced after the NULL check to obtain the error code.

Instead of using the old bpf_spin_lock type, introduce a new type with the same layout and alignment, but a different name, to avoid type confusion.

Preemption is disabled upon successful lock acquisition; however, IRQs are not. Special kfuncs can be introduced later to allow disabling IRQs when taking a spin lock. Resilient locks are safe against AA deadlocks, so leaving IRQs enabled for now does not allow kernel safety to be violated.

The __irq_flag annotation is used to accept IRQ flags for the IRQ variants, with the same semantics as the existing bpf_local_irq_{save, restore}.

These kfuncs will require additional verifier-side support in subsequent commits, to allow programs to hold multiple locks at the same time.

Signed-off-by: Kumar Kartikeya Dwivedi
--- include/asm-generic/rqspinlock.h | 7 +++ include/linux/bpf.h | 1 + kernel/bpf/rqspinlock.c | 78 ++++++++++++++++++++++++++++++++ 3 files changed, 86 insertions(+) diff --git a/include/asm-generic/rqspinlock.h b/include/asm-generic/rqspinlock.h index 23abd0b8d0f9..6d4244d643df 100644 --- a/include/asm-generic/rqspinlock.h +++ b/include/asm-generic/rqspinlock.h @@ -23,6 +23,13 @@ struct rqspinlock { }; }; +/* Even though this is same as struct rqspinlock, we need to emit a distinct + * type in BTF for BPF programs.
+ */ +struct bpf_res_spin_lock { + u32 val; +}; + struct qspinlock; #ifdef CONFIG_QUEUED_SPINLOCKS typedef struct qspinlock rqspinlock_t; diff --git a/include/linux/bpf.h b/include/linux/bpf.h index 0d7b70124d81..a6bc687d6300 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -30,6 +30,7 @@ #include #include #include +#include struct bpf_verifier_env; struct bpf_verifier_log; diff --git a/kernel/bpf/rqspinlock.c b/kernel/bpf/rqspinlock.c index ad0fc35c647e..cf417a736559 100644 --- a/kernel/bpf/rqspinlock.c +++ b/kernel/bpf/rqspinlock.c @@ -15,6 +15,8 @@ #include #include +#include +#include #include #include #include @@ -690,3 +692,79 @@ int __lockfunc resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val) EXPORT_SYMBOL_GPL(resilient_queued_spin_lock_slowpath); #endif /* CONFIG_QUEUED_SPINLOCKS */ + +__bpf_kfunc_start_defs(); + +#define REPORT_STR(ret) ({ ret == -ETIMEDOUT ? "Timeout detected" : "AA or ABBA deadlock detected"; }) + +__bpf_kfunc int bpf_res_spin_lock(struct bpf_res_spin_lock *lock) +{ + int ret; + + BUILD_BUG_ON(sizeof(rqspinlock_t) != sizeof(struct bpf_res_spin_lock)); + BUILD_BUG_ON(__alignof__(rqspinlock_t) != __alignof__(struct bpf_res_spin_lock)); + + preempt_disable(); + ret = res_spin_lock((rqspinlock_t *)lock); + if (unlikely(ret)) { + preempt_enable(); + rqspinlock_report_violation(REPORT_STR(ret), lock); + return ret; + } + return 0; +} + +__bpf_kfunc void bpf_res_spin_unlock(struct bpf_res_spin_lock *lock) +{ + res_spin_unlock((rqspinlock_t *)lock); + preempt_enable(); +} + +__bpf_kfunc int bpf_res_spin_lock_irqsave(struct bpf_res_spin_lock *lock, unsigned long *flags__irq_flag) +{ + u64 *ptr = (u64 *)flags__irq_flag; + unsigned long flags; + int ret; + + preempt_disable(); + local_irq_save(flags); + ret = res_spin_lock((rqspinlock_t *)lock); + if (unlikely(ret)) { + local_irq_restore(flags); + preempt_enable(); + rqspinlock_report_violation(REPORT_STR(ret), lock); + return ret; + } + *ptr = flags; + return 0; +} + +__bpf_kfunc void bpf_res_spin_unlock_irqrestore(struct bpf_res_spin_lock *lock, unsigned long *flags__irq_flag) +{ + u64 *ptr = (u64 *)flags__irq_flag; + unsigned long flags = *ptr; + + res_spin_unlock((rqspinlock_t *)lock); + local_irq_restore(flags); + preempt_enable(); +} + +__bpf_kfunc_end_defs(); + +BTF_KFUNCS_START(rqspinlock_kfunc_ids) +BTF_ID_FLAGS(func, bpf_res_spin_lock, KF_RET_NULL) +BTF_ID_FLAGS(func, bpf_res_spin_unlock) +BTF_ID_FLAGS(func, bpf_res_spin_lock_irqsave, KF_RET_NULL) +BTF_ID_FLAGS(func, bpf_res_spin_unlock_irqrestore) +BTF_KFUNCS_END(rqspinlock_kfunc_ids) + +static const struct btf_kfunc_id_set rqspinlock_kfunc_set = { + .owner = THIS_MODULE, + .set = &rqspinlock_kfunc_ids, +}; + +static __init int rqspinlock_register_kfuncs(void) +{ + return register_btf_kfunc_id_set(BPF_PROG_TYPE_UNSPEC, &rqspinlock_kfunc_set); +} +late_initcall(rqspinlock_register_kfuncs); From patchwork Sun Mar 16 04:05:39 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Kumar Kartikeya Dwivedi X-Patchwork-Id: 14018297 X-Patchwork-Delegate: bpf@iogearbox.net Received: from mail-wr1-f65.google.com (mail-wr1-f65.google.com [209.85.221.65]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 0B3F81A38E4; Sun, 16 Mar 2025 04:06:14 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.221.65 ARC-Seal: i=1; a=rsa-sha256; 
From: Kumar Kartikeya Dwivedi
To: bpf@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: Eduard Zingerman, Linus Torvalds, Peter Zijlstra, Will Deacon, Waiman Long, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann, Martin KaFai Lau, "Paul E. McKenney", Tejun Heo, Barret Rhoden, Josh Don, Dohyun Kim, linux-arm-kernel@lists.infradead.org, kkd@meta.com, kernel-team@meta.com
Subject: [PATCH bpf-next v4 23/25] bpf: Implement verifier support for rqspinlock
Date: Sat, 15 Mar 2025 21:05:39 -0700
Message-ID: <20250316040541.108729-24-memxor@gmail.com>
In-Reply-To: <20250316040541.108729-1-memxor@gmail.com>
References: <20250316040541.108729-1-memxor@gmail.com>

Introduce verifier-side support for rqspinlock kfuncs. The first step is allowing the bpf_res_spin_lock type to be defined in map values and allocated objects, so the BTF side is updated with a new BPF_RES_SPIN_LOCK field to recognize and validate it. An object cannot have both bpf_spin_lock and bpf_res_spin_lock; only one of them (and at most one per object, as before) may be present. The bpf_res_spin_lock can also be used to protect objects that require lock protection for their kfuncs, like BPF rbtree and linked list.

The verifier plumbing to simulate success and failure cases when calling the kfuncs is done by pushing a new verifier state to the verifier state stack, which will verify the failure case upon calling the kfunc. The path where success is indicated creates all lock reference state and IRQ state (if necessary for irqsave variants). In the case of failure, the state clears registers r0-r5, sets the return value, and skips kfunc processing, proceeding to the next instruction. The return value is marked as 0 for the success case, and as [-MAX_ERRNO, -1] for the failure case.
Then, in the program, whenever user checks the return value as 'if (ret)' or 'if (ret < 0)' the verifier never traverses such branches for success cases, and would be aware that the lock is not held in such cases. We push the kfunc state in check_kfunc_call whenever rqspinlock kfuncs are invoked. We introduce a kfunc_class state to avoid mixing lock irqrestore kfuncs with IRQ state created by bpf_local_irq_save. With all this infrastructure, these kfuncs become usable in programs while satisfying all safety properties required by the kernel. Acked-by: Eduard Zingerman Signed-off-by: Kumar Kartikeya Dwivedi --- include/linux/bpf.h | 9 ++ include/linux/bpf_verifier.h | 16 ++- kernel/bpf/btf.c | 26 ++++- kernel/bpf/syscall.c | 6 +- kernel/bpf/verifier.c | 219 ++++++++++++++++++++++++++++------- 5 files changed, 231 insertions(+), 45 deletions(-) diff --git a/include/linux/bpf.h b/include/linux/bpf.h index a6bc687d6300..c59384f62da0 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -205,6 +205,7 @@ enum btf_field_type { BPF_REFCOUNT = (1 << 9), BPF_WORKQUEUE = (1 << 10), BPF_UPTR = (1 << 11), + BPF_RES_SPIN_LOCK = (1 << 12), }; typedef void (*btf_dtor_kfunc_t)(void *); @@ -240,6 +241,7 @@ struct btf_record { u32 cnt; u32 field_mask; int spin_lock_off; + int res_spin_lock_off; int timer_off; int wq_off; int refcount_off; @@ -315,6 +317,8 @@ static inline const char *btf_field_type_name(enum btf_field_type type) switch (type) { case BPF_SPIN_LOCK: return "bpf_spin_lock"; + case BPF_RES_SPIN_LOCK: + return "bpf_res_spin_lock"; case BPF_TIMER: return "bpf_timer"; case BPF_WORKQUEUE: @@ -347,6 +351,8 @@ static inline u32 btf_field_type_size(enum btf_field_type type) switch (type) { case BPF_SPIN_LOCK: return sizeof(struct bpf_spin_lock); + case BPF_RES_SPIN_LOCK: + return sizeof(struct bpf_res_spin_lock); case BPF_TIMER: return sizeof(struct bpf_timer); case BPF_WORKQUEUE: @@ -377,6 +383,8 @@ static inline u32 btf_field_type_align(enum btf_field_type type) switch (type) { case BPF_SPIN_LOCK: return __alignof__(struct bpf_spin_lock); + case BPF_RES_SPIN_LOCK: + return __alignof__(struct bpf_res_spin_lock); case BPF_TIMER: return __alignof__(struct bpf_timer); case BPF_WORKQUEUE: @@ -420,6 +428,7 @@ static inline void bpf_obj_init_field(const struct btf_field *field, void *addr) case BPF_RB_ROOT: /* RB_ROOT_CACHED 0-inits, no need to do anything after memset */ case BPF_SPIN_LOCK: + case BPF_RES_SPIN_LOCK: case BPF_TIMER: case BPF_WORKQUEUE: case BPF_KPTR_UNREF: diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h index d6cfc4ee6820..bc073a48aed9 100644 --- a/include/linux/bpf_verifier.h +++ b/include/linux/bpf_verifier.h @@ -115,6 +115,14 @@ struct bpf_reg_state { int depth:30; } iter; + /* For irq stack slots */ + struct { + enum { + IRQ_NATIVE_KFUNC, + IRQ_LOCK_KFUNC, + } kfunc_class; + } irq; + /* Max size from any of the above. */ struct { unsigned long raw1; @@ -255,9 +263,11 @@ struct bpf_reference_state { * default to pointer reference on zero initialization of a state. */ enum ref_state_type { - REF_TYPE_PTR = 1, - REF_TYPE_IRQ = 2, - REF_TYPE_LOCK = 3, + REF_TYPE_PTR = (1 << 1), + REF_TYPE_IRQ = (1 << 2), + REF_TYPE_LOCK = (1 << 3), + REF_TYPE_RES_LOCK = (1 << 4), + REF_TYPE_RES_LOCK_IRQ = (1 << 5), } type; /* Track each reference created with a unique id, even if the same * instruction creates the reference multiple times (eg, via CALL). 
diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c index 519e3f5e9c10..f7a2bfb0c11a 100644 --- a/kernel/bpf/btf.c +++ b/kernel/bpf/btf.c @@ -3481,6 +3481,15 @@ static int btf_get_field_type(const struct btf *btf, const struct btf_type *var_ goto end; } } + if (field_mask & BPF_RES_SPIN_LOCK) { + if (!strcmp(name, "bpf_res_spin_lock")) { + if (*seen_mask & BPF_RES_SPIN_LOCK) + return -E2BIG; + *seen_mask |= BPF_RES_SPIN_LOCK; + type = BPF_RES_SPIN_LOCK; + goto end; + } + } if (field_mask & BPF_TIMER) { if (!strcmp(name, "bpf_timer")) { if (*seen_mask & BPF_TIMER) @@ -3659,6 +3668,7 @@ static int btf_find_field_one(const struct btf *btf, switch (field_type) { case BPF_SPIN_LOCK: + case BPF_RES_SPIN_LOCK: case BPF_TIMER: case BPF_WORKQUEUE: case BPF_LIST_NODE: @@ -3952,6 +3962,7 @@ struct btf_record *btf_parse_fields(const struct btf *btf, const struct btf_type return ERR_PTR(-ENOMEM); rec->spin_lock_off = -EINVAL; + rec->res_spin_lock_off = -EINVAL; rec->timer_off = -EINVAL; rec->wq_off = -EINVAL; rec->refcount_off = -EINVAL; @@ -3979,6 +3990,11 @@ struct btf_record *btf_parse_fields(const struct btf *btf, const struct btf_type /* Cache offset for faster lookup at runtime */ rec->spin_lock_off = rec->fields[i].offset; break; + case BPF_RES_SPIN_LOCK: + WARN_ON_ONCE(rec->spin_lock_off >= 0); + /* Cache offset for faster lookup at runtime */ + rec->res_spin_lock_off = rec->fields[i].offset; + break; case BPF_TIMER: WARN_ON_ONCE(rec->timer_off >= 0); /* Cache offset for faster lookup at runtime */ @@ -4022,9 +4038,15 @@ struct btf_record *btf_parse_fields(const struct btf *btf, const struct btf_type rec->cnt++; } + if (rec->spin_lock_off >= 0 && rec->res_spin_lock_off >= 0) { + ret = -EINVAL; + goto end; + } + /* bpf_{list_head, rb_node} require bpf_spin_lock */ if ((btf_record_has_field(rec, BPF_LIST_HEAD) || - btf_record_has_field(rec, BPF_RB_ROOT)) && rec->spin_lock_off < 0) { + btf_record_has_field(rec, BPF_RB_ROOT)) && + (rec->spin_lock_off < 0 && rec->res_spin_lock_off < 0)) { ret = -EINVAL; goto end; } @@ -5637,7 +5659,7 @@ btf_parse_struct_metas(struct bpf_verifier_log *log, struct btf *btf) type = &tab->types[tab->cnt]; type->btf_id = i; - record = btf_parse_fields(btf, t, BPF_SPIN_LOCK | BPF_LIST_HEAD | BPF_LIST_NODE | + record = btf_parse_fields(btf, t, BPF_SPIN_LOCK | BPF_RES_SPIN_LOCK | BPF_LIST_HEAD | BPF_LIST_NODE | BPF_RB_ROOT | BPF_RB_NODE | BPF_REFCOUNT | BPF_KPTR, t->size); /* The record cannot be unset, treat it as an error if so */ diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c index 6a8f20ee2851..dba2628fe9a5 100644 --- a/kernel/bpf/syscall.c +++ b/kernel/bpf/syscall.c @@ -648,6 +648,7 @@ void btf_record_free(struct btf_record *rec) case BPF_RB_ROOT: case BPF_RB_NODE: case BPF_SPIN_LOCK: + case BPF_RES_SPIN_LOCK: case BPF_TIMER: case BPF_REFCOUNT: case BPF_WORKQUEUE: @@ -700,6 +701,7 @@ struct btf_record *btf_record_dup(const struct btf_record *rec) case BPF_RB_ROOT: case BPF_RB_NODE: case BPF_SPIN_LOCK: + case BPF_RES_SPIN_LOCK: case BPF_TIMER: case BPF_REFCOUNT: case BPF_WORKQUEUE: @@ -777,6 +779,7 @@ void bpf_obj_free_fields(const struct btf_record *rec, void *obj) switch (fields[i].type) { case BPF_SPIN_LOCK: + case BPF_RES_SPIN_LOCK: break; case BPF_TIMER: bpf_timer_cancel_and_free(field_ptr); @@ -1212,7 +1215,7 @@ static int map_check_btf(struct bpf_map *map, struct bpf_token *token, return -EINVAL; map->record = btf_parse_fields(btf, value_type, - BPF_SPIN_LOCK | BPF_TIMER | BPF_KPTR | BPF_LIST_HEAD | + BPF_SPIN_LOCK | BPF_RES_SPIN_LOCK | BPF_TIMER | 
BPF_KPTR | BPF_LIST_HEAD | BPF_RB_ROOT | BPF_REFCOUNT | BPF_WORKQUEUE | BPF_UPTR, map->value_size); if (!IS_ERR_OR_NULL(map->record)) { @@ -1231,6 +1234,7 @@ static int map_check_btf(struct bpf_map *map, struct bpf_token *token, case 0: continue; case BPF_SPIN_LOCK: + case BPF_RES_SPIN_LOCK: if (map->map_type != BPF_MAP_TYPE_HASH && map->map_type != BPF_MAP_TYPE_ARRAY && map->map_type != BPF_MAP_TYPE_CGROUP_STORAGE && diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c index 3303a3605ee8..29121ad32a89 100644 --- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -456,7 +456,7 @@ static bool subprog_is_exc_cb(struct bpf_verifier_env *env, int subprog) static bool reg_may_point_to_spin_lock(const struct bpf_reg_state *reg) { - return btf_record_has_field(reg_btf_record(reg), BPF_SPIN_LOCK); + return btf_record_has_field(reg_btf_record(reg), BPF_SPIN_LOCK | BPF_RES_SPIN_LOCK); } static bool type_is_rdonly_mem(u32 type) @@ -1155,7 +1155,8 @@ static int release_irq_state(struct bpf_verifier_state *state, int id); static int mark_stack_slot_irq_flag(struct bpf_verifier_env *env, struct bpf_kfunc_call_arg_meta *meta, - struct bpf_reg_state *reg, int insn_idx) + struct bpf_reg_state *reg, int insn_idx, + int kfunc_class) { struct bpf_func_state *state = func(env, reg); struct bpf_stack_state *slot; @@ -1177,6 +1178,7 @@ static int mark_stack_slot_irq_flag(struct bpf_verifier_env *env, st->type = PTR_TO_STACK; /* we don't have dedicated reg type */ st->live |= REG_LIVE_WRITTEN; st->ref_obj_id = id; + st->irq.kfunc_class = kfunc_class; for (i = 0; i < BPF_REG_SIZE; i++) slot->slot_type[i] = STACK_IRQ_FLAG; @@ -1185,7 +1187,8 @@ static int mark_stack_slot_irq_flag(struct bpf_verifier_env *env, return 0; } -static int unmark_stack_slot_irq_flag(struct bpf_verifier_env *env, struct bpf_reg_state *reg) +static int unmark_stack_slot_irq_flag(struct bpf_verifier_env *env, struct bpf_reg_state *reg, + int kfunc_class) { struct bpf_func_state *state = func(env, reg); struct bpf_stack_state *slot; @@ -1199,6 +1202,15 @@ static int unmark_stack_slot_irq_flag(struct bpf_verifier_env *env, struct bpf_r slot = &state->stack[spi]; st = &slot->spilled_ptr; + if (st->irq.kfunc_class != kfunc_class) { + const char *flag_kfunc = st->irq.kfunc_class == IRQ_NATIVE_KFUNC ? "native" : "lock"; + const char *used_kfunc = kfunc_class == IRQ_NATIVE_KFUNC ? "native" : "lock"; + + verbose(env, "irq flag acquired by %s kfuncs cannot be restored with %s kfuncs\n", + flag_kfunc, used_kfunc); + return -EINVAL; + } + err = release_irq_state(env->cur_state, st->ref_obj_id); WARN_ON_ONCE(err && err != -EACCES); if (err) { @@ -1609,7 +1621,7 @@ static struct bpf_reference_state *find_lock_state(struct bpf_verifier_state *st for (i = 0; i < state->acquired_refs; i++) { struct bpf_reference_state *s = &state->refs[i]; - if (s->type != type) + if (!(s->type & type)) continue; if (s->id == id && s->ptr == ptr) @@ -8204,6 +8216,12 @@ static int check_kfunc_mem_size_reg(struct bpf_verifier_env *env, struct bpf_reg return err; } +enum { + PROCESS_SPIN_LOCK = (1 << 0), + PROCESS_RES_LOCK = (1 << 1), + PROCESS_LOCK_IRQ = (1 << 2), +}; + /* Implementation details: * bpf_map_lookup returns PTR_TO_MAP_VALUE_OR_NULL. * bpf_obj_new returns PTR_TO_BTF_ID | MEM_ALLOC | PTR_MAYBE_NULL. @@ -8226,30 +8244,33 @@ static int check_kfunc_mem_size_reg(struct bpf_verifier_env *env, struct bpf_reg * env->cur_state->active_locks remembers which map value element or allocated * object got locked and clears it after bpf_spin_unlock. 
*/ -static int process_spin_lock(struct bpf_verifier_env *env, int regno, - bool is_lock) +static int process_spin_lock(struct bpf_verifier_env *env, int regno, int flags) { + bool is_lock = flags & PROCESS_SPIN_LOCK, is_res_lock = flags & PROCESS_RES_LOCK; + const char *lock_str = is_res_lock ? "bpf_res_spin" : "bpf_spin"; struct bpf_reg_state *regs = cur_regs(env), *reg = ®s[regno]; struct bpf_verifier_state *cur = env->cur_state; bool is_const = tnum_is_const(reg->var_off); + bool is_irq = flags & PROCESS_LOCK_IRQ; u64 val = reg->var_off.value; struct bpf_map *map = NULL; struct btf *btf = NULL; struct btf_record *rec; + u32 spin_lock_off; int err; if (!is_const) { verbose(env, - "R%d doesn't have constant offset. bpf_spin_lock has to be at the constant offset\n", - regno); + "R%d doesn't have constant offset. %s_lock has to be at the constant offset\n", + regno, lock_str); return -EINVAL; } if (reg->type == PTR_TO_MAP_VALUE) { map = reg->map_ptr; if (!map->btf) { verbose(env, - "map '%s' has to have BTF in order to use bpf_spin_lock\n", - map->name); + "map '%s' has to have BTF in order to use %s_lock\n", + map->name, lock_str); return -EINVAL; } } else { @@ -8257,36 +8278,53 @@ static int process_spin_lock(struct bpf_verifier_env *env, int regno, } rec = reg_btf_record(reg); - if (!btf_record_has_field(rec, BPF_SPIN_LOCK)) { - verbose(env, "%s '%s' has no valid bpf_spin_lock\n", map ? "map" : "local", - map ? map->name : "kptr"); + if (!btf_record_has_field(rec, is_res_lock ? BPF_RES_SPIN_LOCK : BPF_SPIN_LOCK)) { + verbose(env, "%s '%s' has no valid %s_lock\n", map ? "map" : "local", + map ? map->name : "kptr", lock_str); return -EINVAL; } - if (rec->spin_lock_off != val + reg->off) { - verbose(env, "off %lld doesn't point to 'struct bpf_spin_lock' that is at %d\n", - val + reg->off, rec->spin_lock_off); + spin_lock_off = is_res_lock ? 
rec->res_spin_lock_off : rec->spin_lock_off; + if (spin_lock_off != val + reg->off) { + verbose(env, "off %lld doesn't point to 'struct %s_lock' that is at %d\n", + val + reg->off, lock_str, spin_lock_off); return -EINVAL; } if (is_lock) { void *ptr; + int type; if (map) ptr = map; else ptr = btf; - if (cur->active_locks) { - verbose(env, - "Locking two bpf_spin_locks are not allowed\n"); - return -EINVAL; + if (!is_res_lock && cur->active_locks) { + if (find_lock_state(env->cur_state, REF_TYPE_LOCK, 0, NULL)) { + verbose(env, + "Locking two bpf_spin_locks are not allowed\n"); + return -EINVAL; + } + } else if (is_res_lock && cur->active_locks) { + if (find_lock_state(env->cur_state, REF_TYPE_RES_LOCK | REF_TYPE_RES_LOCK_IRQ, reg->id, ptr)) { + verbose(env, "Acquiring the same lock again, AA deadlock detected\n"); + return -EINVAL; + } } - err = acquire_lock_state(env, env->insn_idx, REF_TYPE_LOCK, reg->id, ptr); + + if (is_res_lock && is_irq) + type = REF_TYPE_RES_LOCK_IRQ; + else if (is_res_lock) + type = REF_TYPE_RES_LOCK; + else + type = REF_TYPE_LOCK; + err = acquire_lock_state(env, env->insn_idx, type, reg->id, ptr); if (err < 0) { verbose(env, "Failed to acquire lock state\n"); return err; } } else { void *ptr; + int type; if (map) ptr = map; @@ -8294,12 +8332,18 @@ static int process_spin_lock(struct bpf_verifier_env *env, int regno, ptr = btf; if (!cur->active_locks) { - verbose(env, "bpf_spin_unlock without taking a lock\n"); + verbose(env, "%s_unlock without taking a lock\n", lock_str); return -EINVAL; } - if (release_lock_state(env->cur_state, REF_TYPE_LOCK, reg->id, ptr)) { - verbose(env, "bpf_spin_unlock of different lock\n"); + if (is_res_lock && is_irq) + type = REF_TYPE_RES_LOCK_IRQ; + else if (is_res_lock) + type = REF_TYPE_RES_LOCK; + else + type = REF_TYPE_LOCK; + if (release_lock_state(cur, type, reg->id, ptr)) { + verbose(env, "%s_unlock of different lock\n", lock_str); return -EINVAL; } @@ -9625,11 +9669,11 @@ static int check_func_arg(struct bpf_verifier_env *env, u32 arg, return -EACCES; } if (meta->func_id == BPF_FUNC_spin_lock) { - err = process_spin_lock(env, regno, true); + err = process_spin_lock(env, regno, PROCESS_SPIN_LOCK); if (err) return err; } else if (meta->func_id == BPF_FUNC_spin_unlock) { - err = process_spin_lock(env, regno, false); + err = process_spin_lock(env, regno, 0); if (err) return err; } else { @@ -11511,7 +11555,7 @@ static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn regs[BPF_REG_0].map_uid = meta.map_uid; regs[BPF_REG_0].type = PTR_TO_MAP_VALUE | ret_flag; if (!type_may_be_null(ret_flag) && - btf_record_has_field(meta.map_ptr->record, BPF_SPIN_LOCK)) { + btf_record_has_field(meta.map_ptr->record, BPF_SPIN_LOCK | BPF_RES_SPIN_LOCK)) { regs[BPF_REG_0].id = ++env->id_gen; } break; @@ -11683,10 +11727,10 @@ static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn /* mark_btf_func_reg_size() is used when the reg size is determined by * the BTF func_proto's return value size and argument. 
*/ -static void mark_btf_func_reg_size(struct bpf_verifier_env *env, u32 regno, - size_t reg_size) +static void __mark_btf_func_reg_size(struct bpf_verifier_env *env, struct bpf_reg_state *regs, + u32 regno, size_t reg_size) { - struct bpf_reg_state *reg = &cur_regs(env)[regno]; + struct bpf_reg_state *reg = ®s[regno]; if (regno == BPF_REG_0) { /* Function return value */ @@ -11704,6 +11748,12 @@ static void mark_btf_func_reg_size(struct bpf_verifier_env *env, u32 regno, } } +static void mark_btf_func_reg_size(struct bpf_verifier_env *env, u32 regno, + size_t reg_size) +{ + return __mark_btf_func_reg_size(env, cur_regs(env), regno, reg_size); +} + static bool is_kfunc_acquire(struct bpf_kfunc_call_arg_meta *meta) { return meta->kfunc_flags & KF_ACQUIRE; @@ -11841,6 +11891,7 @@ enum { KF_ARG_RB_ROOT_ID, KF_ARG_RB_NODE_ID, KF_ARG_WORKQUEUE_ID, + KF_ARG_RES_SPIN_LOCK_ID, }; BTF_ID_LIST(kf_arg_btf_ids) @@ -11850,6 +11901,7 @@ BTF_ID(struct, bpf_list_node) BTF_ID(struct, bpf_rb_root) BTF_ID(struct, bpf_rb_node) BTF_ID(struct, bpf_wq) +BTF_ID(struct, bpf_res_spin_lock) static bool __is_kfunc_ptr_arg_type(const struct btf *btf, const struct btf_param *arg, int type) @@ -11898,6 +11950,11 @@ static bool is_kfunc_arg_wq(const struct btf *btf, const struct btf_param *arg) return __is_kfunc_ptr_arg_type(btf, arg, KF_ARG_WORKQUEUE_ID); } +static bool is_kfunc_arg_res_spin_lock(const struct btf *btf, const struct btf_param *arg) +{ + return __is_kfunc_ptr_arg_type(btf, arg, KF_ARG_RES_SPIN_LOCK_ID); +} + static bool is_kfunc_arg_callback(struct bpf_verifier_env *env, const struct btf *btf, const struct btf_param *arg) { @@ -11969,6 +12026,7 @@ enum kfunc_ptr_arg_type { KF_ARG_PTR_TO_MAP, KF_ARG_PTR_TO_WORKQUEUE, KF_ARG_PTR_TO_IRQ_FLAG, + KF_ARG_PTR_TO_RES_SPIN_LOCK, }; enum special_kfunc_type { @@ -12007,6 +12065,10 @@ enum special_kfunc_type { KF_bpf_iter_num_destroy, KF_bpf_set_dentry_xattr, KF_bpf_remove_dentry_xattr, + KF_bpf_res_spin_lock, + KF_bpf_res_spin_unlock, + KF_bpf_res_spin_lock_irqsave, + KF_bpf_res_spin_unlock_irqrestore, }; BTF_SET_START(special_kfunc_set) @@ -12096,6 +12158,10 @@ BTF_ID(func, bpf_remove_dentry_xattr) BTF_ID_UNUSED BTF_ID_UNUSED #endif +BTF_ID(func, bpf_res_spin_lock) +BTF_ID(func, bpf_res_spin_unlock) +BTF_ID(func, bpf_res_spin_lock_irqsave) +BTF_ID(func, bpf_res_spin_unlock_irqrestore) static bool is_kfunc_ret_null(struct bpf_kfunc_call_arg_meta *meta) { @@ -12189,6 +12255,9 @@ get_kfunc_ptr_arg_type(struct bpf_verifier_env *env, if (is_kfunc_arg_irq_flag(meta->btf, &args[argno])) return KF_ARG_PTR_TO_IRQ_FLAG; + if (is_kfunc_arg_res_spin_lock(meta->btf, &args[argno])) + return KF_ARG_PTR_TO_RES_SPIN_LOCK; + if ((base_type(reg->type) == PTR_TO_BTF_ID || reg2btf_ids[base_type(reg->type)])) { if (!btf_type_is_struct(ref_t)) { verbose(env, "kernel function %s args#%d pointer type %s %s is not supported\n", @@ -12296,13 +12365,19 @@ static int process_irq_flag(struct bpf_verifier_env *env, int regno, struct bpf_kfunc_call_arg_meta *meta) { struct bpf_reg_state *regs = cur_regs(env), *reg = ®s[regno]; + int err, kfunc_class = IRQ_NATIVE_KFUNC; bool irq_save; - int err; - if (meta->func_id == special_kfunc_list[KF_bpf_local_irq_save]) { + if (meta->func_id == special_kfunc_list[KF_bpf_local_irq_save] || + meta->func_id == special_kfunc_list[KF_bpf_res_spin_lock_irqsave]) { irq_save = true; - } else if (meta->func_id == special_kfunc_list[KF_bpf_local_irq_restore]) { + if (meta->func_id == special_kfunc_list[KF_bpf_res_spin_lock_irqsave]) + kfunc_class = IRQ_LOCK_KFUNC; + } 
else if (meta->func_id == special_kfunc_list[KF_bpf_local_irq_restore] || + meta->func_id == special_kfunc_list[KF_bpf_res_spin_unlock_irqrestore]) { irq_save = false; + if (meta->func_id == special_kfunc_list[KF_bpf_res_spin_unlock_irqrestore]) + kfunc_class = IRQ_LOCK_KFUNC; } else { verbose(env, "verifier internal error: unknown irq flags kfunc\n"); return -EFAULT; @@ -12318,7 +12393,7 @@ static int process_irq_flag(struct bpf_verifier_env *env, int regno, if (err) return err; - err = mark_stack_slot_irq_flag(env, meta, reg, env->insn_idx); + err = mark_stack_slot_irq_flag(env, meta, reg, env->insn_idx, kfunc_class); if (err) return err; } else { @@ -12332,7 +12407,7 @@ static int process_irq_flag(struct bpf_verifier_env *env, int regno, if (err) return err; - err = unmark_stack_slot_irq_flag(env, reg); + err = unmark_stack_slot_irq_flag(env, reg, kfunc_class); if (err) return err; } @@ -12459,7 +12534,8 @@ static int check_reg_allocation_locked(struct bpf_verifier_env *env, struct bpf_ if (!env->cur_state->active_locks) return -EINVAL; - s = find_lock_state(env->cur_state, REF_TYPE_LOCK, id, ptr); + s = find_lock_state(env->cur_state, REF_TYPE_LOCK | REF_TYPE_RES_LOCK | REF_TYPE_RES_LOCK_IRQ, + id, ptr); if (!s) { verbose(env, "held lock and object are not in the same allocation\n"); return -EINVAL; @@ -12495,9 +12571,18 @@ static bool is_bpf_graph_api_kfunc(u32 btf_id) btf_id == special_kfunc_list[KF_bpf_refcount_acquire_impl]; } +static bool is_bpf_res_spin_lock_kfunc(u32 btf_id) +{ + return btf_id == special_kfunc_list[KF_bpf_res_spin_lock] || + btf_id == special_kfunc_list[KF_bpf_res_spin_unlock] || + btf_id == special_kfunc_list[KF_bpf_res_spin_lock_irqsave] || + btf_id == special_kfunc_list[KF_bpf_res_spin_unlock_irqrestore]; +} + static bool kfunc_spin_allowed(u32 btf_id) { - return is_bpf_graph_api_kfunc(btf_id) || is_bpf_iter_num_api_kfunc(btf_id); + return is_bpf_graph_api_kfunc(btf_id) || is_bpf_iter_num_api_kfunc(btf_id) || + is_bpf_res_spin_lock_kfunc(btf_id); } static bool is_sync_callback_calling_kfunc(u32 btf_id) @@ -12929,6 +13014,7 @@ static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_ case KF_ARG_PTR_TO_CONST_STR: case KF_ARG_PTR_TO_WORKQUEUE: case KF_ARG_PTR_TO_IRQ_FLAG: + case KF_ARG_PTR_TO_RES_SPIN_LOCK: break; default: WARN_ON_ONCE(1); @@ -13227,6 +13313,28 @@ static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_ if (ret < 0) return ret; break; + case KF_ARG_PTR_TO_RES_SPIN_LOCK: + { + int flags = PROCESS_RES_LOCK; + + if (reg->type != PTR_TO_MAP_VALUE && reg->type != (PTR_TO_BTF_ID | MEM_ALLOC)) { + verbose(env, "arg#%d doesn't point to map value or allocated object\n", i); + return -EINVAL; + } + + if (!is_bpf_res_spin_lock_kfunc(meta->func_id)) + return -EFAULT; + if (meta->func_id == special_kfunc_list[KF_bpf_res_spin_lock] || + meta->func_id == special_kfunc_list[KF_bpf_res_spin_lock_irqsave]) + flags |= PROCESS_SPIN_LOCK; + if (meta->func_id == special_kfunc_list[KF_bpf_res_spin_lock_irqsave] || + meta->func_id == special_kfunc_list[KF_bpf_res_spin_unlock_irqrestore]) + flags |= PROCESS_LOCK_IRQ; + ret = process_spin_lock(env, regno, flags); + if (ret < 0) + return ret; + break; + } } } @@ -13312,6 +13420,33 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn, insn_aux->is_iter_next = is_iter_next_kfunc(&meta); + if (!insn->off && + (insn->imm == special_kfunc_list[KF_bpf_res_spin_lock] || + insn->imm == special_kfunc_list[KF_bpf_res_spin_lock_irqsave])) { + struct 
bpf_verifier_state *branch; + struct bpf_reg_state *regs; + + branch = push_stack(env, env->insn_idx + 1, env->insn_idx, false); + if (!branch) { + verbose(env, "failed to push state for failed lock acquisition\n"); + return -ENOMEM; + } + + regs = branch->frame[branch->curframe]->regs; + + /* Clear r0-r5 registers in forked state */ + for (i = 0; i < CALLER_SAVED_REGS; i++) + mark_reg_not_init(env, regs, caller_saved[i]); + + mark_reg_unknown(env, regs, BPF_REG_0); + err = __mark_reg_s32_range(env, regs, BPF_REG_0, -MAX_ERRNO, -1); + if (err) { + verbose(env, "failed to mark s32 range for retval in forked state for lock\n"); + return err; + } + __mark_btf_func_reg_size(env, regs, BPF_REG_0, sizeof(u32)); + } + if (is_kfunc_destructive(&meta) && !capable(CAP_SYS_BOOT)) { verbose(env, "destructive kfunc calls require CAP_SYS_BOOT capability\n"); return -EACCES; @@ -13482,6 +13617,9 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn, if (btf_type_is_scalar(t)) { mark_reg_unknown(env, regs, BPF_REG_0); + if (meta.btf == btf_vmlinux && (meta.func_id == special_kfunc_list[KF_bpf_res_spin_lock] || + meta.func_id == special_kfunc_list[KF_bpf_res_spin_lock_irqsave])) + __mark_reg_const_zero(env, ®s[BPF_REG_0]); mark_btf_func_reg_size(env, BPF_REG_0, t->size); } else if (btf_type_is_ptr(t)) { ptr_type = btf_type_skip_modifiers(desc_btf, t->type, &ptr_type_id); @@ -18417,7 +18555,8 @@ static bool stacksafe(struct bpf_verifier_env *env, struct bpf_func_state *old, case STACK_IRQ_FLAG: old_reg = &old->stack[spi].spilled_ptr; cur_reg = &cur->stack[spi].spilled_ptr; - if (!check_ids(old_reg->ref_obj_id, cur_reg->ref_obj_id, idmap)) + if (!check_ids(old_reg->ref_obj_id, cur_reg->ref_obj_id, idmap) || + old_reg->irq.kfunc_class != cur_reg->irq.kfunc_class) return false; break; case STACK_MISC: @@ -18461,6 +18600,8 @@ static bool refsafe(struct bpf_verifier_state *old, struct bpf_verifier_state *c case REF_TYPE_IRQ: break; case REF_TYPE_LOCK: + case REF_TYPE_RES_LOCK: + case REF_TYPE_RES_LOCK_IRQ: if (old->refs[i].ptr != cur->refs[i].ptr) return false; break; @@ -19746,7 +19887,7 @@ static int check_map_prog_compatibility(struct bpf_verifier_env *env, } } - if (btf_record_has_field(map->record, BPF_SPIN_LOCK)) { + if (btf_record_has_field(map->record, BPF_SPIN_LOCK | BPF_RES_SPIN_LOCK)) { if (prog_type == BPF_PROG_TYPE_SOCKET_FILTER) { verbose(env, "socket filter progs cannot use bpf_spin_lock yet\n"); return -EINVAL; From patchwork Sun Mar 16 04:05:40 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Kumar Kartikeya Dwivedi X-Patchwork-Id: 14018296 X-Patchwork-Delegate: bpf@iogearbox.net Received: from mail-wm1-f67.google.com (mail-wm1-f67.google.com [209.85.128.67]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 12D33154433; Sun, 16 Mar 2025 04:06:15 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.128.67 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1742097978; cv=none; b=en9+4nW3TGYUx91EZAbZ93ZwmeouQdhn9tUJLeef1YESRY11ZV2GR0+oC0zLlK3oChLt8tqlh1RQNGnMdCDoZzjQe6brFJdF1Cx50rPkKl8wizvHSKWJiZtz/1XHOOvyUBVKR1dZjHJTAvCnyEjG/5hWNLJZJd9IHxJX731kLvg= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1742097978; c=relaxed/simple; bh=O+lyN4oVrvKYSSQbJkMNs01queFf9cpPeUOzO9ERMeU=; 
From patchwork Sun Mar 16 04:05:40 2025
From: Kumar Kartikeya Dwivedi
To: bpf@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: Linus Torvalds, Peter Zijlstra, Will Deacon, Waiman Long, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann, Martin KaFai Lau, Eduard Zingerman, "Paul E. McKenney", Tejun Heo, Barret Rhoden, Josh Don, Dohyun Kim, linux-arm-kernel@lists.infradead.org, kkd@meta.com, kernel-team@meta.com
Subject: [PATCH bpf-next v4 24/25] bpf: Maintain FIFO property for rqspinlock unlock
Date: Sat, 15 Mar 2025 21:05:40 -0700
Message-ID: <20250316040541.108729-25-memxor@gmail.com>
In-Reply-To: <20250316040541.108729-1-memxor@gmail.com>
References: <20250316040541.108729-1-memxor@gmail.com>

Since out-of-order unlocks are unsupported for rqspinlock, and the irqsave variants already enforce strict FIFO ordering, make the same change for the normal non-irqsave variants, so that FIFO ordering is enforced there as well. Two new verifier state fields (active_lock_id, active_lock_ptr) denote the top of the lock stack; whenever the topmost entry is popped through an unlock, the previous entry's id and pointer are recomputed and reinstated as the new top. Take special care to make these fields part of the state comparison in refsafe.

Signed-off-by: Kumar Kartikeya Dwivedi
---
 include/linux/bpf_verifier.h | 3 +++
 kernel/bpf/verifier.c | 33 ++++++++++++++++++++++++++++-----
 2 files changed, 31 insertions(+), 5 deletions(-)

diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h index bc073a48aed9..9734544b6957 100644 --- a/include/linux/bpf_verifier.h +++ b/include/linux/bpf_verifier.h @@ -268,6 +268,7 @@ struct bpf_reference_state { REF_TYPE_LOCK = (1 << 3), REF_TYPE_RES_LOCK = (1 << 4), REF_TYPE_RES_LOCK_IRQ = (1 << 5), + REF_TYPE_LOCK_MASK = REF_TYPE_LOCK | REF_TYPE_RES_LOCK | REF_TYPE_RES_LOCK_IRQ, } type; /* Track each reference created with a unique id, even if the same * instruction creates the reference multiple times (eg, via CALL).
@@ -434,6 +435,8 @@ struct bpf_verifier_state { u32 active_locks; u32 active_preempt_locks; u32 active_irq_id; + u32 active_lock_id; + void *active_lock_ptr; bool active_rcu_lock; bool speculative; diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c index 29121ad32a89..4057081e996f 100644 --- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -1428,6 +1428,8 @@ static int copy_reference_state(struct bpf_verifier_state *dst, const struct bpf dst->active_preempt_locks = src->active_preempt_locks; dst->active_rcu_lock = src->active_rcu_lock; dst->active_irq_id = src->active_irq_id; + dst->active_lock_id = src->active_lock_id; + dst->active_lock_ptr = src->active_lock_ptr; return 0; } @@ -1527,6 +1529,8 @@ static int acquire_lock_state(struct bpf_verifier_env *env, int insn_idx, enum r s->ptr = ptr; state->active_locks++; + state->active_lock_id = id; + state->active_lock_ptr = ptr; return 0; } @@ -1577,16 +1581,24 @@ static bool find_reference_state(struct bpf_verifier_state *state, int ptr_id) static int release_lock_state(struct bpf_verifier_state *state, int type, int id, void *ptr) { + void *prev_ptr = NULL; + u32 prev_id = 0; int i; for (i = 0; i < state->acquired_refs; i++) { - if (state->refs[i].type != type) - continue; - if (state->refs[i].id == id && state->refs[i].ptr == ptr) { + if (state->refs[i].type == type && state->refs[i].id == id && + state->refs[i].ptr == ptr) { release_reference_state(state, i); state->active_locks--; + /* Reassign active lock (id, ptr). */ + state->active_lock_id = prev_id; + state->active_lock_ptr = prev_ptr; return 0; } + if (state->refs[i].type & REF_TYPE_LOCK_MASK) { + prev_id = state->refs[i].id; + prev_ptr = state->refs[i].ptr; + } } return -EINVAL; } @@ -8342,6 +8354,14 @@ static int process_spin_lock(struct bpf_verifier_env *env, int regno, int flags) type = REF_TYPE_RES_LOCK; else type = REF_TYPE_LOCK; + if (!find_lock_state(cur, type, reg->id, ptr)) { + verbose(env, "%s_unlock of different lock\n", lock_str); + return -EINVAL; + } + if (reg->id != cur->active_lock_id || ptr != cur->active_lock_ptr) { + verbose(env, "%s_unlock cannot be out of order\n", lock_str); + return -EINVAL; + } if (release_lock_state(cur, type, reg->id, ptr)) { verbose(env, "%s_unlock of different lock\n", lock_str); return -EINVAL; @@ -12534,8 +12554,7 @@ static int check_reg_allocation_locked(struct bpf_verifier_env *env, struct bpf_ if (!env->cur_state->active_locks) return -EINVAL; - s = find_lock_state(env->cur_state, REF_TYPE_LOCK | REF_TYPE_RES_LOCK | REF_TYPE_RES_LOCK_IRQ, - id, ptr); + s = find_lock_state(env->cur_state, REF_TYPE_LOCK_MASK, id, ptr); if (!s) { verbose(env, "held lock and object are not in the same allocation\n"); return -EINVAL; @@ -18591,6 +18610,10 @@ static bool refsafe(struct bpf_verifier_state *old, struct bpf_verifier_state *c if (!check_ids(old->active_irq_id, cur->active_irq_id, idmap)) return false; + if (!check_ids(old->active_lock_id, cur->active_lock_id, idmap) || + old->active_lock_ptr != cur->active_lock_ptr) + return false; + for (i = 0; i < old->acquired_refs; i++) { if (!check_ids(old->refs[i].id, cur->refs[i].id, idmap) || old->refs[i].type != cur->refs[i].type)
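To make the new active_lock_id/active_lock_ptr tracking concrete: with the change above, resilient locks must be released in the reverse order they were acquired, for the plain variants just as for the irqsave ones. A small sketch follows; the global lock declarations mirror the pattern used in the selftests of the next patch, and the extern kfunc declarations are assumptions:

#include <vmlinux.h>
#include <bpf/bpf_helpers.h>

#define __hidden __attribute__((visibility("hidden")))

extern int bpf_res_spin_lock(struct bpf_res_spin_lock *lock) __ksym;
extern void bpf_res_spin_unlock(struct bpf_res_spin_lock *lock) __ksym;

/* Two global resilient locks, each in its own data section. */
struct bpf_res_spin_lock glockA __hidden SEC(".data.A");
struct bpf_res_spin_lock glockB __hidden SEC(".data.B");

SEC("tc")
int nested_unlock_order(struct __sk_buff *ctx)
{
	if (bpf_res_spin_lock(&glockA))
		return 0;
	if (bpf_res_spin_lock(&glockB)) {
		bpf_res_spin_unlock(&glockA);
		return 0;
	}
	/* glockB is the top of the verifier's lock stack, so it must be
	 * released first; swapping the two unlocks below is now rejected
	 * with "bpf_res_spin_unlock cannot be out of order".
	 */
	bpf_res_spin_unlock(&glockB);
	bpf_res_spin_unlock(&glockA);
	return 0;
}

char _license[] SEC("license") = "GPL";

Keeping the verifier's notion of the topmost lock in sync this way matches rqspinlock's own held-locks bookkeeping, which likewise assumes releases happen in reverse acquisition order.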
From patchwork Sun Mar 16 04:05:41 2025
From: Kumar Kartikeya Dwivedi
To: bpf@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: Linus Torvalds, Peter Zijlstra, Will Deacon, Waiman Long, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann, Martin KaFai Lau, Eduard Zingerman, "Paul E. McKenney", Tejun Heo, Barret Rhoden, Josh Don, Dohyun Kim, linux-arm-kernel@lists.infradead.org, kkd@meta.com, kernel-team@meta.com
Subject: [PATCH bpf-next v4 25/25] selftests/bpf: Add tests for rqspinlock
Date: Sat, 15 Mar 2025 21:05:41 -0700
Message-ID: <20250316040541.108729-26-memxor@gmail.com>
In-Reply-To: <20250316040541.108729-1-memxor@gmail.com>
References: <20250316040541.108729-1-memxor@gmail.com>

Introduce selftests that trigger AA and ABBA deadlocks, and test the edge case where the held locks table runs out of entries, since we then fall back to the timeout as the final line of defense. Also exercise the verifier's AA detection where applicable.
Signed-off-by: Kumar Kartikeya Dwivedi --- .../selftests/bpf/prog_tests/res_spin_lock.c | 98 +++++++ tools/testing/selftests/bpf/progs/irq.c | 53 ++++ .../selftests/bpf/progs/res_spin_lock.c | 143 ++++++++++ .../selftests/bpf/progs/res_spin_lock_fail.c | 244 ++++++++++++++++++ 4 files changed, 538 insertions(+) create mode 100644 tools/testing/selftests/bpf/prog_tests/res_spin_lock.c create mode 100644 tools/testing/selftests/bpf/progs/res_spin_lock.c create mode 100644 tools/testing/selftests/bpf/progs/res_spin_lock_fail.c diff --git a/tools/testing/selftests/bpf/prog_tests/res_spin_lock.c b/tools/testing/selftests/bpf/prog_tests/res_spin_lock.c new file mode 100644 index 000000000000..115287ba441b --- /dev/null +++ b/tools/testing/selftests/bpf/prog_tests/res_spin_lock.c @@ -0,0 +1,98 @@ +// SPDX-License-Identifier: GPL-2.0 +/* Copyright (c) 2024-2025 Meta Platforms, Inc. and affiliates. */ +#include +#include +#include + +#include "res_spin_lock.skel.h" +#include "res_spin_lock_fail.skel.h" + +void test_res_spin_lock_failure(void) +{ + RUN_TESTS(res_spin_lock_fail); +} + +static volatile int skip; + +static void *spin_lock_thread(void *arg) +{ + int err, prog_fd = *(u32 *) arg; + LIBBPF_OPTS(bpf_test_run_opts, topts, + .data_in = &pkt_v4, + .data_size_in = sizeof(pkt_v4), + .repeat = 10000, + ); + + while (!READ_ONCE(skip)) { + err = bpf_prog_test_run_opts(prog_fd, &topts); + ASSERT_OK(err, "test_run"); + ASSERT_OK(topts.retval, "test_run retval"); + } + pthread_exit(arg); +} + +void test_res_spin_lock_success(void) +{ + LIBBPF_OPTS(bpf_test_run_opts, topts, + .data_in = &pkt_v4, + .data_size_in = sizeof(pkt_v4), + .repeat = 1, + ); + struct res_spin_lock *skel; + pthread_t thread_id[16]; + int prog_fd, i, err; + void *ret; + + if (get_nprocs() < 2) { + test__skip(); + return; + } + + skel = res_spin_lock__open_and_load(); + if (!ASSERT_OK_PTR(skel, "res_spin_lock__open_and_load")) + return; + /* AA deadlock */ + prog_fd = bpf_program__fd(skel->progs.res_spin_lock_test); + err = bpf_prog_test_run_opts(prog_fd, &topts); + ASSERT_OK(err, "error"); + ASSERT_OK(topts.retval, "retval"); + + prog_fd = bpf_program__fd(skel->progs.res_spin_lock_test_held_lock_max); + err = bpf_prog_test_run_opts(prog_fd, &topts); + ASSERT_OK(err, "error"); + ASSERT_OK(topts.retval, "retval"); + + /* Multi-threaded ABBA deadlock. 
*/ + + prog_fd = bpf_program__fd(skel->progs.res_spin_lock_test_AB); + for (i = 0; i < 16; i++) { + int err; + + err = pthread_create(&thread_id[i], NULL, &spin_lock_thread, &prog_fd); + if (!ASSERT_OK(err, "pthread_create")) + goto end; + } + + topts.retval = 0; + topts.repeat = 1000; + int fd = bpf_program__fd(skel->progs.res_spin_lock_test_BA); + while (!topts.retval && !err && !READ_ONCE(skel->bss->err)) { + err = bpf_prog_test_run_opts(fd, &topts); + } + + WRITE_ONCE(skip, true); + + for (i = 0; i < 16; i++) { + if (!ASSERT_OK(pthread_join(thread_id[i], &ret), "pthread_join")) + goto end; + if (!ASSERT_EQ(ret, &prog_fd, "ret == prog_fd")) + goto end; + } + + ASSERT_EQ(READ_ONCE(skel->bss->err), -EDEADLK, "timeout err"); + ASSERT_OK(err, "err"); + ASSERT_EQ(topts.retval, -EDEADLK, "timeout"); +end: + res_spin_lock__destroy(skel); + return; +} diff --git a/tools/testing/selftests/bpf/progs/irq.c b/tools/testing/selftests/bpf/progs/irq.c index 298d48d7886d..74d912b22de9 100644 --- a/tools/testing/selftests/bpf/progs/irq.c +++ b/tools/testing/selftests/bpf/progs/irq.c @@ -11,6 +11,9 @@ extern void bpf_local_irq_save(unsigned long *) __weak __ksym; extern void bpf_local_irq_restore(unsigned long *) __weak __ksym; extern int bpf_copy_from_user_str(void *dst, u32 dst__sz, const void *unsafe_ptr__ign, u64 flags) __weak __ksym; +struct bpf_res_spin_lock lockA __hidden SEC(".data.A"); +struct bpf_res_spin_lock lockB __hidden SEC(".data.B"); + SEC("?tc") __failure __msg("arg#0 doesn't point to an irq flag on stack") int irq_save_bad_arg(struct __sk_buff *ctx) @@ -510,4 +513,54 @@ int irq_sleepable_global_subprog_indirect(void *ctx) return 0; } +SEC("?tc") +__failure __msg("cannot restore irq state out of order") +int irq_ooo_lock_cond_inv(struct __sk_buff *ctx) +{ + unsigned long flags1, flags2; + + if (bpf_res_spin_lock_irqsave(&lockA, &flags1)) + return 0; + if (bpf_res_spin_lock_irqsave(&lockB, &flags2)) { + bpf_res_spin_unlock_irqrestore(&lockA, &flags1); + return 0; + } + + bpf_res_spin_unlock_irqrestore(&lockB, &flags1); + bpf_res_spin_unlock_irqrestore(&lockA, &flags2); + return 0; +} + +SEC("?tc") +__failure __msg("function calls are not allowed") +int irq_wrong_kfunc_class_1(struct __sk_buff *ctx) +{ + unsigned long flags1; + + if (bpf_res_spin_lock_irqsave(&lockA, &flags1)) + return 0; + /* For now, bpf_local_irq_restore is not allowed in critical section, + * but this test ensures error will be caught with kfunc_class when it's + * opened up. Tested by temporarily permitting this kfunc in critical + * section. + */ + bpf_local_irq_restore(&flags1); + bpf_res_spin_unlock_irqrestore(&lockA, &flags1); + return 0; +} + +SEC("?tc") +__failure __msg("function calls are not allowed") +int irq_wrong_kfunc_class_2(struct __sk_buff *ctx) +{ + unsigned long flags1, flags2; + + bpf_local_irq_save(&flags1); + if (bpf_res_spin_lock_irqsave(&lockA, &flags2)) + return 0; + bpf_local_irq_restore(&flags2); + bpf_res_spin_unlock_irqrestore(&lockA, &flags1); + return 0; +} + char _license[] SEC("license") = "GPL"; diff --git a/tools/testing/selftests/bpf/progs/res_spin_lock.c b/tools/testing/selftests/bpf/progs/res_spin_lock.c new file mode 100644 index 000000000000..b33385dfbd35 --- /dev/null +++ b/tools/testing/selftests/bpf/progs/res_spin_lock.c @@ -0,0 +1,143 @@ +// SPDX-License-Identifier: GPL-2.0 +/* Copyright (c) 2024-2025 Meta Platforms, Inc. and affiliates. 
*/ +#include +#include +#include +#include "bpf_misc.h" + +#define EDEADLK 35 +#define ETIMEDOUT 110 + +struct arr_elem { + struct bpf_res_spin_lock lock; +}; + +struct { + __uint(type, BPF_MAP_TYPE_ARRAY); + __uint(max_entries, 64); + __type(key, int); + __type(value, struct arr_elem); +} arrmap SEC(".maps"); + +struct bpf_res_spin_lock lockA __hidden SEC(".data.A"); +struct bpf_res_spin_lock lockB __hidden SEC(".data.B"); + +SEC("tc") +int res_spin_lock_test(struct __sk_buff *ctx) +{ + struct arr_elem *elem1, *elem2; + int r; + + elem1 = bpf_map_lookup_elem(&arrmap, &(int){0}); + if (!elem1) + return -1; + elem2 = bpf_map_lookup_elem(&arrmap, &(int){0}); + if (!elem2) + return -1; + + r = bpf_res_spin_lock(&elem1->lock); + if (r) + return r; + if (!bpf_res_spin_lock(&elem2->lock)) { + bpf_res_spin_unlock(&elem2->lock); + bpf_res_spin_unlock(&elem1->lock); + return -1; + } + bpf_res_spin_unlock(&elem1->lock); + return 0; +} + +SEC("tc") +int res_spin_lock_test_AB(struct __sk_buff *ctx) +{ + int r; + + r = bpf_res_spin_lock(&lockA); + if (r) + return !r; + /* Only unlock if we took the lock. */ + if (!bpf_res_spin_lock(&lockB)) + bpf_res_spin_unlock(&lockB); + bpf_res_spin_unlock(&lockA); + return 0; +} + +int err; + +SEC("tc") +int res_spin_lock_test_BA(struct __sk_buff *ctx) +{ + int r; + + r = bpf_res_spin_lock(&lockB); + if (r) + return !r; + if (!bpf_res_spin_lock(&lockA)) + bpf_res_spin_unlock(&lockA); + else + err = -EDEADLK; + bpf_res_spin_unlock(&lockB); + return err ?: 0; +} + +SEC("tc") +int res_spin_lock_test_held_lock_max(struct __sk_buff *ctx) +{ + struct bpf_res_spin_lock *locks[48] = {}; + struct arr_elem *e; + u64 time_beg, time; + int ret = 0, i; + + _Static_assert(ARRAY_SIZE(((struct rqspinlock_held){}).locks) == 31, + "RES_NR_HELD assumed to be 31"); + + for (i = 0; i < 34; i++) { + int key = i; + + /* We cannot pass in i as it will get spilled/filled by the compiler and + * loses bounds in verifier state. + */ + e = bpf_map_lookup_elem(&arrmap, &key); + if (!e) + return 1; + locks[i] = &e->lock; + } + + for (; i < 48; i++) { + int key = i - 2; + + /* We cannot pass in i as it will get spilled/filled by the compiler and + * loses bounds in verifier state. + */ + e = bpf_map_lookup_elem(&arrmap, &key); + if (!e) + return 1; + locks[i] = &e->lock; + } + + time_beg = bpf_ktime_get_ns(); + for (i = 0; i < 34; i++) { + if (bpf_res_spin_lock(locks[i])) + goto end; + } + + /* Trigger AA, after exhausting entries in the held lock table. This + * time, only the timeout can save us, as AA detection won't succeed. + */ + if (!bpf_res_spin_lock(locks[34])) { + bpf_res_spin_unlock(locks[34]); + ret = 1; + goto end; + } + +end: + for (i = i - 1; i >= 0; i--) + bpf_res_spin_unlock(locks[i]); + time = bpf_ktime_get_ns() - time_beg; + /* Time spent should be easily above our limit (1/4 s), since AA + * detection won't be expedited due to lack of held lock entry. + */ + return ret ?: (time > 1000000000 / 4 ? 0 : 1); +} + +char _license[] SEC("license") = "GPL"; diff --git a/tools/testing/selftests/bpf/progs/res_spin_lock_fail.c b/tools/testing/selftests/bpf/progs/res_spin_lock_fail.c new file mode 100644 index 000000000000..330682a88c16 --- /dev/null +++ b/tools/testing/selftests/bpf/progs/res_spin_lock_fail.c @@ -0,0 +1,244 @@ +// SPDX-License-Identifier: GPL-2.0 +/* Copyright (c) 2024-2025 Meta Platforms, Inc. and affiliates. 
*/ +#include +#include +#include +#include +#include "bpf_misc.h" +#include "bpf_experimental.h" + +struct arr_elem { + struct bpf_res_spin_lock lock; +}; + +struct { + __uint(type, BPF_MAP_TYPE_ARRAY); + __uint(max_entries, 1); + __type(key, int); + __type(value, struct arr_elem); +} arrmap SEC(".maps"); + +long value; + +struct bpf_spin_lock lock __hidden SEC(".data.A"); +struct bpf_res_spin_lock res_lock __hidden SEC(".data.B"); + +SEC("?tc") +__failure __msg("point to map value or allocated object") +int res_spin_lock_arg(struct __sk_buff *ctx) +{ + struct arr_elem *elem; + + elem = bpf_map_lookup_elem(&arrmap, &(int){0}); + if (!elem) + return 0; + bpf_res_spin_lock((struct bpf_res_spin_lock *)bpf_core_cast(&elem->lock, struct __sk_buff)); + bpf_res_spin_lock(&elem->lock); + return 0; +} + +SEC("?tc") +__failure __msg("AA deadlock detected") +int res_spin_lock_AA(struct __sk_buff *ctx) +{ + struct arr_elem *elem; + + elem = bpf_map_lookup_elem(&arrmap, &(int){0}); + if (!elem) + return 0; + bpf_res_spin_lock(&elem->lock); + bpf_res_spin_lock(&elem->lock); + return 0; +} + +SEC("?tc") +__failure __msg("AA deadlock detected") +int res_spin_lock_cond_AA(struct __sk_buff *ctx) +{ + struct arr_elem *elem; + + elem = bpf_map_lookup_elem(&arrmap, &(int){0}); + if (!elem) + return 0; + if (bpf_res_spin_lock(&elem->lock)) + return 0; + bpf_res_spin_lock(&elem->lock); + return 0; +} + +SEC("?tc") +__failure __msg("unlock of different lock") +int res_spin_lock_mismatch_1(struct __sk_buff *ctx) +{ + struct arr_elem *elem; + + elem = bpf_map_lookup_elem(&arrmap, &(int){0}); + if (!elem) + return 0; + if (bpf_res_spin_lock(&elem->lock)) + return 0; + bpf_res_spin_unlock(&res_lock); + return 0; +} + +SEC("?tc") +__failure __msg("unlock of different lock") +int res_spin_lock_mismatch_2(struct __sk_buff *ctx) +{ + struct arr_elem *elem; + + elem = bpf_map_lookup_elem(&arrmap, &(int){0}); + if (!elem) + return 0; + if (bpf_res_spin_lock(&res_lock)) + return 0; + bpf_res_spin_unlock(&elem->lock); + return 0; +} + +SEC("?tc") +__failure __msg("unlock of different lock") +int res_spin_lock_irq_mismatch_1(struct __sk_buff *ctx) +{ + struct arr_elem *elem; + unsigned long f1; + + elem = bpf_map_lookup_elem(&arrmap, &(int){0}); + if (!elem) + return 0; + bpf_local_irq_save(&f1); + if (bpf_res_spin_lock(&res_lock)) + return 0; + bpf_res_spin_unlock_irqrestore(&res_lock, &f1); + return 0; +} + +SEC("?tc") +__failure __msg("unlock of different lock") +int res_spin_lock_irq_mismatch_2(struct __sk_buff *ctx) +{ + struct arr_elem *elem; + unsigned long f1; + + elem = bpf_map_lookup_elem(&arrmap, &(int){0}); + if (!elem) + return 0; + if (bpf_res_spin_lock_irqsave(&res_lock, &f1)) + return 0; + bpf_res_spin_unlock(&res_lock); + return 0; +} + +SEC("?tc") +__success +int res_spin_lock_ooo(struct __sk_buff *ctx) +{ + struct arr_elem *elem; + + elem = bpf_map_lookup_elem(&arrmap, &(int){0}); + if (!elem) + return 0; + if (bpf_res_spin_lock(&res_lock)) + return 0; + if (bpf_res_spin_lock(&elem->lock)) { + bpf_res_spin_unlock(&res_lock); + return 0; + } + bpf_res_spin_unlock(&elem->lock); + bpf_res_spin_unlock(&res_lock); + return 0; +} + +SEC("?tc") +__success +int res_spin_lock_ooo_irq(struct __sk_buff *ctx) +{ + struct arr_elem *elem; + unsigned long f1, f2; + + elem = bpf_map_lookup_elem(&arrmap, &(int){0}); + if (!elem) + return 0; + if (bpf_res_spin_lock_irqsave(&res_lock, &f1)) + return 0; + if (bpf_res_spin_lock_irqsave(&elem->lock, &f2)) { + bpf_res_spin_unlock_irqrestore(&res_lock, &f1); + /* We won't have a 
unreleased IRQ flag error here. */ + return 0; + } + bpf_res_spin_unlock_irqrestore(&elem->lock, &f2); + bpf_res_spin_unlock_irqrestore(&res_lock, &f1); + return 0; +} + +struct bpf_res_spin_lock lock1 __hidden SEC(".data.OO1"); +struct bpf_res_spin_lock lock2 __hidden SEC(".data.OO2"); + +SEC("?tc") +__failure __msg("bpf_res_spin_unlock cannot be out of order") +int res_spin_lock_ooo_unlock(struct __sk_buff *ctx) +{ + if (bpf_res_spin_lock(&lock1)) + return 0; + if (bpf_res_spin_lock(&lock2)) { + bpf_res_spin_unlock(&lock1); + return 0; + } + bpf_res_spin_unlock(&lock1); + bpf_res_spin_unlock(&lock2); + return 0; +} + +SEC("?tc") +__failure __msg("off 1 doesn't point to 'struct bpf_res_spin_lock' that is at 0") +int res_spin_lock_bad_off(struct __sk_buff *ctx) +{ + struct arr_elem *elem; + + elem = bpf_map_lookup_elem(&arrmap, &(int){0}); + if (!elem) + return 0; + bpf_res_spin_lock((void *)&elem->lock + 1); + return 0; +} + +SEC("?tc") +__failure __msg("R1 doesn't have constant offset. bpf_res_spin_lock has to be at the constant offset") +int res_spin_lock_var_off(struct __sk_buff *ctx) +{ + struct arr_elem *elem; + u64 val = value; + + elem = bpf_map_lookup_elem(&arrmap, &(int){0}); + if (!elem) { + // FIXME: Only inline assembly use in assert macro doesn't emit + // BTF definition. + bpf_throw(0); + return 0; + } + bpf_assert_range(val, 0, 40); + bpf_res_spin_lock((void *)&value + val); + return 0; +} + +SEC("?tc") +__failure __msg("map 'res_spin.bss' has no valid bpf_res_spin_lock") +int res_spin_lock_no_lock_map(struct __sk_buff *ctx) +{ + bpf_res_spin_lock((void *)&value + 1); + return 0; +} + +SEC("?tc") +__failure __msg("local 'kptr' has no valid bpf_res_spin_lock") +int res_spin_lock_no_lock_kptr(struct __sk_buff *ctx) +{ + struct { int i; } *p = bpf_obj_new(typeof(*p)); + + if (!p) + return 0; + bpf_res_spin_lock((void *)p); + return 0; +} + +char _license[] SEC("license") = "GPL";
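A closing note on res_spin_lock_test_held_lock_max above: the _Static_assert pins the size of the held-locks array at 31 (RES_NR_HELD), i.e. the per-CPU table that rqspinlock's AA/ABBA detection walks has 31 slots. A simplified model of that bookkeeping is sketched below; the locks[] array and its size follow the assert, while the cnt field and the push helper are illustrative assumptions rather than the kernel's exact code:

#include <stdbool.h>

#define RES_NR_HELD 31 /* matches the _Static_assert in res_spin_lock.c */

/* Simplified model of the per-CPU held-locks table. */
struct rqspinlock_held {
	int cnt;
	void *locks[RES_NR_HELD];
};

/* Record an acquisition if a slot is free. Once the table is full, further
 * acquisitions go untracked, so deadlock detection cannot see them and only
 * the acquisition timeout can break a cycle; that is exactly the situation
 * res_spin_lock_test_held_lock_max drives on purpose.
 */
static bool held_lock_push(struct rqspinlock_held *t, void *lock)
{
	if (t->cnt >= RES_NR_HELD)
		return false;
	t->locks[t->cnt++] = lock;
	return true;
}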