From patchwork Thu Feb 6 10:54:09 2025
X-Patchwork-Submitter: Kumar Kartikeya Dwivedi
X-Patchwork-Id: 13962806
X-Patchwork-Delegate: bpf@iogearbox.net
From: Kumar Kartikeya Dwivedi
To: bpf@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: Barret Rhoden, Linus Torvalds, Peter Zijlstra, Will Deacon, Waiman Long,
    Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann, Martin KaFai Lau,
    Eduard Zingerman, "Paul E. McKenney", Tejun Heo, Josh Don, Dohyun Kim,
    linux-arm-kernel@lists.infradead.org, kernel-team@meta.com
Subject: [PATCH bpf-next v2 01/26] locking: Move MCS struct definition to public header
Date: Thu, 6 Feb 2025 02:54:09 -0800
Message-ID: <20250206105435.2159977-2-memxor@gmail.com>
In-Reply-To: <20250206105435.2159977-1-memxor@gmail.com>
References: <20250206105435.2159977-1-memxor@gmail.com>

Move the definition of struct mcs_spinlock from the private mcs_spinlock.h
header in kernel/locking to the asm-generic mcs_spinlock.h header, since we
will need to reference it from the qspinlock.h header in subsequent commits.
Reviewed-by: Barret Rhoden
Signed-off-by: Kumar Kartikeya Dwivedi
---
 include/asm-generic/mcs_spinlock.h | 6 ++++++
 kernel/locking/mcs_spinlock.h      | 6 ------
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/include/asm-generic/mcs_spinlock.h b/include/asm-generic/mcs_spinlock.h
index 10cd4ffc6ba2..39c94012b88a 100644
--- a/include/asm-generic/mcs_spinlock.h
+++ b/include/asm-generic/mcs_spinlock.h
@@ -1,6 +1,12 @@
 #ifndef __ASM_MCS_SPINLOCK_H
 #define __ASM_MCS_SPINLOCK_H
 
+struct mcs_spinlock {
+	struct mcs_spinlock *next;
+	int locked; /* 1 if lock acquired */
+	int count;  /* nesting count, see qspinlock.c */
+};
+
 /*
  * Architectures can define their own:
  *
diff --git a/kernel/locking/mcs_spinlock.h b/kernel/locking/mcs_spinlock.h
index 85251d8771d9..16160ca8907f 100644
--- a/kernel/locking/mcs_spinlock.h
+++ b/kernel/locking/mcs_spinlock.h
@@ -15,12 +15,6 @@
 
 #include 
 
-struct mcs_spinlock {
-	struct mcs_spinlock *next;
-	int locked; /* 1 if lock acquired */
-	int count;  /* nesting count, see qspinlock.c */
-};
-
 #ifndef arch_mcs_spin_lock_contended
 /*
  * Using smp_cond_load_acquire() provides the acquire semantics
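A note on the structure being moved: the two fields next and locked are all a
textbook MCS lock needs. The sketch below is illustration only and not part of
the series; mcs_lock()/mcs_unlock() and the bare tail pointer are assumptions,
whereas the real qspinlock code compresses the tail into its 32-bit lock word.

static void mcs_lock(struct mcs_spinlock **lock, struct mcs_spinlock *node)
{
	struct mcs_spinlock *prev;

	node->next = NULL;
	node->locked = 0;

	/* Atomically make ourselves the new tail of the queue. */
	prev = xchg(lock, node);
	if (!prev)
		return;		/* queue was empty: lock is ours */

	/* Link behind the old tail and wait for it to hand the lock over. */
	WRITE_ONCE(prev->next, node);
	smp_cond_load_acquire(&node->locked, VAL);
}

static void mcs_unlock(struct mcs_spinlock **lock, struct mcs_spinlock *node)
{
	struct mcs_spinlock *next = READ_ONCE(node->next);

	if (!next) {
		/* No successor visible: try to reset the tail to empty. */
		if (cmpxchg_release(lock, node, NULL) == node)
			return;
		/* A successor is concurrently linking in; wait for it. */
		next = smp_cond_load_relaxed(&node->next, (VAL));
	}
	/* Hand the lock to the successor. */
	smp_store_release(&next->locked, 1);
}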
From patchwork Thu Feb 6 10:54:10 2025
X-Patchwork-Submitter: Kumar Kartikeya Dwivedi
X-Patchwork-Id: 13962808
From: Kumar Kartikeya Dwivedi
Subject: [PATCH bpf-next v2 02/26] locking: Move common qspinlock helpers to a private header
Date: Thu, 6 Feb 2025 02:54:10 -0800
Message-ID: <20250206105435.2159977-3-memxor@gmail.com>
In-Reply-To: <20250206105435.2159977-1-memxor@gmail.com>
McKenney" , Tejun Heo , Josh Don , Dohyun Kim , linux-arm-kernel@lists.infradead.org, kernel-team@meta.com Subject: [PATCH bpf-next v2 02/26] locking: Move common qspinlock helpers to a private header Date: Thu, 6 Feb 2025 02:54:10 -0800 Message-ID: <20250206105435.2159977-3-memxor@gmail.com> X-Mailer: git-send-email 2.43.5 In-Reply-To: <20250206105435.2159977-1-memxor@gmail.com> References: <20250206105435.2159977-1-memxor@gmail.com> Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Developer-Signature: v=1; a=openpgp-sha256; l=13562; h=from:subject; bh=eKJ3qxGBtRJg8l1rvSHjaQtIHqaeLQcTbgGFSWnocak=; b=owEBbQKS/ZANAwAIAUzgyIZIvxHKAcsmYgBnpJRkkVfZqT1tCitfFNFTby5Hz/Q0Ls5KtoFEDTCL cZHH7UOJAjMEAAEIAB0WIQRLvip+Buz51YI8YRFM4MiGSL8RygUCZ6SUZAAKCRBM4MiGSL8RylRQD/ 9ZckUJptWve6Ivsaj0tCRlXmeXvXakYFfReqoU4TTSiK5e60c9zMXr1PSHiloHlftlQqfuKi7ug6yx A0bUp3hubkSGlTfDSdMPwLh1AN2D0QpPIKuW9AIo5mw5fag+zRMjgwdjsh3o2biZZ4JsC36jA+tvRV OhywUPmmoZL/U7GKGzWmptqXq4iD7oAmPhyHSIyZ80efwrSDnwF8UIzoX8wR8vqgwUPSp3g3TAgdqS Wu+LKgl9hNtbykaP4Jj433O5chD603DDAL+C0COzSBRTNaFTRqqXx3o/3rvGoEDspvDUhm+uXtQJyl AHG62uLsNo3EyObbRiK6pLo/hjsdzLmLnWdfJb2NV9sJPp2VA5FxIiDCkL+P/08d6XoTHUePanJNRI p2OhTqrJFD4fN/JDiNEwhFAcdlvwcrSZV/qDLoUrrD0UYVuywjxrranVRZz6bQyIV9JZv/P8i0TkzE w1DQVyyoWScGq6wi5NLOun+C1IdPnB/k7AISjvxU+vTO7bhPWcKARMgbyFTCw+uBqNq614OwvBHiR1 SktvlKEmTFENUpURQ5Kkx8YOM8Bu1bLOunw6V1hyLl9WU5VH5WgZ2AhT3MLH8H1SixfIzevgMLkfIN BLB2b1rHlEnEgsNDOx3qTtgKlgKaHmaveMw+6M64H8QDFPh3iV10yEj7kN0A== X-Developer-Key: i=memxor@gmail.com; a=openpgp; fpr=4BBE2A7E06ECF9D5823C61114CE0C88648BF11CA X-Patchwork-Delegate: bpf@iogearbox.net Move qspinlock helper functions that encode, decode tail word, set and clear the pending and locked bits, and other miscellaneous definitions and macros to a private header. To this end, create a qspinlock.h header file in kernel/locking. Subsequent commits will introduce a modified qspinlock slow path function, thus moving shared code to a private header will help minimize unnecessary code duplication. Reviewed-by: Barret Rhoden Signed-off-by: Kumar Kartikeya Dwivedi --- kernel/locking/qspinlock.c | 193 +---------------------------------- kernel/locking/qspinlock.h | 200 +++++++++++++++++++++++++++++++++++++ 2 files changed, 205 insertions(+), 188 deletions(-) create mode 100644 kernel/locking/qspinlock.h diff --git a/kernel/locking/qspinlock.c b/kernel/locking/qspinlock.c index 7d96bed718e4..af8d122bb649 100644 --- a/kernel/locking/qspinlock.c +++ b/kernel/locking/qspinlock.c @@ -25,8 +25,9 @@ #include /* - * Include queued spinlock statistics code + * Include queued spinlock definitions and statistics code */ +#include "qspinlock.h" #include "qspinlock_stat.h" /* @@ -67,36 +68,6 @@ */ #include "mcs_spinlock.h" -#define MAX_NODES 4 - -/* - * On 64-bit architectures, the mcs_spinlock structure will be 16 bytes in - * size and four of them will fit nicely in one 64-byte cacheline. For - * pvqspinlock, however, we need more space for extra data. To accommodate - * that, we insert two more long words to pad it up to 32 bytes. IOW, only - * two of them can fit in a cacheline in this case. That is OK as it is rare - * to have more than 2 levels of slowpath nesting in actual use. We don't - * want to penalize pvqspinlocks to optimize for a rare case in native - * qspinlocks. - */ -struct qnode { - struct mcs_spinlock mcs; -#ifdef CONFIG_PARAVIRT_SPINLOCKS - long reserved[2]; -#endif -}; - -/* - * The pending bit spinning loop count. 
- * This heuristic is used to limit the number of lockword accesses - * made by atomic_cond_read_relaxed when waiting for the lock to - * transition out of the "== _Q_PENDING_VAL" state. We don't spin - * indefinitely because there's no guarantee that we'll make forward - * progress. - */ -#ifndef _Q_PENDING_LOOPS -#define _Q_PENDING_LOOPS 1 -#endif /* * Per-CPU queue node structures; we can never have more than 4 nested @@ -106,161 +77,7 @@ struct qnode { * * PV doubles the storage and uses the second cacheline for PV state. */ -static DEFINE_PER_CPU_ALIGNED(struct qnode, qnodes[MAX_NODES]); - -/* - * We must be able to distinguish between no-tail and the tail at 0:0, - * therefore increment the cpu number by one. - */ - -static inline __pure u32 encode_tail(int cpu, int idx) -{ - u32 tail; - - tail = (cpu + 1) << _Q_TAIL_CPU_OFFSET; - tail |= idx << _Q_TAIL_IDX_OFFSET; /* assume < 4 */ - - return tail; -} - -static inline __pure struct mcs_spinlock *decode_tail(u32 tail) -{ - int cpu = (tail >> _Q_TAIL_CPU_OFFSET) - 1; - int idx = (tail & _Q_TAIL_IDX_MASK) >> _Q_TAIL_IDX_OFFSET; - - return per_cpu_ptr(&qnodes[idx].mcs, cpu); -} - -static inline __pure -struct mcs_spinlock *grab_mcs_node(struct mcs_spinlock *base, int idx) -{ - return &((struct qnode *)base + idx)->mcs; -} - -#define _Q_LOCKED_PENDING_MASK (_Q_LOCKED_MASK | _Q_PENDING_MASK) - -#if _Q_PENDING_BITS == 8 -/** - * clear_pending - clear the pending bit. - * @lock: Pointer to queued spinlock structure - * - * *,1,* -> *,0,* - */ -static __always_inline void clear_pending(struct qspinlock *lock) -{ - WRITE_ONCE(lock->pending, 0); -} - -/** - * clear_pending_set_locked - take ownership and clear the pending bit. - * @lock: Pointer to queued spinlock structure - * - * *,1,0 -> *,0,1 - * - * Lock stealing is not allowed if this function is used. - */ -static __always_inline void clear_pending_set_locked(struct qspinlock *lock) -{ - WRITE_ONCE(lock->locked_pending, _Q_LOCKED_VAL); -} - -/* - * xchg_tail - Put in the new queue tail code word & retrieve previous one - * @lock : Pointer to queued spinlock structure - * @tail : The new queue tail code word - * Return: The previous queue tail code word - * - * xchg(lock, tail), which heads an address dependency - * - * p,*,* -> n,*,* ; prev = xchg(lock, node) - */ -static __always_inline u32 xchg_tail(struct qspinlock *lock, u32 tail) -{ - /* - * We can use relaxed semantics since the caller ensures that the - * MCS node is properly initialized before updating the tail. - */ - return (u32)xchg_relaxed(&lock->tail, - tail >> _Q_TAIL_OFFSET) << _Q_TAIL_OFFSET; -} - -#else /* _Q_PENDING_BITS == 8 */ - -/** - * clear_pending - clear the pending bit. - * @lock: Pointer to queued spinlock structure - * - * *,1,* -> *,0,* - */ -static __always_inline void clear_pending(struct qspinlock *lock) -{ - atomic_andnot(_Q_PENDING_VAL, &lock->val); -} - -/** - * clear_pending_set_locked - take ownership and clear the pending bit. 
- * @lock: Pointer to queued spinlock structure - * - * *,1,0 -> *,0,1 - */ -static __always_inline void clear_pending_set_locked(struct qspinlock *lock) -{ - atomic_add(-_Q_PENDING_VAL + _Q_LOCKED_VAL, &lock->val); -} - -/** - * xchg_tail - Put in the new queue tail code word & retrieve previous one - * @lock : Pointer to queued spinlock structure - * @tail : The new queue tail code word - * Return: The previous queue tail code word - * - * xchg(lock, tail) - * - * p,*,* -> n,*,* ; prev = xchg(lock, node) - */ -static __always_inline u32 xchg_tail(struct qspinlock *lock, u32 tail) -{ - u32 old, new; - - old = atomic_read(&lock->val); - do { - new = (old & _Q_LOCKED_PENDING_MASK) | tail; - /* - * We can use relaxed semantics since the caller ensures that - * the MCS node is properly initialized before updating the - * tail. - */ - } while (!atomic_try_cmpxchg_relaxed(&lock->val, &old, new)); - - return old; -} -#endif /* _Q_PENDING_BITS == 8 */ - -/** - * queued_fetch_set_pending_acquire - fetch the whole lock value and set pending - * @lock : Pointer to queued spinlock structure - * Return: The previous lock value - * - * *,*,* -> *,1,* - */ -#ifndef queued_fetch_set_pending_acquire -static __always_inline u32 queued_fetch_set_pending_acquire(struct qspinlock *lock) -{ - return atomic_fetch_or_acquire(_Q_PENDING_VAL, &lock->val); -} -#endif - -/** - * set_locked - Set the lock bit and own the lock - * @lock: Pointer to queued spinlock structure - * - * *,*,0 -> *,0,1 - */ -static __always_inline void set_locked(struct qspinlock *lock) -{ - WRITE_ONCE(lock->locked, _Q_LOCKED_VAL); -} - +static DEFINE_PER_CPU_ALIGNED(struct qnode, qnodes[_Q_MAX_NODES]); /* * Generate the native code for queued_spin_unlock_slowpath(); provide NOPs for @@ -410,7 +227,7 @@ void __lockfunc queued_spin_lock_slowpath(struct qspinlock *lock, u32 val) * any MCS node. This is not the most elegant solution, but is * simple enough. */ - if (unlikely(idx >= MAX_NODES)) { + if (unlikely(idx >= _Q_MAX_NODES)) { lockevent_inc(lock_no_node); while (!queued_spin_trylock(lock)) cpu_relax(); @@ -465,7 +282,7 @@ void __lockfunc queued_spin_lock_slowpath(struct qspinlock *lock, u32 val) * head of the waitqueue. */ if (old & _Q_TAIL_MASK) { - prev = decode_tail(old); + prev = decode_tail(old, qnodes); /* Link @node into the waitqueue. */ WRITE_ONCE(prev->next, node); diff --git a/kernel/locking/qspinlock.h b/kernel/locking/qspinlock.h new file mode 100644 index 000000000000..d4ceb9490365 --- /dev/null +++ b/kernel/locking/qspinlock.h @@ -0,0 +1,200 @@ +/* SPDX-License-Identifier: GPL-2.0-or-later */ +/* + * Queued spinlock defines + * + * This file contains macro definitions and functions shared between different + * qspinlock slow path implementations. + */ +#ifndef __LINUX_QSPINLOCK_H +#define __LINUX_QSPINLOCK_H + +#include +#include +#include +#include + +#define _Q_MAX_NODES 4 + +/* + * The pending bit spinning loop count. + * This heuristic is used to limit the number of lockword accesses + * made by atomic_cond_read_relaxed when waiting for the lock to + * transition out of the "== _Q_PENDING_VAL" state. We don't spin + * indefinitely because there's no guarantee that we'll make forward + * progress. + */ +#ifndef _Q_PENDING_LOOPS +#define _Q_PENDING_LOOPS 1 +#endif + +/* + * On 64-bit architectures, the mcs_spinlock structure will be 16 bytes in + * size and four of them will fit nicely in one 64-byte cacheline. For + * pvqspinlock, however, we need more space for extra data. 
To accommodate + * that, we insert two more long words to pad it up to 32 bytes. IOW, only + * two of them can fit in a cacheline in this case. That is OK as it is rare + * to have more than 2 levels of slowpath nesting in actual use. We don't + * want to penalize pvqspinlocks to optimize for a rare case in native + * qspinlocks. + */ +struct qnode { + struct mcs_spinlock mcs; +#ifdef CONFIG_PARAVIRT_SPINLOCKS + long reserved[2]; +#endif +}; + +/* + * We must be able to distinguish between no-tail and the tail at 0:0, + * therefore increment the cpu number by one. + */ + +static inline __pure u32 encode_tail(int cpu, int idx) +{ + u32 tail; + + tail = (cpu + 1) << _Q_TAIL_CPU_OFFSET; + tail |= idx << _Q_TAIL_IDX_OFFSET; /* assume < 4 */ + + return tail; +} + +static inline __pure struct mcs_spinlock *decode_tail(u32 tail, struct qnode *qnodes) +{ + int cpu = (tail >> _Q_TAIL_CPU_OFFSET) - 1; + int idx = (tail & _Q_TAIL_IDX_MASK) >> _Q_TAIL_IDX_OFFSET; + + return per_cpu_ptr(&qnodes[idx].mcs, cpu); +} + +static inline __pure +struct mcs_spinlock *grab_mcs_node(struct mcs_spinlock *base, int idx) +{ + return &((struct qnode *)base + idx)->mcs; +} + +#define _Q_LOCKED_PENDING_MASK (_Q_LOCKED_MASK | _Q_PENDING_MASK) + +#if _Q_PENDING_BITS == 8 +/** + * clear_pending - clear the pending bit. + * @lock: Pointer to queued spinlock structure + * + * *,1,* -> *,0,* + */ +static __always_inline void clear_pending(struct qspinlock *lock) +{ + WRITE_ONCE(lock->pending, 0); +} + +/** + * clear_pending_set_locked - take ownership and clear the pending bit. + * @lock: Pointer to queued spinlock structure + * + * *,1,0 -> *,0,1 + * + * Lock stealing is not allowed if this function is used. + */ +static __always_inline void clear_pending_set_locked(struct qspinlock *lock) +{ + WRITE_ONCE(lock->locked_pending, _Q_LOCKED_VAL); +} + +/* + * xchg_tail - Put in the new queue tail code word & retrieve previous one + * @lock : Pointer to queued spinlock structure + * @tail : The new queue tail code word + * Return: The previous queue tail code word + * + * xchg(lock, tail), which heads an address dependency + * + * p,*,* -> n,*,* ; prev = xchg(lock, node) + */ +static __always_inline u32 xchg_tail(struct qspinlock *lock, u32 tail) +{ + /* + * We can use relaxed semantics since the caller ensures that the + * MCS node is properly initialized before updating the tail. + */ + return (u32)xchg_relaxed(&lock->tail, + tail >> _Q_TAIL_OFFSET) << _Q_TAIL_OFFSET; +} + +#else /* _Q_PENDING_BITS == 8 */ + +/** + * clear_pending - clear the pending bit. + * @lock: Pointer to queued spinlock structure + * + * *,1,* -> *,0,* + */ +static __always_inline void clear_pending(struct qspinlock *lock) +{ + atomic_andnot(_Q_PENDING_VAL, &lock->val); +} + +/** + * clear_pending_set_locked - take ownership and clear the pending bit. 
+ * @lock: Pointer to queued spinlock structure
+ *
+ * *,1,0 -> *,0,1
+ */
+static __always_inline void clear_pending_set_locked(struct qspinlock *lock)
+{
+	atomic_add(-_Q_PENDING_VAL + _Q_LOCKED_VAL, &lock->val);
+}
+
+/**
+ * xchg_tail - Put in the new queue tail code word & retrieve previous one
+ * @lock : Pointer to queued spinlock structure
+ * @tail : The new queue tail code word
+ * Return: The previous queue tail code word
+ *
+ * xchg(lock, tail)
+ *
+ * p,*,* -> n,*,* ; prev = xchg(lock, node)
+ */
+static __always_inline u32 xchg_tail(struct qspinlock *lock, u32 tail)
+{
+	u32 old, new;
+
+	old = atomic_read(&lock->val);
+	do {
+		new = (old & _Q_LOCKED_PENDING_MASK) | tail;
+		/*
+		 * We can use relaxed semantics since the caller ensures that
+		 * the MCS node is properly initialized before updating the
+		 * tail.
+		 */
+	} while (!atomic_try_cmpxchg_relaxed(&lock->val, &old, new));
+
+	return old;
+}
+#endif /* _Q_PENDING_BITS == 8 */
+
+/**
+ * queued_fetch_set_pending_acquire - fetch the whole lock value and set pending
+ * @lock : Pointer to queued spinlock structure
+ * Return: The previous lock value
+ *
+ * *,*,* -> *,1,*
+ */
+#ifndef queued_fetch_set_pending_acquire
+static __always_inline u32 queued_fetch_set_pending_acquire(struct qspinlock *lock)
+{
+	return atomic_fetch_or_acquire(_Q_PENDING_VAL, &lock->val);
+}
+#endif
+
+/**
+ * set_locked - Set the lock bit and own the lock
+ * @lock: Pointer to queued spinlock structure
+ *
+ * *,*,0 -> *,0,1
+ */
+static __always_inline void set_locked(struct qspinlock *lock)
+{
+	WRITE_ONCE(lock->locked, _Q_LOCKED_VAL);
+}
+
+#endif /* __LINUX_QSPINLOCK_H */
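A worked example of the tail word helpers this patch moves (illustration only,
not part of the patch): the standalone program below mirrors encode_tail() and
decode_tail(), assuming the common _Q_PENDING_BITS == 8 layout in which the
nesting index sits at bit 16 and the cpu field at bit 18; those offsets are an
assumption of this sketch, taken to match qspinlock_types.h.

#include <stdio.h>

#define _Q_TAIL_IDX_OFFSET	16
#define _Q_TAIL_IDX_BITS	2
#define _Q_TAIL_IDX_MASK	(((1U << _Q_TAIL_IDX_BITS) - 1) << _Q_TAIL_IDX_OFFSET)
#define _Q_TAIL_CPU_OFFSET	(_Q_TAIL_IDX_OFFSET + _Q_TAIL_IDX_BITS)

static unsigned int encode_tail(int cpu, int idx)
{
	/* cpu is stored off by one so that tail == 0 means "no tail". */
	return ((cpu + 1) << _Q_TAIL_CPU_OFFSET) | (idx << _Q_TAIL_IDX_OFFSET);
}

int main(void)
{
	unsigned int tail = encode_tail(5, 2);
	int cpu = (tail >> _Q_TAIL_CPU_OFFSET) - 1;
	int idx = (tail & _Q_TAIL_IDX_MASK) >> _Q_TAIL_IDX_OFFSET;

	/* Prints: tail=0x1a0000 cpu=5 idx=2 */
	printf("tail=%#x cpu=%d idx=%d\n", tail, cpu, idx);
	return 0;
}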
From patchwork Thu Feb 6 10:54:11 2025
X-Patchwork-Submitter: Kumar Kartikeya Dwivedi
X-Patchwork-Id: 13962809
From: Kumar Kartikeya Dwivedi
Subject: [PATCH bpf-next v2 03/26] locking: Allow obtaining result of arch_mcs_spin_lock_contended
Date: Thu, 6 Feb 2025 02:54:11 -0800
Message-ID: <20250206105435.2159977-4-memxor@gmail.com>
In-Reply-To: <20250206105435.2159977-1-memxor@gmail.com>
McKenney" , Tejun Heo , Josh Don , Dohyun Kim , linux-arm-kernel@lists.infradead.org, kernel-team@meta.com Subject: [PATCH bpf-next v2 03/26] locking: Allow obtaining result of arch_mcs_spin_lock_contended Date: Thu, 6 Feb 2025 02:54:11 -0800 Message-ID: <20250206105435.2159977-4-memxor@gmail.com> X-Mailer: git-send-email 2.43.5 In-Reply-To: <20250206105435.2159977-1-memxor@gmail.com> References: <20250206105435.2159977-1-memxor@gmail.com> Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Developer-Signature: v=1; a=openpgp-sha256; l=1052; h=from:subject; bh=WHhXMqIdalfkSexlY5e1BRqspbIYdrDmDQT3rX3AKP8=; b=owEBbQKS/ZANAwAIAUzgyIZIvxHKAcsmYgBnpJRkvzlsIk8Mh+hFnelZKKgCgtqU9iOBLKbPXk+b Hr7ixKqJAjMEAAEIAB0WIQRLvip+Buz51YI8YRFM4MiGSL8RygUCZ6SUZAAKCRBM4MiGSL8RypWCD/ 9a6wXeJpwHCx6Ry/+60+90gq49Vx2iM0Ni2K5/JZ/w1SQ1K45M6aFIB7872eGf0XNmEL8eeYRcg6ky oIbxfK6Dz0h4c4cjezC6IO7ADwiRf1nt9gfi3HEwqdWIzBMac9NGH1ITZzTSB9FZuAkqXAfpZZq0iW l9L7JT2bKDalL+4TjYIHnpQaWg+MzdyiUwK9daJqckornGM5bI4UFqkTtWU74gzcj9I8Ww9UAJmZll /0oSi0NI558EvQj/JozkqtizlLGWLPs8WdwyXca/O84yRKYAhzPQZ+Ebe9SP1IcoorQstTzGtBQLkN zeZqTP0CpGFfPZy9B3ND9iBY7eJQQxB2h2yT74z85HkKKEjc2v0asUqBT4vT+B5nCPzdr638w7FtGe emuAf5iymDVjVyVtnlOV9UV5465nfpUZv9DqAwksMUOVSc3qvTgbKcDN4XsiSIJPYVu+XDqiFZqx3Y WlVVJyTFe+L+QnTVi8xn2N/fICy+rFILllPSBpOso5O/9KXfpcu9vnaCtViVPakzRGJqLFeOjxAQNd hQxMdvz26zEoWX77OknnGmg7FcT6xSJ9eFK4uTMlDQbiikiPsGIMVPxVutFarPkQs5maZ7giuy1hnE iy/9siJLZmE3DRYLv0bOoj+wxlb6+yk1cnr9VBCzo9XnhvUEn57+U1QY9opw== X-Developer-Key: i=memxor@gmail.com; a=openpgp; fpr=4BBE2A7E06ECF9D5823C61114CE0C88648BF11CA X-Patchwork-Delegate: bpf@iogearbox.net To support upcoming changes that require inspecting the return value once the conditional waiting loop in arch_mcs_spin_lock_contended terminates, modify the macro to preserve the result of smp_cond_load_acquire. This enables checking the return value as needed, which will help disambiguate the MCS node’s locked state in future patches. Reviewed-by: Barret Rhoden Signed-off-by: Kumar Kartikeya Dwivedi --- kernel/locking/mcs_spinlock.h | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/kernel/locking/mcs_spinlock.h b/kernel/locking/mcs_spinlock.h index 16160ca8907f..5c92ba199b90 100644 --- a/kernel/locking/mcs_spinlock.h +++ b/kernel/locking/mcs_spinlock.h @@ -24,9 +24,7 @@ * spinning, and smp_cond_load_acquire() provides that behavior. 
  */
 #define arch_mcs_spin_lock_contended(l)				\
-do {								\
-	smp_cond_load_acquire(l, VAL);				\
-} while (0)
+	smp_cond_load_acquire(l, VAL)
 #endif
 
 #ifndef arch_mcs_spin_unlock_contended
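A sketch of the kind of caller this change enables, for illustration only: the
RES_TIMEOUT_VAL constant and the hand-over protocol shown here are assumptions
of the sketch rather than something defined by this patch. The point is simply
that the macro now evaluates to the value smp_cond_load_acquire() observed, so
a waiter can distinguish a normal hand-over from a marker written by a
predecessor that gave up.

#define RES_TIMEOUT_VAL	2	/* assumed "previous waiter gave up" marker */

static bool wait_for_mcs_handover(struct mcs_spinlock *node)
{
	int locked = arch_mcs_spin_lock_contended(&node->locked);

	if (locked == RES_TIMEOUT_VAL) {
		/* Our predecessor bailed out instead of handing the lock over. */
		return false;
	}

	/* locked == 1: normal hand-over, we own our position in the queue. */
	return true;
}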
From patchwork Thu Feb 6 10:54:12 2025
X-Patchwork-Submitter: Kumar Kartikeya Dwivedi
X-Patchwork-Id: 13962811
From: Kumar Kartikeya Dwivedi
Subject: [PATCH bpf-next v2 04/26] locking: Copy out qspinlock.c to rqspinlock.c
Date: Thu, 6 Feb 2025 02:54:12 -0800
Message-ID: <20250206105435.2159977-5-memxor@gmail.com>
In-Reply-To: <20250206105435.2159977-1-memxor@gmail.com>

In preparation for introducing a new lock implementation, Resilient Queued
Spin Lock, or rqspinlock, we begin by using the existing qspinlock.c code as
the base. Simply copy the code to a new file and rename functions and
variables from 'queued' to 'resilient_queued'. This helps each subsequent
commit clearly show how and where the code is being changed. The only change
after a literal copy in this commit is renaming the functions where necessary.
Reviewed-by: Barret Rhoden Signed-off-by: Kumar Kartikeya Dwivedi --- kernel/locking/rqspinlock.c | 410 ++++++++++++++++++++++++++++++++++++ 1 file changed, 410 insertions(+) create mode 100644 kernel/locking/rqspinlock.c diff --git a/kernel/locking/rqspinlock.c b/kernel/locking/rqspinlock.c new file mode 100644 index 000000000000..caaa7c9bbc79 --- /dev/null +++ b/kernel/locking/rqspinlock.c @@ -0,0 +1,410 @@ +// SPDX-License-Identifier: GPL-2.0-or-later +/* + * Resilient Queued Spin Lock + * + * (C) Copyright 2013-2015 Hewlett-Packard Development Company, L.P. + * (C) Copyright 2013-2014,2018 Red Hat, Inc. + * (C) Copyright 2015 Intel Corp. + * (C) Copyright 2015 Hewlett-Packard Enterprise Development LP + * + * Authors: Waiman Long + * Peter Zijlstra + */ + +#ifndef _GEN_PV_LOCK_SLOWPATH + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +/* + * Include queued spinlock definitions and statistics code + */ +#include "qspinlock.h" +#include "qspinlock_stat.h" + +/* + * The basic principle of a queue-based spinlock can best be understood + * by studying a classic queue-based spinlock implementation called the + * MCS lock. A copy of the original MCS lock paper ("Algorithms for Scalable + * Synchronization on Shared-Memory Multiprocessors by Mellor-Crummey and + * Scott") is available at + * + * https://bugzilla.kernel.org/show_bug.cgi?id=206115 + * + * This queued spinlock implementation is based on the MCS lock, however to + * make it fit the 4 bytes we assume spinlock_t to be, and preserve its + * existing API, we must modify it somehow. + * + * In particular; where the traditional MCS lock consists of a tail pointer + * (8 bytes) and needs the next pointer (another 8 bytes) of its own node to + * unlock the next pending (next->locked), we compress both these: {tail, + * next->locked} into a single u32 value. + * + * Since a spinlock disables recursion of its own context and there is a limit + * to the contexts that can nest; namely: task, softirq, hardirq, nmi. As there + * are at most 4 nesting levels, it can be encoded by a 2-bit number. Now + * we can encode the tail by combining the 2-bit nesting level with the cpu + * number. With one byte for the lock value and 3 bytes for the tail, only a + * 32-bit word is now needed. Even though we only need 1 bit for the lock, + * we extend it to a full byte to achieve better performance for architectures + * that support atomic byte write. + * + * We also change the first spinner to spin on the lock bit instead of its + * node; whereby avoiding the need to carry a node from lock to unlock, and + * preserving existing lock API. This also makes the unlock code simpler and + * faster. + * + * N.B. The current implementation only supports architectures that allow + * atomic operations on smaller 8-bit and 16-bit data types. + * + */ + +#include "mcs_spinlock.h" + +/* + * Per-CPU queue node structures; we can never have more than 4 nested + * contexts: task, softirq, hardirq, nmi. + * + * Exactly fits one 64-byte cacheline on a 64-bit architecture. + * + * PV doubles the storage and uses the second cacheline for PV state. + */ +static DEFINE_PER_CPU_ALIGNED(struct qnode, qnodes[_Q_MAX_NODES]); + +/* + * Generate the native code for resilient_queued_spin_unlock_slowpath(); provide NOPs + * for all the PV callbacks. 
+ */ + +static __always_inline void __pv_init_node(struct mcs_spinlock *node) { } +static __always_inline void __pv_wait_node(struct mcs_spinlock *node, + struct mcs_spinlock *prev) { } +static __always_inline void __pv_kick_node(struct qspinlock *lock, + struct mcs_spinlock *node) { } +static __always_inline u32 __pv_wait_head_or_lock(struct qspinlock *lock, + struct mcs_spinlock *node) + { return 0; } + +#define pv_enabled() false + +#define pv_init_node __pv_init_node +#define pv_wait_node __pv_wait_node +#define pv_kick_node __pv_kick_node +#define pv_wait_head_or_lock __pv_wait_head_or_lock + +#ifdef CONFIG_PARAVIRT_SPINLOCKS +#define resilient_queued_spin_lock_slowpath native_resilient_queued_spin_lock_slowpath +#endif + +#endif /* _GEN_PV_LOCK_SLOWPATH */ + +/** + * resilient_queued_spin_lock_slowpath - acquire the queued spinlock + * @lock: Pointer to queued spinlock structure + * @val: Current value of the queued spinlock 32-bit word + * + * (queue tail, pending bit, lock value) + * + * fast : slow : unlock + * : : + * uncontended (0,0,0) -:--> (0,0,1) ------------------------------:--> (*,*,0) + * : | ^--------.------. / : + * : v \ \ | : + * pending : (0,1,1) +--> (0,1,0) \ | : + * : | ^--' | | : + * : v | | : + * uncontended : (n,x,y) +--> (n,0,0) --' | : + * queue : | ^--' | : + * : v | : + * contended : (*,x,y) +--> (*,0,0) ---> (*,0,1) -' : + * queue : ^--' : + */ +void __lockfunc resilient_queued_spin_lock_slowpath(struct qspinlock *lock, u32 val) +{ + struct mcs_spinlock *prev, *next, *node; + u32 old, tail; + int idx; + + BUILD_BUG_ON(CONFIG_NR_CPUS >= (1U << _Q_TAIL_CPU_BITS)); + + if (pv_enabled()) + goto pv_queue; + + if (virt_spin_lock(lock)) + return; + + /* + * Wait for in-progress pending->locked hand-overs with a bounded + * number of spins so that we guarantee forward progress. + * + * 0,1,0 -> 0,0,1 + */ + if (val == _Q_PENDING_VAL) { + int cnt = _Q_PENDING_LOOPS; + val = atomic_cond_read_relaxed(&lock->val, + (VAL != _Q_PENDING_VAL) || !cnt--); + } + + /* + * If we observe any contention; queue. + */ + if (val & ~_Q_LOCKED_MASK) + goto queue; + + /* + * trylock || pending + * + * 0,0,* -> 0,1,* -> 0,0,1 pending, trylock + */ + val = queued_fetch_set_pending_acquire(lock); + + /* + * If we observe contention, there is a concurrent locker. + * + * Undo and queue; our setting of PENDING might have made the + * n,0,0 -> 0,0,0 transition fail and it will now be waiting + * on @next to become !NULL. + */ + if (unlikely(val & ~_Q_LOCKED_MASK)) { + + /* Undo PENDING if we set it. */ + if (!(val & _Q_PENDING_MASK)) + clear_pending(lock); + + goto queue; + } + + /* + * We're pending, wait for the owner to go away. + * + * 0,1,1 -> *,1,0 + * + * this wait loop must be a load-acquire such that we match the + * store-release that clears the locked bit and create lock + * sequentiality; this is because not all + * clear_pending_set_locked() implementations imply full + * barriers. + */ + if (val & _Q_LOCKED_MASK) + smp_cond_load_acquire(&lock->locked, !VAL); + + /* + * take ownership and clear the pending bit. + * + * 0,1,0 -> 0,0,1 + */ + clear_pending_set_locked(lock); + lockevent_inc(lock_pending); + return; + + /* + * End of pending bit optimistic spinning and beginning of MCS + * queuing. 
+ */ +queue: + lockevent_inc(lock_slowpath); +pv_queue: + node = this_cpu_ptr(&qnodes[0].mcs); + idx = node->count++; + tail = encode_tail(smp_processor_id(), idx); + + trace_contention_begin(lock, LCB_F_SPIN); + + /* + * 4 nodes are allocated based on the assumption that there will + * not be nested NMIs taking spinlocks. That may not be true in + * some architectures even though the chance of needing more than + * 4 nodes will still be extremely unlikely. When that happens, + * we fall back to spinning on the lock directly without using + * any MCS node. This is not the most elegant solution, but is + * simple enough. + */ + if (unlikely(idx >= _Q_MAX_NODES)) { + lockevent_inc(lock_no_node); + while (!queued_spin_trylock(lock)) + cpu_relax(); + goto release; + } + + node = grab_mcs_node(node, idx); + + /* + * Keep counts of non-zero index values: + */ + lockevent_cond_inc(lock_use_node2 + idx - 1, idx); + + /* + * Ensure that we increment the head node->count before initialising + * the actual node. If the compiler is kind enough to reorder these + * stores, then an IRQ could overwrite our assignments. + */ + barrier(); + + node->locked = 0; + node->next = NULL; + pv_init_node(node); + + /* + * We touched a (possibly) cold cacheline in the per-cpu queue node; + * attempt the trylock once more in the hope someone let go while we + * weren't watching. + */ + if (queued_spin_trylock(lock)) + goto release; + + /* + * Ensure that the initialisation of @node is complete before we + * publish the updated tail via xchg_tail() and potentially link + * @node into the waitqueue via WRITE_ONCE(prev->next, node) below. + */ + smp_wmb(); + + /* + * Publish the updated tail. + * We have already touched the queueing cacheline; don't bother with + * pending stuff. + * + * p,*,* -> n,*,* + */ + old = xchg_tail(lock, tail); + next = NULL; + + /* + * if there was a previous node; link it and wait until reaching the + * head of the waitqueue. + */ + if (old & _Q_TAIL_MASK) { + prev = decode_tail(old, qnodes); + + /* Link @node into the waitqueue. */ + WRITE_ONCE(prev->next, node); + + pv_wait_node(node, prev); + arch_mcs_spin_lock_contended(&node->locked); + + /* + * While waiting for the MCS lock, the next pointer may have + * been set by another lock waiter. We optimistically load + * the next pointer & prefetch the cacheline for writing + * to reduce latency in the upcoming MCS unlock operation. + */ + next = READ_ONCE(node->next); + if (next) + prefetchw(next); + } + + /* + * we're at the head of the waitqueue, wait for the owner & pending to + * go away. + * + * *,x,y -> *,0,0 + * + * this wait loop must use a load-acquire such that we match the + * store-release that clears the locked bit and create lock + * sequentiality; this is because the set_locked() function below + * does not imply a full barrier. + * + * The PV pv_wait_head_or_lock function, if active, will acquire + * the lock and return a non-zero value. So we have to skip the + * atomic_cond_read_acquire() call. As the next PV queue head hasn't + * been designated yet, there is no way for the locked value to become + * _Q_SLOW_VAL. So both the set_locked() and the + * atomic_cmpxchg_relaxed() calls will be safe. + * + * If PV isn't active, 0 will be returned instead. 
+ *
+ */
+	if ((val = pv_wait_head_or_lock(lock, node)))
+		goto locked;
+
+	val = atomic_cond_read_acquire(&lock->val, !(VAL & _Q_LOCKED_PENDING_MASK));
+
+locked:
+	/*
+	 * claim the lock:
+	 *
+	 * n,0,0 -> 0,0,1 : lock, uncontended
+	 * *,*,0 -> *,*,1 : lock, contended
+	 *
+	 * If the queue head is the only one in the queue (lock value == tail)
+	 * and nobody is pending, clear the tail code and grab the lock.
+	 * Otherwise, we only need to grab the lock.
+	 */
+
+	/*
+	 * In the PV case we might already have _Q_LOCKED_VAL set, because
+	 * of lock stealing; therefore we must also allow:
+	 *
+	 * n,0,1 -> 0,0,1
+	 *
+	 * Note: at this point: (val & _Q_PENDING_MASK) == 0, because of the
+	 * above wait condition, therefore any concurrent setting of
+	 * PENDING will make the uncontended transition fail.
+	 */
+	if ((val & _Q_TAIL_MASK) == tail) {
+		if (atomic_try_cmpxchg_relaxed(&lock->val, &val, _Q_LOCKED_VAL))
+			goto release; /* No contention */
+	}
+
+	/*
+	 * Either somebody is queued behind us or _Q_PENDING_VAL got set
+	 * which will then detect the remaining tail and queue behind us
+	 * ensuring we'll see a @next.
+	 */
+	set_locked(lock);
+
+	/*
+	 * contended path; wait for next if not observed yet, release.
+	 */
+	if (!next)
+		next = smp_cond_load_relaxed(&node->next, (VAL));
+
+	arch_mcs_spin_unlock_contended(&next->locked);
+	pv_kick_node(lock, next);
+
+release:
+	trace_contention_end(lock, 0);
+
+	/*
+	 * release the node
+	 */
+	__this_cpu_dec(qnodes[0].mcs.count);
+}
+EXPORT_SYMBOL(resilient_queued_spin_lock_slowpath);
+
+/*
+ * Generate the paravirt code for resilient_queued_spin_unlock_slowpath().
+ */
+#if !defined(_GEN_PV_LOCK_SLOWPATH) && defined(CONFIG_PARAVIRT_SPINLOCKS)
+#define _GEN_PV_LOCK_SLOWPATH
+
+#undef pv_enabled
+#define pv_enabled()	true
+
+#undef pv_init_node
+#undef pv_wait_node
+#undef pv_kick_node
+#undef pv_wait_head_or_lock
+
+#undef resilient_queued_spin_lock_slowpath
+#define resilient_queued_spin_lock_slowpath	__pv_resilient_queued_spin_lock_slowpath
+
+#include "qspinlock_paravirt.h"
+#include "rqspinlock.c"
+
+bool nopvspin;
+static __init int parse_nopvspin(char *arg)
+{
+	nopvspin = true;
+	return 0;
+}
+early_param("nopvspin", parse_nopvspin);
+#endif
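Background on the per-CPU qnode accounting the copied slowpath keeps (node =
this_cpu_ptr(&qnodes[0].mcs); idx = node->count++): a CPU re-enters the
slowpath only via nested contexts (task, softirq, hardirq, NMI), so four node
slots per CPU suffice, and anything deeper falls back to plain trylock
spinning. The sketch below restates that accounting in simplified form;
grab_node()/put_node() and the flat array are made-up names for illustration,
not kernel APIs.

#define MAX_NODES 4

struct qnode_slot { int locked; };		/* stand-in for struct qnode */

static struct qnode_slot nodes[MAX_NODES];	/* per-CPU array in the kernel */
static int depth;				/* per-CPU nesting depth */

static struct qnode_slot *grab_node(void)
{
	int idx = depth++;

	/* NULL means: spin on trylock instead of queueing on an MCS node. */
	return idx < MAX_NODES ? &nodes[idx] : NULL;
}

static void put_node(void)
{
	depth--;
}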
From patchwork Thu Feb 6 10:54:13 2025
X-Patchwork-Submitter: Kumar Kartikeya Dwivedi
X-Patchwork-Id: 13962810
From: Kumar Kartikeya Dwivedi
Subject: [PATCH bpf-next v2 05/26] rqspinlock: Add rqspinlock.h header
Date: Thu, 6 Feb 2025 02:54:13 -0800
Message-ID: <20250206105435.2159977-6-memxor@gmail.com>
In-Reply-To: <20250206105435.2159977-1-memxor@gmail.com>
McKenney" , Tejun Heo , Josh Don , Dohyun Kim , linux-arm-kernel@lists.infradead.org, kernel-team@meta.com Subject: [PATCH bpf-next v2 05/26] rqspinlock: Add rqspinlock.h header Date: Thu, 6 Feb 2025 02:54:13 -0800 Message-ID: <20250206105435.2159977-6-memxor@gmail.com> X-Mailer: git-send-email 2.43.5 In-Reply-To: <20250206105435.2159977-1-memxor@gmail.com> References: <20250206105435.2159977-1-memxor@gmail.com> Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Developer-Signature: v=1; a=openpgp-sha256; l=2297; h=from:subject; bh=AVuKKIzjLnum/Q8Jp3BllOKRS5yuVDUgY1U4pKev08I=; b=owEBbQKS/ZANAwAIAUzgyIZIvxHKAcsmYgBnpJRkAUSLUoqJ61KHJqVww5VkHo5XuLe/f7xDpkMh vShNwMCJAjMEAAEIAB0WIQRLvip+Buz51YI8YRFM4MiGSL8RygUCZ6SUZAAKCRBM4MiGSL8RyoE9EA ChWskn+IPkSTn/DMBwsfvx6EWqIgzdLCOdPnNCPTWMG8F+qUgNrWq+hnY2AEcF0ceG8KE69wHJ0cLI VBN9DG6qMLro8x1voPjPdK+aL1HAKDexxzJsUGqXzwcJ7tvBu+6aJto1dqMnI/iAjErkVXhTfeeDnQ HE8Vc8ft+nREngcb+k2A3S+vnI7cNgZyEtUXn3mX+4C1e0siL9PCKWRbCj7Dd8CIyhr+I6iVZdwGNy fYfSZBupBuq5C4yvcYeJ8shIaV3bkO04VogVyYLEi/QVP8pKyI8R+EBtKWqigsXI6383C8YLtrbQGd 6GU/QIqvFWawk/4RAnRUbHrfbE0pEX9aYKhgNEqv6/kqUCtYlJg+UsJtP63tthsVPkwsBYSzNyoHbB 0DutHelI8e77l5BQjjBxlxrXXTwKwflGFaPet1F3lFUu5Cq2PuJNvOGpz9xnFAi/8jxZY9QaxMKAUg Xr5iqvWwjk83bjYdoRMl/jswMGFWaFmpSrDUvrXM43JJVq/Jip610cxzOu92F+nVn1lWFbMEC4Bpva mhfxqax6cnPG6vhnjpN4uowxGnWFTJUwdYsICJWIXeHdbiXs3WJOtEvcwisb0n3r1F6T8NtLM+7Mdu KrBmyKQ6ZIowge2wnwHfNovFg8fEt4LhwMndrwjiFM78K91D7w+iuxRUPVLw== X-Developer-Key: i=memxor@gmail.com; a=openpgp; fpr=4BBE2A7E06ECF9D5823C61114CE0C88648BF11CA X-Patchwork-Delegate: bpf@iogearbox.net This header contains the public declarations usable in the rest of the kernel for rqspinlock. Let's also type alias qspinlock to rqspinlock_t to ensure consistent use of the new lock type. We want to remove dependence on the qspinlock type in later patches as we need to provide a test-and-set fallback, hence begin abstracting away from now onwards. Reviewed-by: Barret Rhoden Signed-off-by: Kumar Kartikeya Dwivedi --- include/asm-generic/rqspinlock.h | 19 +++++++++++++++++++ kernel/locking/rqspinlock.c | 3 ++- 2 files changed, 21 insertions(+), 1 deletion(-) create mode 100644 include/asm-generic/rqspinlock.h diff --git a/include/asm-generic/rqspinlock.h b/include/asm-generic/rqspinlock.h new file mode 100644 index 000000000000..54860b519571 --- /dev/null +++ b/include/asm-generic/rqspinlock.h @@ -0,0 +1,19 @@ +/* SPDX-License-Identifier: GPL-2.0-or-later */ +/* + * Resilient Queued Spin Lock + * + * (C) Copyright 2024 Meta Platforms, Inc. and affiliates. 
+ * + * Authors: Kumar Kartikeya Dwivedi + */ +#ifndef __ASM_GENERIC_RQSPINLOCK_H +#define __ASM_GENERIC_RQSPINLOCK_H + +#include + +struct qspinlock; +typedef struct qspinlock rqspinlock_t; + +extern void resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val); + +#endif /* __ASM_GENERIC_RQSPINLOCK_H */ diff --git a/kernel/locking/rqspinlock.c b/kernel/locking/rqspinlock.c index caaa7c9bbc79..18eb9ef3e908 100644 --- a/kernel/locking/rqspinlock.c +++ b/kernel/locking/rqspinlock.c @@ -23,6 +23,7 @@ #include #include #include +#include /* * Include queued spinlock definitions and statistics code @@ -127,7 +128,7 @@ static __always_inline u32 __pv_wait_head_or_lock(struct qspinlock *lock, * contended : (*,x,y) +--> (*,0,0) ---> (*,0,1) -' : * queue : ^--' : */ -void __lockfunc resilient_queued_spin_lock_slowpath(struct qspinlock *lock, u32 val) +void __lockfunc resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val) { struct mcs_spinlock *prev, *next, *node; u32 old, tail; From patchwork Thu Feb 6 10:54:14 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Kumar Kartikeya Dwivedi X-Patchwork-Id: 13962812 X-Patchwork-Delegate: bpf@iogearbox.net Received: from mail-wm1-f68.google.com (mail-wm1-f68.google.com [209.85.128.68]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 64CFA229B0F; Thu, 6 Feb 2025 10:54:46 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.128.68 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1738839289; cv=none; b=W6DDvZ5umehLa+U8DX2FwS/4HGQZhNfFTFkSAeGC8vs+kYGjZMsgsQ4hmk5GqR5kwDhcHxeT4IW4mUALRkpkWLCcURc85/R+W13advrbuRbALLEItZ4fBa0rPmlOcWT9+k5SC/YPFOhxQ4qIfXv4fpxjKrQgHFhbjviFuvefQD0= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1738839289; c=relaxed/simple; bh=r7k8qGdKN/3/qwxHoOfh+ZQmucFzyerAvWUxzRBa3Nw=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=lGcDIVTCZaXmMWzH+ID5pC0KbLrZKX3jdC4tRheFiPakFGyCa9z76JM3+HY6BXqZLoGpVpxuEm1pNUPKYHazrQbyUosubtGVMzGtqndbyFkQIh4yLziLd06DZ35SVQ+tKL4/3rg/FamRow2m9ekgO0SZjhoEKwSo/QGUS4w7DC0= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=gEwP2l9P; arc=none smtp.client-ip=209.85.128.68 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="gEwP2l9P" Received: by mail-wm1-f68.google.com with SMTP id 5b1f17b1804b1-4361815b96cso4768575e9.1; Thu, 06 Feb 2025 02:54:46 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1738839285; x=1739444085; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=FML/jIvPcAoxlA62EsuMUSBptbx5cKUmte0tA2musTk=; b=gEwP2l9P7GKiGA5//AffMt8z4ionJ+pnOF6hktdb2ho5yxXB9qGNAJmDQjvczn8lhT Foqogue+a5VcVSaOaUICnlpQ4kMguNR4akiW6KPf2vvsss6q6fLgVFBBMF3ViFdqp4VW r5qZ89k5TcQA36NmYpJgpbAw6w7TH5yi91e3N6quI+VpOao6jYOXYHyJBui17ageYGkE 
From: Kumar Kartikeya Dwivedi
To: bpf@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: Barret Rhoden, Linus Torvalds, Peter Zijlstra, Will Deacon, Waiman Long, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann, Martin KaFai Lau, Eduard Zingerman, "Paul E.
McKenney" , Tejun Heo , Josh Don , Dohyun Kim , linux-arm-kernel@lists.infradead.org, kernel-team@meta.com Subject: [PATCH bpf-next v2 06/26] rqspinlock: Drop PV and virtualization support Date: Thu, 6 Feb 2025 02:54:14 -0800 Message-ID: <20250206105435.2159977-7-memxor@gmail.com> X-Mailer: git-send-email 2.43.5 In-Reply-To: <20250206105435.2159977-1-memxor@gmail.com> References: <20250206105435.2159977-1-memxor@gmail.com> Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Developer-Signature: v=1; a=openpgp-sha256; l=6325; h=from:subject; bh=r7k8qGdKN/3/qwxHoOfh+ZQmucFzyerAvWUxzRBa3Nw=; b=owEBbQKS/ZANAwAIAUzgyIZIvxHKAcsmYgBnpJRkILe/Rfrs9nCkS/jAFdrxKRcYv7G3/iAaCR9O 5XUAiaGJAjMEAAEIAB0WIQRLvip+Buz51YI8YRFM4MiGSL8RygUCZ6SUZAAKCRBM4MiGSL8Ryt6cD/ 9T4mqw3kZqs2TP+tHKQuzgqUq8fvSH8yl7t36bQGme497vOAOYDUHMeNE4NNyj7xPtcdwP+75JT6jQ wb7DTCpWkXZuXyGsVYOqkRCTrIpl86UhY4KJl7kpw0Fu/l4dVh2eGgHzXhYTOAo3UfEW3sblt0q+J+ HtBSelQEJJ2OiEIXdozIXjKe1DouxA6jfr9ixQF5KRP3O4K0H2jeTFQw6ruH2RXH0V+42ZwyL/q4Sb j0fby9n8kP9ZDxMNUNWPVhnFWuMwb3b9rwsQZME31GLbIEi/IEz54iXXAjlwVeS8CtY3ZuiDqfn4LC 8vkJ/5biap6lg9ReRc9H8WCmVZuC1O18jeYfRVCk9BRgjFDmynobDC5PWqUEaxs/4weWUTipElhtHA 7rTqLuMsOilmIdBqGSY9cpHXiMj/9tfMraqZKztusV0dkACiFyAXpSgpqErFXM4z8J7b6tx0wu+gza wtTNXet8pwRgRUHVnbyGX81YgMCL5AQXwiOot10sXBM00IBvoKOrdjO2422YWhUOHy6sF6R5id4V9D gM4HflpQhz5PAIg3z56BGX8bWfOlfbWkA6/HEA5p+yyTD21iW/w4wyZuPUEWaHIhxlVrp81m7Ajv+d 8KZJHw5wvIc5BbaAV1NGrm4shlGp+OwQG7OPQO17A7Vq6h8oz8Q8NJUJBu4w== X-Developer-Key: i=memxor@gmail.com; a=openpgp; fpr=4BBE2A7E06ECF9D5823C61114CE0C88648BF11CA X-Patchwork-Delegate: bpf@iogearbox.net Changes to rqspinlock in subsequent commits will be algorithmic modifications, which won't remain in agreement with the implementations of paravirt spin lock and virt_spin_lock support. These future changes include measures for terminating waiting loops in slow path after a certain point. While using a fair lock like qspinlock directly inside virtual machines leads to suboptimal performance under certain conditions, we cannot use the existing virtualization support before we make it resilient as well. Therefore, drop it for now. Reviewed-by: Barret Rhoden Signed-off-by: Kumar Kartikeya Dwivedi --- kernel/locking/rqspinlock.c | 89 ------------------------------------- 1 file changed, 89 deletions(-) diff --git a/kernel/locking/rqspinlock.c b/kernel/locking/rqspinlock.c index 18eb9ef3e908..52db60cd9691 100644 --- a/kernel/locking/rqspinlock.c +++ b/kernel/locking/rqspinlock.c @@ -11,8 +11,6 @@ * Peter Zijlstra */ -#ifndef _GEN_PV_LOCK_SLOWPATH - #include #include #include @@ -75,38 +73,9 @@ * contexts: task, softirq, hardirq, nmi. * * Exactly fits one 64-byte cacheline on a 64-bit architecture. - * - * PV doubles the storage and uses the second cacheline for PV state. */ static DEFINE_PER_CPU_ALIGNED(struct qnode, qnodes[_Q_MAX_NODES]); -/* - * Generate the native code for resilient_queued_spin_unlock_slowpath(); provide NOPs - * for all the PV callbacks. 
- */ - -static __always_inline void __pv_init_node(struct mcs_spinlock *node) { } -static __always_inline void __pv_wait_node(struct mcs_spinlock *node, - struct mcs_spinlock *prev) { } -static __always_inline void __pv_kick_node(struct qspinlock *lock, - struct mcs_spinlock *node) { } -static __always_inline u32 __pv_wait_head_or_lock(struct qspinlock *lock, - struct mcs_spinlock *node) - { return 0; } - -#define pv_enabled() false - -#define pv_init_node __pv_init_node -#define pv_wait_node __pv_wait_node -#define pv_kick_node __pv_kick_node -#define pv_wait_head_or_lock __pv_wait_head_or_lock - -#ifdef CONFIG_PARAVIRT_SPINLOCKS -#define resilient_queued_spin_lock_slowpath native_resilient_queued_spin_lock_slowpath -#endif - -#endif /* _GEN_PV_LOCK_SLOWPATH */ - /** * resilient_queued_spin_lock_slowpath - acquire the queued spinlock * @lock: Pointer to queued spinlock structure @@ -136,12 +105,6 @@ void __lockfunc resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val) BUILD_BUG_ON(CONFIG_NR_CPUS >= (1U << _Q_TAIL_CPU_BITS)); - if (pv_enabled()) - goto pv_queue; - - if (virt_spin_lock(lock)) - return; - /* * Wait for in-progress pending->locked hand-overs with a bounded * number of spins so that we guarantee forward progress. @@ -212,7 +175,6 @@ void __lockfunc resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val) */ queue: lockevent_inc(lock_slowpath); -pv_queue: node = this_cpu_ptr(&qnodes[0].mcs); idx = node->count++; tail = encode_tail(smp_processor_id(), idx); @@ -251,7 +213,6 @@ void __lockfunc resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val) node->locked = 0; node->next = NULL; - pv_init_node(node); /* * We touched a (possibly) cold cacheline in the per-cpu queue node; @@ -288,7 +249,6 @@ void __lockfunc resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val) /* Link @node into the waitqueue. */ WRITE_ONCE(prev->next, node); - pv_wait_node(node, prev); arch_mcs_spin_lock_contended(&node->locked); /* @@ -312,23 +272,9 @@ void __lockfunc resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val) * store-release that clears the locked bit and create lock * sequentiality; this is because the set_locked() function below * does not imply a full barrier. - * - * The PV pv_wait_head_or_lock function, if active, will acquire - * the lock and return a non-zero value. So we have to skip the - * atomic_cond_read_acquire() call. As the next PV queue head hasn't - * been designated yet, there is no way for the locked value to become - * _Q_SLOW_VAL. So both the set_locked() and the - * atomic_cmpxchg_relaxed() calls will be safe. - * - * If PV isn't active, 0 will be returned instead. - * */ - if ((val = pv_wait_head_or_lock(lock, node))) - goto locked; - val = atomic_cond_read_acquire(&lock->val, !(VAL & _Q_LOCKED_PENDING_MASK)); -locked: /* * claim the lock: * @@ -341,11 +287,6 @@ void __lockfunc resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val) */ /* - * In the PV case we might already have _Q_LOCKED_VAL set, because - * of lock stealing; therefore we must also allow: - * - * n,0,1 -> 0,0,1 - * * Note: at this point: (val & _Q_PENDING_MASK) == 0, because of the * above wait condition, therefore any concurrent setting of * PENDING will make the uncontended transition fail. 
@@ -369,7 +310,6 @@ void __lockfunc resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val) next = smp_cond_load_relaxed(&node->next, (VAL)); arch_mcs_spin_unlock_contended(&next->locked); - pv_kick_node(lock, next); release: trace_contention_end(lock, 0); @@ -380,32 +320,3 @@ void __lockfunc resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val) __this_cpu_dec(qnodes[0].mcs.count); } EXPORT_SYMBOL(resilient_queued_spin_lock_slowpath); - -/* - * Generate the paravirt code for resilient_queued_spin_unlock_slowpath(). - */ -#if !defined(_GEN_PV_LOCK_SLOWPATH) && defined(CONFIG_PARAVIRT_SPINLOCKS) -#define _GEN_PV_LOCK_SLOWPATH - -#undef pv_enabled -#define pv_enabled() true - -#undef pv_init_node -#undef pv_wait_node -#undef pv_kick_node -#undef pv_wait_head_or_lock - -#undef resilient_queued_spin_lock_slowpath -#define resilient_queued_spin_lock_slowpath __pv_resilient_queued_spin_lock_slowpath - -#include "qspinlock_paravirt.h" -#include "rqspinlock.c" - -bool nopvspin; -static __init int parse_nopvspin(char *arg) -{ - nopvspin = true; - return 0; -} -early_param("nopvspin", parse_nopvspin); -#endif From patchwork Thu Feb 6 10:54:15 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Kumar Kartikeya Dwivedi X-Patchwork-Id: 13962813 X-Patchwork-Delegate: bpf@iogearbox.net Received: from mail-wm1-f68.google.com (mail-wm1-f68.google.com [209.85.128.68]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 7426322D4C9; Thu, 6 Feb 2025 10:54:48 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.128.68 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1738839290; cv=none; b=Q7OgrT988aWQIDg4nGl2NVRsv+2XuaWo7ui1G4UApV1SBtVcSNrmpsKK63klScy7jKzHabiXeMeHPA5tFpG/VgzWCPcVgxn+ZpkOqIBzlHyncoES8SHPcLaAYqrkZf7TD00n0vKGpyViNIZXXs+hMiLMNj4ilCDuS8qXKpMJq70= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1738839290; c=relaxed/simple; bh=+CI7RDk98Onq/s+jrg9zfcbapKjfoHyX6Gf/cOAFD9E=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=aRLZkVqPoRWZIB2DoS3K8nlpGHdxxwwxph3beZyIVOwQBknOU89+LcbGWBzY2QNn8VGjq6JEo/fyf64iwV/37bjR6ATeKAJ3iOSiFTZBjwDL+WyFXFazXEugIX7DHbTqPehym25wgN3g6beGhZofpmTiSJdjhAjHgCZK/cUfx+I= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=UJpMS/J7; arc=none smtp.client-ip=209.85.128.68 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="UJpMS/J7" Received: by mail-wm1-f68.google.com with SMTP id 5b1f17b1804b1-43621d27adeso4637535e9.2; Thu, 06 Feb 2025 02:54:48 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1738839286; x=1739444086; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=Vg5Aj/VI1QlQmysdMT47W3wgHFqoKVdLw8Vt35/615c=; b=UJpMS/J7uhBwYNK5Fk9Mm+HrFEyvW6k3QUe3YMoyJ4f2WgQi8Kdik6AIssuhT5St1l 
From: Kumar Kartikeya Dwivedi
To: bpf@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: Barret Rhoden, Linus Torvalds, Peter Zijlstra, Will Deacon, Waiman Long, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann, Martin KaFai Lau, Eduard Zingerman, "Paul E.
McKenney" , Tejun Heo , Josh Don , Dohyun Kim , linux-arm-kernel@lists.infradead.org, kernel-team@meta.com Subject: [PATCH bpf-next v2 07/26] rqspinlock: Add support for timeouts Date: Thu, 6 Feb 2025 02:54:15 -0800 Message-ID: <20250206105435.2159977-8-memxor@gmail.com> X-Mailer: git-send-email 2.43.5 In-Reply-To: <20250206105435.2159977-1-memxor@gmail.com> References: <20250206105435.2159977-1-memxor@gmail.com> Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Developer-Signature: v=1; a=openpgp-sha256; l=5055; h=from:subject; bh=+CI7RDk98Onq/s+jrg9zfcbapKjfoHyX6Gf/cOAFD9E=; b=owEBbQKS/ZANAwAIAUzgyIZIvxHKAcsmYgBnpJRko56eerusglxFABuTrjI0BeTNdWet/EFfKlxx n6KicXiJAjMEAAEIAB0WIQRLvip+Buz51YI8YRFM4MiGSL8RygUCZ6SUZAAKCRBM4MiGSL8RypHfD/ 9HHaEjSUQc5CtopQiXE6c0Vwac7KpTMW4wwJycT3bCu1KquGtKAhLC2KFqVX2wHK4bl/OLK9vxVSmT Y9HJRDo6MOTwwlMcflPNZ3sTdgVaOaqei2zDvi0fcuAdVHriJxcP1KZX+BOtNaX4oIBqMixlOnWT8n zk9/HR/PwfXvbdewufYQssneftjxoWOvuQQeFEhZlkhohK0wkeQGgYZ6x6dpnN/VU3PpQQVTbRQFDI q2nwG3DsniPReJ/kfk+KKkvpMIXwuCUQCswIjrMd6X+f8Gbx4HkDaoCK0jNJmLXz7x/qnXM7efk5aJ eNUrFK4ZNtavQLPKEs7/u1ksDubcrsL+PcEjSLPmM7S83+t6sEMMKte4jA9A3EhlIe3pOMdshKIwMZ ox0u8m96KQo3KwGtgfkm1xYVf6D/WMtXXhPzrlPQ9no39v7Bzx0BPhb1CwvoxQHGMvECHciFWXf1E1 lwy9QUGhK6aq/O7Wbf9dLtTyqUL22WhmoHXboRhy8E1Q5+h+jywgW08oMQ80fekxqIfrp7xeQuJxbs 6tUwh0BODBW6hRLhfkjt3Qu89gY3T/J/7YxiR5wbSV5A47TQLxdDH3gvALoDmdy9OiVYWlvfv6LgUD KhAG4jecaMw2SpR61+5VnBT3O5WvSYhyuAOcyVQLr68OWpdNT5LRRVSCKKDg== X-Developer-Key: i=memxor@gmail.com; a=openpgp; fpr=4BBE2A7E06ECF9D5823C61114CE0C88648BF11CA X-Patchwork-Delegate: bpf@iogearbox.net Introduce policy macro RES_CHECK_TIMEOUT which can be used to detect when the timeout has expired for the slow path to return an error. It depends on being passed two variables initialized to 0: ts, ret. The 'ts' parameter is of type rqspinlock_timeout. This macro resolves to the (ret) expression so that it can be used in statements like smp_cond_load_acquire to break the waiting loop condition. The 'spin' member is used to amortize the cost of checking time by dispatching to the implementation every 64k iterations. The 'timeout_end' member is used to keep track of the timestamp that denotes the end of the waiting period. The 'ret' parameter denotes the status of the timeout, and can be checked in the slow path to detect timeouts after waiting loops. The 'duration' member is used to store the timeout duration for each waiting loop, that is passed down from the caller of the slow path function. Use the RES_INIT_TIMEOUT macro to initialize it. The default timeout value defined in the header (RES_DEF_TIMEOUT) is 0.5 seconds. This macro will be used as a condition for waiting loops in the slow path. Since each waiting loop applies a fresh timeout using the same rqspinlock_timeout, we add a new RES_RESET_TIMEOUT as well to ensure the values can be easily reinitialized to the default state. 
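To make the intended composition of RES_INIT_TIMEOUT, RES_RESET_TIMEOUT and RES_CHECK_TIMEOUT concrete, here is a small stand-alone user-space model of the scheme described above. It is only an illustrative sketch, not the kernel implementation: clock_gettime() stands in for ktime_get_mono_fast_ns(), a plain volatile flag stands in for the lock word, and the macros rely on GCC/Clang statement expressions just like the kernel versions.

/* Illustrative user-space model of the rqspinlock timeout macros. */
#include <errno.h>
#include <stdint.h>
#include <stdio.h>
#include <time.h>

struct rqspinlock_timeout {
	uint64_t timeout_end;
	uint64_t duration;
	uint16_t spin;
};

static uint64_t mono_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000000000ull + ts.tv_nsec;
}

/* Mirrors check_timeout(): arm the deadline on first use, then compare. */
static int check_timeout(struct rqspinlock_timeout *ts)
{
	uint64_t time = mono_ns();

	if (!ts->timeout_end) {
		ts->timeout_end = time + ts->duration;
		return 0;
	}
	return time > ts->timeout_end ? -ETIMEDOUT : 0;
}

#define RES_CHECK_TIMEOUT(ts, ret)				\
	({							\
		if (!(ts).spin++)				\
			(ret) = check_timeout(&(ts));		\
		(ret);						\
	})

#define RES_INIT_TIMEOUT(ts, _timeout) ({ (ts).spin = 1; (ts).duration = (_timeout); })
#define RES_RESET_TIMEOUT(ts) ({ (ts).timeout_end = 0; })

int main(void)
{
	struct rqspinlock_timeout ts;
	volatile int locked = 1;	/* stand-in lock word; never released here */
	int ret = 0;

	RES_INIT_TIMEOUT(ts, 100 * 1000 * 1000ull);	/* 100ms instead of RES_DEF_TIMEOUT */

	/* Each waiting loop re-arms the deadline, then spins until the condition
	 * holds or RES_CHECK_TIMEOUT() latches -ETIMEDOUT into ret. */
	RES_RESET_TIMEOUT(ts);
	while (locked && !RES_CHECK_TIMEOUT(ts, ret))
		;

	printf("wait finished with ret=%d\n", ret);	/* prints -ETIMEDOUT (-110 on Linux) */
	return 0;
}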
Reviewed-by: Barret Rhoden Signed-off-by: Kumar Kartikeya Dwivedi --- include/asm-generic/rqspinlock.h | 8 +++++- kernel/locking/rqspinlock.c | 46 +++++++++++++++++++++++++++++++- 2 files changed, 52 insertions(+), 2 deletions(-) diff --git a/include/asm-generic/rqspinlock.h b/include/asm-generic/rqspinlock.h index 54860b519571..c89733cbe643 100644 --- a/include/asm-generic/rqspinlock.h +++ b/include/asm-generic/rqspinlock.h @@ -10,10 +10,16 @@ #define __ASM_GENERIC_RQSPINLOCK_H #include +#include struct qspinlock; typedef struct qspinlock rqspinlock_t; -extern void resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val); +/* + * Default timeout for waiting loops is 0.5 seconds + */ +#define RES_DEF_TIMEOUT (NSEC_PER_SEC / 2) + +extern void resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val, u64 timeout); #endif /* __ASM_GENERIC_RQSPINLOCK_H */ diff --git a/kernel/locking/rqspinlock.c b/kernel/locking/rqspinlock.c index 52db60cd9691..200454e9c636 100644 --- a/kernel/locking/rqspinlock.c +++ b/kernel/locking/rqspinlock.c @@ -6,9 +6,11 @@ * (C) Copyright 2013-2014,2018 Red Hat, Inc. * (C) Copyright 2015 Intel Corp. * (C) Copyright 2015 Hewlett-Packard Enterprise Development LP + * (C) Copyright 2024 Meta Platforms, Inc. and affiliates. * * Authors: Waiman Long * Peter Zijlstra + * Kumar Kartikeya Dwivedi */ #include @@ -22,6 +24,7 @@ #include #include #include +#include /* * Include queued spinlock definitions and statistics code @@ -68,6 +71,44 @@ #include "mcs_spinlock.h" +struct rqspinlock_timeout { + u64 timeout_end; + u64 duration; + u16 spin; +}; + +static noinline int check_timeout(struct rqspinlock_timeout *ts) +{ + u64 time = ktime_get_mono_fast_ns(); + + if (!ts->timeout_end) { + ts->timeout_end = time + ts->duration; + return 0; + } + + if (time > ts->timeout_end) + return -ETIMEDOUT; + + return 0; +} + +#define RES_CHECK_TIMEOUT(ts, ret) \ + ({ \ + if (!(ts).spin++) \ + (ret) = check_timeout(&(ts)); \ + (ret); \ + }) + +/* + * Initialize the 'duration' member with the chosen timeout. + */ +#define RES_INIT_TIMEOUT(ts, _timeout) ({ (ts).spin = 1; (ts).duration = _timeout; }) + +/* + * We only need to reset 'timeout_end', 'spin' will just wrap around as necessary. + */ +#define RES_RESET_TIMEOUT(ts) ({ (ts).timeout_end = 0; }) + /* * Per-CPU queue node structures; we can never have more than 4 nested * contexts: task, softirq, hardirq, nmi. @@ -97,14 +138,17 @@ static DEFINE_PER_CPU_ALIGNED(struct qnode, qnodes[_Q_MAX_NODES]); * contended : (*,x,y) +--> (*,0,0) ---> (*,0,1) -' : * queue : ^--' : */ -void __lockfunc resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val) +void __lockfunc resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val, u64 timeout) { struct mcs_spinlock *prev, *next, *node; + struct rqspinlock_timeout ts; u32 old, tail; int idx; BUILD_BUG_ON(CONFIG_NR_CPUS >= (1U << _Q_TAIL_CPU_BITS)); + RES_INIT_TIMEOUT(ts, timeout); + /* * Wait for in-progress pending->locked hand-overs with a bounded * number of spins so that we guarantee forward progress. 
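As a rough worked estimate of the amortization this buys: 'spin' is a u16, so check_timeout() and its ktime_get_mono_fast_ns() call only run once every 2^16 = 65536 iterations of a waiting loop. Assuming, purely for illustration, a handful of nanoseconds per spin iteration, the added detection latency is on the order of a few hundred microseconds, which is negligible against the default RES_DEF_TIMEOUT of 0.5 seconds; the amortization can only make a timeout be noticed slightly late, never early.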
From patchwork Thu Feb 6 10:54:16 2025
From: Kumar Kartikeya Dwivedi
To: bpf@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: Barret Rhoden, Linus Torvalds, Peter Zijlstra, Will Deacon, Waiman Long, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann, Martin KaFai Lau, Eduard Zingerman, "Paul E. McKenney", Tejun Heo, Josh Don, Dohyun Kim, linux-arm-kernel@lists.infradead.org, kernel-team@meta.com
Subject: [PATCH bpf-next v2 08/26] rqspinlock: Protect pending bit owners from stalls
Date: Thu, 6 Feb 2025 02:54:16 -0800
Message-ID: <20250206105435.2159977-9-memxor@gmail.com>
In-Reply-To: <20250206105435.2159977-1-memxor@gmail.com>
References: <20250206105435.2159977-1-memxor@gmail.com>

The pending bit is used to avoid queueing in case the lock is uncontended, and has demonstrated benefits for the 2 contender scenario, esp. on x86. In case the pending bit is acquired and we wait for the locked bit to disappear, we may get stuck due to the lock owner not making progress. Hence, this waiting loop must be protected with a timeout check. To perform a graceful recovery once we decide to abort our lock acquisition attempt in this case, we must unset the pending bit since we own it.
All waiters undoing their changes and exiting gracefully allows the lock word to be restored to the unlocked state once all participants (owner, waiters) have been recovered, and the lock remains usable. Hence, set the pending bit back to zero before returning to the caller. Introduce a lockevent (rqspinlock_lock_timeout) to capture timeout event statistics. Reviewed-by: Barret Rhoden Signed-off-by: Kumar Kartikeya Dwivedi --- include/asm-generic/rqspinlock.h | 2 +- kernel/locking/lock_events_list.h | 5 +++++ kernel/locking/rqspinlock.c | 28 +++++++++++++++++++++++----- 3 files changed, 29 insertions(+), 6 deletions(-) diff --git a/include/asm-generic/rqspinlock.h b/include/asm-generic/rqspinlock.h index c89733cbe643..0981162c8ac7 100644 --- a/include/asm-generic/rqspinlock.h +++ b/include/asm-generic/rqspinlock.h @@ -20,6 +20,6 @@ typedef struct qspinlock rqspinlock_t; */ #define RES_DEF_TIMEOUT (NSEC_PER_SEC / 2) -extern void resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val, u64 timeout); +extern int resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val, u64 timeout); #endif /* __ASM_GENERIC_RQSPINLOCK_H */ diff --git a/kernel/locking/lock_events_list.h b/kernel/locking/lock_events_list.h index 97fb6f3f840a..c5286249994d 100644 --- a/kernel/locking/lock_events_list.h +++ b/kernel/locking/lock_events_list.h @@ -49,6 +49,11 @@ LOCK_EVENT(lock_use_node4) /* # of locking ops that use 4th percpu node */ LOCK_EVENT(lock_no_node) /* # of locking ops w/o using percpu node */ #endif /* CONFIG_QUEUED_SPINLOCKS */ +/* + * Locking events for Resilient Queued Spin Lock + */ +LOCK_EVENT(rqspinlock_lock_timeout) /* # of locking ops that timeout */ + /* * Locking events for rwsem */ diff --git a/kernel/locking/rqspinlock.c b/kernel/locking/rqspinlock.c index 200454e9c636..8e512feb37ce 100644 --- a/kernel/locking/rqspinlock.c +++ b/kernel/locking/rqspinlock.c @@ -138,12 +138,12 @@ static DEFINE_PER_CPU_ALIGNED(struct qnode, qnodes[_Q_MAX_NODES]); * contended : (*,x,y) +--> (*,0,0) ---> (*,0,1) -' : * queue : ^--' : */ -void __lockfunc resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val, u64 timeout) +int __lockfunc resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val, u64 timeout) { struct mcs_spinlock *prev, *next, *node; struct rqspinlock_timeout ts; + int idx, ret = 0; u32 old, tail; - int idx; BUILD_BUG_ON(CONFIG_NR_CPUS >= (1U << _Q_TAIL_CPU_BITS)); @@ -201,8 +201,25 @@ void __lockfunc resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val, * clear_pending_set_locked() implementations imply full * barriers. */ - if (val & _Q_LOCKED_MASK) - smp_cond_load_acquire(&lock->locked, !VAL); + if (val & _Q_LOCKED_MASK) { + RES_RESET_TIMEOUT(ts); + smp_cond_load_acquire(&lock->locked, !VAL || RES_CHECK_TIMEOUT(ts, ret)); + } + + if (ret) { + /* + * We waited for the locked bit to go back to 0, as the pending + * waiter, but timed out. We need to clear the pending bit since + * we own it. Once a stuck owner has been recovered, the lock + * must be restored to a valid state, hence removing the pending + * bit is necessary. + * + * *,1,* -> *,0,* + */ + clear_pending(lock); + lockevent_inc(rqspinlock_lock_timeout); + return ret; + } /* * take ownership and clear the pending bit. 
@@ -211,7 +228,7 @@ void __lockfunc resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val, */ clear_pending_set_locked(lock); lockevent_inc(lock_pending); - return; + return 0; /* * End of pending bit optimistic spinning and beginning of MCS @@ -362,5 +379,6 @@ void __lockfunc resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val, * release the node */ __this_cpu_dec(qnodes[0].mcs.count); + return 0; } EXPORT_SYMBOL(resilient_queued_spin_lock_slowpath);
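With the slow path now returning an error code instead of spinning indefinitely, a caller can react to a stuck lock. The wrapper below is a hypothetical caller-side sketch of how a fast path could be glued to the resilient slow path, assuming the qspinlock definitions (_Q_LOCKED_VAL, the atomic 'val' field) and <asm-generic/rqspinlock.h> are in scope; the real lock/unlock entry points appear later in the series and may differ.

/*
 * Hypothetical caller-side sketch (not part of this patch).  Fast path:
 * 0 -> locked via an acquire cmpxchg; otherwise enter the resilient slow
 * path and propagate its error (-ETIMEDOUT when the owner appears stuck).
 */
static __always_inline int res_spin_lock_sketch(rqspinlock_t *lock)
{
	int val = 0;

	if (likely(atomic_try_cmpxchg_acquire(&lock->val, &val, _Q_LOCKED_VAL)))
		return 0;
	return resilient_queued_spin_lock_slowpath(lock, val, RES_DEF_TIMEOUT);
}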
From patchwork Thu Feb 6 10:54:17 2025
From: Kumar Kartikeya Dwivedi
To: bpf@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: Barret Rhoden, Linus Torvalds, Peter Zijlstra, Will Deacon, Waiman Long, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann, Martin KaFai Lau, Eduard Zingerman, "Paul E. McKenney", Tejun Heo, Josh Don, Dohyun Kim, linux-arm-kernel@lists.infradead.org, kernel-team@meta.com
Subject: [PATCH bpf-next v2 09/26] rqspinlock: Protect waiters in queue from stalls
Date: Thu, 6 Feb 2025 02:54:17 -0800
Message-ID: <20250206105435.2159977-10-memxor@gmail.com>
In-Reply-To: <20250206105435.2159977-1-memxor@gmail.com>
References: <20250206105435.2159977-1-memxor@gmail.com>

Implement the wait queue cleanup algorithm for rqspinlock. There are three forms of waiters in the original queued spin lock algorithm.
The first is the waiter which acquires the pending bit and spins on the lock word without forming a wait queue. The second is the head waiter that is the first waiter heading the wait queue. The third form is of all the non-head waiters queued behind the head, waiting to be signalled through their MCS node to overtake the responsibility of the head. In this commit, we are concerned with the second and third kind. First, we augment the waiting loop of the head of the wait queue with a timeout. When this timeout happens, all waiters part of the wait queue will abort their lock acquisition attempts. This happens in three steps. First, the head breaks out of its loop waiting for pending and locked bits to turn to 0, and non-head waiters break out of their MCS node spin (more on that later). Next, every waiter (head or non-head) attempts to check whether they are also the tail waiter, in such a case they attempt to zero out the tail word and allow a new queue to be built up for this lock. If they succeed, they have no one to signal next in the queue to stop spinning. Otherwise, they signal the MCS node of the next waiter to break out of its spin and try resetting the tail word back to 0. This goes on until the tail waiter is found. In case of races, the new tail will be responsible for performing the same task, as the old tail will then fail to reset the tail word and wait for its next pointer to be updated before it signals the new tail to do the same. Lastly, all of these waiters release the rqnode and return to the caller. This patch underscores the point that rqspinlock's timeout does not apply to each waiter individually, and cannot be relied upon as an upper bound. It is possible for the rqspinlock waiters to return early from a failed lock acquisition attempt as soon as stalls are detected. The head waiter cannot directly WRITE_ONCE the tail to zero, as it may race with a concurrent xchg and a non-head waiter linking its MCS node to the head's MCS node through 'prev->next' assignment. Reviewed-by: Barret Rhoden Signed-off-by: Kumar Kartikeya Dwivedi --- kernel/locking/rqspinlock.c | 42 +++++++++++++++++++++++++++++--- kernel/locking/rqspinlock.h | 48 +++++++++++++++++++++++++++++++++++++ 2 files changed, 87 insertions(+), 3 deletions(-) create mode 100644 kernel/locking/rqspinlock.h diff --git a/kernel/locking/rqspinlock.c b/kernel/locking/rqspinlock.c index 8e512feb37ce..fdc20157d0c9 100644 --- a/kernel/locking/rqspinlock.c +++ b/kernel/locking/rqspinlock.c @@ -77,6 +77,8 @@ struct rqspinlock_timeout { u16 spin; }; +#define RES_TIMEOUT_VAL 2 + static noinline int check_timeout(struct rqspinlock_timeout *ts) { u64 time = ktime_get_mono_fast_ns(); @@ -305,12 +307,18 @@ int __lockfunc resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val, * head of the waitqueue. */ if (old & _Q_TAIL_MASK) { + int val; + prev = decode_tail(old, qnodes); /* Link @node into the waitqueue. */ WRITE_ONCE(prev->next, node); - arch_mcs_spin_lock_contended(&node->locked); + val = arch_mcs_spin_lock_contended(&node->locked); + if (val == RES_TIMEOUT_VAL) { + ret = -EDEADLK; + goto waitq_timeout; + } /* * While waiting for the MCS lock, the next pointer may have @@ -334,7 +342,35 @@ int __lockfunc resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val, * sequentiality; this is because the set_locked() function below * does not imply a full barrier. 
*/ - val = atomic_cond_read_acquire(&lock->val, !(VAL & _Q_LOCKED_PENDING_MASK)); + RES_RESET_TIMEOUT(ts); + val = atomic_cond_read_acquire(&lock->val, !(VAL & _Q_LOCKED_PENDING_MASK) || + RES_CHECK_TIMEOUT(ts, ret)); + +waitq_timeout: + if (ret) { + /* + * If the tail is still pointing to us, then we are the final waiter, + * and are responsible for resetting the tail back to 0. Otherwise, if + * the cmpxchg operation fails, we signal the next waiter to take exit + * and try the same. For a waiter with tail node 'n': + * + * n,*,* -> 0,*,* + * + * When performing cmpxchg for the whole word (NR_CPUS > 16k), it is + * possible locked/pending bits keep changing and we see failures even + * when we remain the head of wait queue. However, eventually, + * pending bit owner will unset the pending bit, and new waiters + * will queue behind us. This will leave the lock owner in + * charge, and it will eventually either set locked bit to 0, or + * leave it as 1, allowing us to make progress. + */ + if (!try_cmpxchg_tail(lock, tail, 0)) { + next = smp_cond_load_relaxed(&node->next, VAL); + WRITE_ONCE(next->locked, RES_TIMEOUT_VAL); + } + lockevent_inc(rqspinlock_lock_timeout); + goto release; + } /* * claim the lock: @@ -379,6 +415,6 @@ int __lockfunc resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val, * release the node */ __this_cpu_dec(qnodes[0].mcs.count); - return 0; + return ret; } EXPORT_SYMBOL(resilient_queued_spin_lock_slowpath); diff --git a/kernel/locking/rqspinlock.h b/kernel/locking/rqspinlock.h new file mode 100644 index 000000000000..3cec3a0f2d7e --- /dev/null +++ b/kernel/locking/rqspinlock.h @@ -0,0 +1,48 @@ +/* SPDX-License-Identifier: GPL-2.0-or-later */ +/* + * Resilient Queued Spin Lock defines + * + * (C) Copyright 2024 Meta Platforms, Inc. and affiliates. + * + * Authors: Kumar Kartikeya Dwivedi + */ +#ifndef __LINUX_RQSPINLOCK_H +#define __LINUX_RQSPINLOCK_H + +#include "qspinlock.h" + +/* + * try_cmpxchg_tail - Return result of cmpxchg of tail word with a new value + * @lock: Pointer to queued spinlock structure + * @tail: The tail to compare against + * @new_tail: The new queue tail code word + * Return: Bool to indicate whether the cmpxchg operation succeeded + * + * This is used by the head of the wait queue to clean up the queue. + * Provides relaxed ordering, since observers only rely on initialized + * state of the node which was made visible through the xchg_tail operation, + * i.e. through the smp_wmb preceding xchg_tail. + * + * We avoid using 16-bit cmpxchg, which is not available on all architectures. + */ +static __always_inline bool try_cmpxchg_tail(struct qspinlock *lock, u32 tail, u32 new_tail) +{ + u32 old, new; + + old = atomic_read(&lock->val); + do { + /* + * Is the tail part we compare to already stale? Fail. + */ + if ((old & _Q_TAIL_MASK) != tail) + return false; + /* + * Encode latest locked/pending state for new tail. 
+ */ + new = (old & _Q_LOCKED_PENDING_MASK) | new_tail; + } while (!atomic_try_cmpxchg_relaxed(&lock->val, &old, new)); + + return true; +} + +#endif /* __LINUX_RQSPINLOCK_H */
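The tail reset is the heart of this cleanup protocol. The stand-alone C11 model below mirrors try_cmpxchg_tail() to show the n,*,* -> 0,*,* transition: only the tail bits participate in the comparison, while whatever locked/pending state is current gets carried into the new value. The mask values and the example lock word are assumptions of this sketch, mirroring but not copied from the qspinlock layout.

#include <inttypes.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define Q_LOCKED_PENDING_MASK	0x0000ffffu	/* locked byte + pending byte (assumed layout) */
#define Q_TAIL_MASK		0xffff0000u	/* tail CPU/idx encoding (assumed layout) */

static bool try_cmpxchg_tail(_Atomic uint32_t *lockword, uint32_t tail, uint32_t new_tail)
{
	uint32_t old = atomic_load_explicit(lockword, memory_order_relaxed);
	uint32_t new;

	do {
		/* Tail already moved on? Then a newer waiter owns the cleanup. */
		if ((old & Q_TAIL_MASK) != tail)
			return false;
		/* Carry over whatever locked/pending state is current. */
		new = (old & Q_LOCKED_PENDING_MASK) | new_tail;
	} while (!atomic_compare_exchange_weak_explicit(lockword, &old, new,
							memory_order_relaxed,
							memory_order_relaxed));
	return true;
}

int main(void)
{
	/* Lock word: our tail code in the top half, locked bit set by the owner. */
	_Atomic uint32_t lockword = 0x00050001u;

	bool ok = try_cmpxchg_tail(&lockword, 0x00050000u, 0);
	printf("reset %s, lock word now 0x%08" PRIx32 "\n",
	       ok ? "succeeded" : "failed", atomic_load(&lockword));	/* 0x00000001 */
	return 0;
}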
From patchwork Thu Feb 6 10:54:18 2025
From: Kumar Kartikeya Dwivedi
To: bpf@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: Barret Rhoden, Linus Torvalds, Peter Zijlstra, Will Deacon, Waiman Long, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann, Martin KaFai Lau, Eduard Zingerman, "Paul E. McKenney", Tejun Heo, Josh Don, Dohyun Kim, linux-arm-kernel@lists.infradead.org, kernel-team@meta.com
Subject: [PATCH bpf-next v2 10/26] rqspinlock: Protect waiters in trylock fallback from stalls
Date: Thu, 6 Feb 2025 02:54:18 -0800
Message-ID: <20250206105435.2159977-11-memxor@gmail.com>
In-Reply-To: <20250206105435.2159977-1-memxor@gmail.com>
References: <20250206105435.2159977-1-memxor@gmail.com>

When we run out of maximum rqnodes, the original queued spin lock slow path falls back to a try lock. In such a case, we are again susceptible to stalls in case the lock owner fails to make progress. We use the timeout as a fallback to break out of this loop and return to the caller. This is a fallback for an extreme edge case, when on the same CPU we run out of all 4 qnodes. When could this happen?
We are in the slow path in task context, we get interrupted by an IRQ,
which while in the slow path gets interrupted by an NMI, which in the
slow path gets another nested NMI, which enters the slow path. All of
the interruptions happen after node->count++.

Reviewed-by: Barret Rhoden
Signed-off-by: Kumar Kartikeya Dwivedi
---
 kernel/locking/rqspinlock.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/kernel/locking/rqspinlock.c b/kernel/locking/rqspinlock.c index fdc20157d0c9..df7adec59cec 100644 --- a/kernel/locking/rqspinlock.c +++ b/kernel/locking/rqspinlock.c @@ -255,8 +255,14 @@ int __lockfunc resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val, */ if (unlikely(idx >= _Q_MAX_NODES)) { lockevent_inc(lock_no_node); - while (!queued_spin_trylock(lock)) + RES_RESET_TIMEOUT(ts); + while (!queued_spin_trylock(lock)) { + if (RES_CHECK_TIMEOUT(ts, ret)) { + lockevent_inc(rqspinlock_lock_timeout); + break; + } cpu_relax(); + } goto release; }
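[Editorial note] For readers unfamiliar with the RES_CHECK_TIMEOUT() pattern used in the hunk above, the following is a minimal userspace sketch of the same bounded-spin idiom; it is not part of the patch, and the function and type names are illustrative. The point is that the clock is only sampled when the 16-bit spin counter wraps, so the common case of the retry loop stays cheap:

#include <stdbool.h>
#include <stdint.h>
#include <time.h>

static uint64_t now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec;
}

/* Spin until cond(arg) holds or timeout_ns elapses; false means timeout. */
static bool spin_until(bool (*cond)(void *), void *arg, uint64_t timeout_ns)
{
	uint64_t end = now_ns() + timeout_ns;
	uint16_t spin = 0;

	while (!cond(arg)) {
		/* Sample the clock only once every 2^16 iterations. */
		if (!spin++ && now_ns() > end)
			return false;	/* analogous to -ETIMEDOUT above */
	}
	return true;
}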
From patchwork Thu Feb 6 10:54:19 2025
X-Patchwork-Submitter: Kumar Kartikeya Dwivedi
X-Patchwork-Id: 13962817
X-Patchwork-Delegate: bpf@iogearbox.net
From: Kumar Kartikeya Dwivedi
To: bpf@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: Linus Torvalds, Peter Zijlstra, Will Deacon, Waiman Long,
    Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann, Martin KaFai Lau,
    Eduard Zingerman, "Paul E. McKenney", Tejun Heo, Barret Rhoden,
    Josh Don, Dohyun Kim, linux-arm-kernel@lists.infradead.org,
    kernel-team@meta.com
Subject: [PATCH bpf-next v2 11/26] rqspinlock: Add deadlock detection and recovery
Date: Thu, 6 Feb 2025 02:54:19 -0800
Message-ID: <20250206105435.2159977-12-memxor@gmail.com>
In-Reply-To: <20250206105435.2159977-1-memxor@gmail.com>
References: <20250206105435.2159977-1-memxor@gmail.com>

While the timeout logic provides guarantees for the waiter's forward
progress, the time until a stalling waiter unblocks can still be long.
The default timeout of 1/2 sec can be excessively long for some use
cases. Additionally, custom timeouts may exacerbate recovery time.

Introduce logic to detect common cases of deadlocks and perform quicker
recovery. This is done by dividing the time from entry into the locking
slow path until the timeout into intervals of 1 ms. Then, after each
interval elapses, deadlock detection is performed, while also polling
the lock word to ensure we can quickly break out of the detection logic
and proceed with lock acquisition.

A 'held_locks' table is maintained per-CPU where the entry at the
bottom denotes a lock being waited for or already taken. Entries coming
before it denote locks that are already held. The current CPU's table
can thus be looked at to detect AA deadlocks. The tables from other
CPUs can be looked at to discover ABBA situations. Finally, when a
matching entry for the lock being taken on the current CPU is found on
some other CPU, a deadlock situation is detected. This function can
take a long time, therefore the lock word is constantly polled in each
loop iteration to ensure we can preempt detection and proceed with lock
acquisition, using the is_lock_released check.

We set the 'spin' member of the rqspinlock_timeout struct to 0 to
trigger deadlock checks immediately and perform faster recovery.

Note: Extending the lock word size by 4 bytes to record the owner CPU
would allow faster detection for ABBA, since it is typically the owner
which participates in an ABBA situation. However, to keep compatibility
with existing lock words in the kernel (struct qspinlock), and given
that deadlocks are a rare event triggered by bugs, we choose to favor
compatibility over faster detection.
The release_held_lock_entry function requires an smp_wmb, while the release store on unlock will provide the necessary ordering for us. Add comments to document the subtleties of why this is correct. It is possible for stores to be reordered still, but in the context of the deadlock detection algorithm, a release barrier is sufficient and needn't be stronger for unlock's case. Signed-off-by: Kumar Kartikeya Dwivedi --- include/asm-generic/rqspinlock.h | 83 +++++++++++++- kernel/locking/rqspinlock.c | 183 ++++++++++++++++++++++++++++--- 2 files changed, 252 insertions(+), 14 deletions(-) diff --git a/include/asm-generic/rqspinlock.h b/include/asm-generic/rqspinlock.h index 0981162c8ac7..c1dbd25287a1 100644 --- a/include/asm-generic/rqspinlock.h +++ b/include/asm-generic/rqspinlock.h @@ -11,15 +11,96 @@ #include #include +#include struct qspinlock; typedef struct qspinlock rqspinlock_t; +extern int resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val, u64 timeout); + /* * Default timeout for waiting loops is 0.5 seconds */ #define RES_DEF_TIMEOUT (NSEC_PER_SEC / 2) -extern int resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val, u64 timeout); +#define RES_NR_HELD 32 + +struct rqspinlock_held { + int cnt; + void *locks[RES_NR_HELD]; +}; + +DECLARE_PER_CPU_ALIGNED(struct rqspinlock_held, rqspinlock_held_locks); + +static __always_inline void grab_held_lock_entry(void *lock) +{ + int cnt = this_cpu_inc_return(rqspinlock_held_locks.cnt); + + if (unlikely(cnt > RES_NR_HELD)) { + /* Still keep the inc so we decrement later. */ + return; + } + + /* + * Implied compiler barrier in per-CPU operations; otherwise we can have + * the compiler reorder inc with write to table, allowing interrupts to + * overwrite and erase our write to the table (as on interrupt exit it + * will be reset to NULL). + */ + this_cpu_write(rqspinlock_held_locks.locks[cnt - 1], lock); +} + +/* + * It is possible to run into misdetection scenarios of AA deadlocks on the same + * CPU, and missed ABBA deadlocks on remote CPUs when this function pops entries + * out of order (due to lock A, lock B, unlock A, unlock B) pattern. The correct + * logic to preserve right entries in the table would be to walk the array of + * held locks and swap and clear out-of-order entries, but that's too + * complicated and we don't have a compelling use case for out of order unlocking. + * + * Therefore, we simply don't support such cases and keep the logic simple here. + */ +static __always_inline void release_held_lock_entry(void) +{ + struct rqspinlock_held *rqh = this_cpu_ptr(&rqspinlock_held_locks); + + if (unlikely(rqh->cnt > RES_NR_HELD)) + goto dec; + WRITE_ONCE(rqh->locks[rqh->cnt - 1], NULL); +dec: + this_cpu_dec(rqspinlock_held_locks.cnt); + /* + * This helper is invoked when we unwind upon failing to acquire the + * lock. Unlike the unlock path which constitutes a release store after + * we clear the entry, we need to emit a write barrier here. Otherwise, + * we may have a situation as follows: + * + * for lock B + * release_held_lock_entry + * + * try_cmpxchg_acquire for lock A + * grab_held_lock_entry + * + * Since these are attempts for different locks, no sequentiality is + * guaranteed and reordering may occur such that dec, inc are done + * before entry is overwritten. This permits a remote lock holder of + * lock B to now observe it as being attempted on this CPU, and may lead + * to misdetection. 
+ * + * In case of unlock, we will always do a release on the lock word after + * releasing the entry, ensuring that other CPUs cannot hold the lock + * (and make conclusions about deadlocks) until the entry has been + * cleared on the local CPU, preventing any anomalies. Reordering is + * still possible there, but a remote CPU cannot observe a lock in our + * table which it is already holding, since visibility entails our + * release store for the said lock has not retired. + * + * We don't have a problem if the dec and WRITE_ONCE above get reordered + * with each other, we either notice an empty NULL entry on top (if dec + * succeeds WRITE_ONCE), or a potentially stale entry which cannot be + * observed (if dec precedes WRITE_ONCE). + */ + smp_wmb(); +} #endif /* __ASM_GENERIC_RQSPINLOCK_H */ diff --git a/kernel/locking/rqspinlock.c b/kernel/locking/rqspinlock.c index df7adec59cec..42e8a56534b6 100644 --- a/kernel/locking/rqspinlock.c +++ b/kernel/locking/rqspinlock.c @@ -30,6 +30,7 @@ * Include queued spinlock definitions and statistics code */ #include "qspinlock.h" +#include "rqspinlock.h" #include "qspinlock_stat.h" /* @@ -74,16 +75,146 @@ struct rqspinlock_timeout { u64 timeout_end; u64 duration; + u64 cur; u16 spin; }; #define RES_TIMEOUT_VAL 2 -static noinline int check_timeout(struct rqspinlock_timeout *ts) +DEFINE_PER_CPU_ALIGNED(struct rqspinlock_held, rqspinlock_held_locks); + +static bool is_lock_released(rqspinlock_t *lock, u32 mask, struct rqspinlock_timeout *ts) +{ + if (!(atomic_read_acquire(&lock->val) & (mask))) + return true; + return false; +} + +static noinline int check_deadlock_AA(rqspinlock_t *lock, u32 mask, + struct rqspinlock_timeout *ts) +{ + struct rqspinlock_held *rqh = this_cpu_ptr(&rqspinlock_held_locks); + int cnt = min(RES_NR_HELD, rqh->cnt); + + /* + * Return an error if we hold the lock we are attempting to acquire. + * We'll iterate over max 32 locks; no need to do is_lock_released. + */ + for (int i = 0; i < cnt - 1; i++) { + if (rqh->locks[i] == lock) + return -EDEADLK; + } + return 0; +} + +/* + * This focuses on the most common case of ABBA deadlocks (or ABBA involving + * more locks, which reduce to ABBA). This is not exhaustive, and we rely on + * timeouts as the final line of defense. + */ +static noinline int check_deadlock_ABBA(rqspinlock_t *lock, u32 mask, + struct rqspinlock_timeout *ts) +{ + struct rqspinlock_held *rqh = this_cpu_ptr(&rqspinlock_held_locks); + int rqh_cnt = min(RES_NR_HELD, rqh->cnt); + void *remote_lock; + int cpu; + + /* + * Find the CPU holding the lock that we want to acquire. If there is a + * deadlock scenario, we will read a stable set on the remote CPU and + * find the target. This would be a constant time operation instead of + * O(NR_CPUS) if we could determine the owning CPU from a lock value, but + * that requires increasing the size of the lock word. + */ + for_each_possible_cpu(cpu) { + struct rqspinlock_held *rqh_cpu = per_cpu_ptr(&rqspinlock_held_locks, cpu); + int real_cnt = READ_ONCE(rqh_cpu->cnt); + int cnt = min(RES_NR_HELD, real_cnt); + + /* + * Let's ensure to break out of this loop if the lock is available for + * us to potentially acquire. + */ + if (is_lock_released(lock, mask, ts)) + return 0; + + /* + * Skip ourselves, and CPUs whose count is less than 2, as they need at + * least one held lock and one acquisition attempt (reflected as top + * most entry) to participate in an ABBA deadlock. 
+ * + * If cnt is more than RES_NR_HELD, it means the current lock being + * acquired won't appear in the table, and other locks in the table are + * already held, so we can't determine ABBA. + */ + if (cpu == smp_processor_id() || real_cnt < 2 || real_cnt > RES_NR_HELD) + continue; + + /* + * Obtain the entry at the top, this corresponds to the lock the + * remote CPU is attempting to acquire in a deadlock situation, + * and would be one of the locks we hold on the current CPU. + */ + remote_lock = READ_ONCE(rqh_cpu->locks[cnt - 1]); + /* + * If it is NULL, we've raced and cannot determine a deadlock + * conclusively, skip this CPU. + */ + if (!remote_lock) + continue; + /* + * Find if the lock we're attempting to acquire is held by this CPU. + * Don't consider the topmost entry, as that must be the latest lock + * being held or acquired. For a deadlock, the target CPU must also + * attempt to acquire a lock we hold, so for this search only 'cnt - 1' + * entries are important. + */ + for (int i = 0; i < cnt - 1; i++) { + if (READ_ONCE(rqh_cpu->locks[i]) != lock) + continue; + /* + * We found our lock as held on the remote CPU. Is the + * acquisition attempt on the remote CPU for a lock held + * by us? If so, we have a deadlock situation, and need + * to recover. + */ + for (int i = 0; i < rqh_cnt - 1; i++) { + if (rqh->locks[i] == remote_lock) + return -EDEADLK; + } + /* + * Inconclusive; retry again later. + */ + return 0; + } + } + return 0; +} + +static noinline int check_deadlock(rqspinlock_t *lock, u32 mask, + struct rqspinlock_timeout *ts) +{ + int ret; + + ret = check_deadlock_AA(lock, mask, ts); + if (ret) + return ret; + ret = check_deadlock_ABBA(lock, mask, ts); + if (ret) + return ret; + + return 0; +} + +static noinline int check_timeout(rqspinlock_t *lock, u32 mask, + struct rqspinlock_timeout *ts) { u64 time = ktime_get_mono_fast_ns(); + u64 prev = ts->cur; if (!ts->timeout_end) { + ts->cur = time; ts->timeout_end = time + ts->duration; return 0; } @@ -91,20 +222,30 @@ static noinline int check_timeout(struct rqspinlock_timeout *ts) if (time > ts->timeout_end) return -ETIMEDOUT; + /* + * A millisecond interval passed from last time? Trigger deadlock + * checks. + */ + if (prev + NSEC_PER_MSEC < time) { + ts->cur = time; + return check_deadlock(lock, mask, ts); + } + return 0; } -#define RES_CHECK_TIMEOUT(ts, ret) \ - ({ \ - if (!(ts).spin++) \ - (ret) = check_timeout(&(ts)); \ - (ret); \ +#define RES_CHECK_TIMEOUT(ts, ret, mask) \ + ({ \ + if (!(ts).spin++) \ + (ret) = check_timeout((lock), (mask), &(ts)); \ + (ret); \ }) /* * Initialize the 'duration' member with the chosen timeout. + * Set spin member to 0 to trigger AA/ABBA checks immediately. */ -#define RES_INIT_TIMEOUT(ts, _timeout) ({ (ts).spin = 1; (ts).duration = _timeout; }) +#define RES_INIT_TIMEOUT(ts, _timeout) ({ (ts).spin = 0; (ts).duration = _timeout; }) /* * We only need to reset 'timeout_end', 'spin' will just wrap around as necessary. @@ -192,6 +333,11 @@ int __lockfunc resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val, goto queue; } + /* + * Grab an entry in the held locks array, to enable deadlock detection. + */ + grab_held_lock_entry(lock); + /* * We're pending, wait for the owner to go away. 
* @@ -205,7 +351,7 @@ int __lockfunc resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val, */ if (val & _Q_LOCKED_MASK) { RES_RESET_TIMEOUT(ts); - smp_cond_load_acquire(&lock->locked, !VAL || RES_CHECK_TIMEOUT(ts, ret)); + smp_cond_load_acquire(&lock->locked, !VAL || RES_CHECK_TIMEOUT(ts, ret, _Q_LOCKED_MASK)); } if (ret) { @@ -220,7 +366,7 @@ int __lockfunc resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val, */ clear_pending(lock); lockevent_inc(rqspinlock_lock_timeout); - return ret; + goto err_release_entry; } /* @@ -238,6 +384,11 @@ int __lockfunc resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val, */ queue: lockevent_inc(lock_slowpath); + /* + * Grab deadlock detection entry for the queue path. + */ + grab_held_lock_entry(lock); + node = this_cpu_ptr(&qnodes[0].mcs); idx = node->count++; tail = encode_tail(smp_processor_id(), idx); @@ -257,9 +408,9 @@ int __lockfunc resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val, lockevent_inc(lock_no_node); RES_RESET_TIMEOUT(ts); while (!queued_spin_trylock(lock)) { - if (RES_CHECK_TIMEOUT(ts, ret)) { + if (RES_CHECK_TIMEOUT(ts, ret, ~0u)) { lockevent_inc(rqspinlock_lock_timeout); - break; + goto err_release_node; } cpu_relax(); } @@ -350,7 +501,7 @@ int __lockfunc resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val, */ RES_RESET_TIMEOUT(ts); val = atomic_cond_read_acquire(&lock->val, !(VAL & _Q_LOCKED_PENDING_MASK) || - RES_CHECK_TIMEOUT(ts, ret)); + RES_CHECK_TIMEOUT(ts, ret, _Q_LOCKED_PENDING_MASK)); waitq_timeout: if (ret) { @@ -375,7 +526,7 @@ int __lockfunc resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val, WRITE_ONCE(next->locked, RES_TIMEOUT_VAL); } lockevent_inc(rqspinlock_lock_timeout); - goto release; + goto err_release_node; } /* @@ -422,5 +573,11 @@ int __lockfunc resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val, */ __this_cpu_dec(qnodes[0].mcs.count); return ret; +err_release_node: + trace_contention_end(lock, ret); + __this_cpu_dec(qnodes[0].mcs.count); +err_release_entry: + release_held_lock_entry(); + return ret; } EXPORT_SYMBOL(resilient_queued_spin_lock_slowpath);
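[Editorial note] To make the two bug patterns this detection targets concrete, here is a hedged sketch; it is not part of the patch. It is written against the raw_res_spin_lock()/raw_res_spin_unlock() wrappers that a later patch in this series introduces, and the lock names and callers are hypothetical (locks would also need raw_res_spin_lock_init() at setup time):

static rqspinlock_t lock_a;
static rqspinlock_t lock_b;

/*
 * AA: the same CPU tries to take a lock it already holds, e.g. because an
 * interrupt arrives inside the critical section and its handler takes the
 * same lock. The second attempt finds lock_a in this CPU's held_locks
 * table and fails with -EDEADLK instead of hanging the CPU.
 */
static void irq_path(void)
{
	if (raw_res_spin_lock(&lock_a))
		return;		/* -EDEADLK if we preempted the holder below */
	/* ... */
	raw_res_spin_unlock(&lock_a);
}

/*
 * ABBA: CPU 0 holds A and waits for B while CPU 1 holds B and waits for A.
 * Each waiter publishes the lock it wants at the top of its held_locks
 * table, so one of the two CPUs observes the cycle and bails out with
 * -EDEADLK, letting the other side make progress.
 */
static int cpu0_path(void)
{
	int ret;

	ret = raw_res_spin_lock(&lock_a);
	if (ret)
		return ret;
	ret = raw_res_spin_lock(&lock_b);	/* CPU 1 takes B then A */
	if (!ret)
		raw_res_spin_unlock(&lock_b);
	raw_res_spin_unlock(&lock_a);
	return ret;
}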
From patchwork Thu Feb 6 10:54:20 2025
X-Patchwork-Submitter: Kumar Kartikeya Dwivedi
X-Patchwork-Id: 13962818
X-Patchwork-Delegate: bpf@iogearbox.net
From: Kumar Kartikeya Dwivedi
To: bpf@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: Linus Torvalds, Peter Zijlstra, Will Deacon, Waiman Long,
    Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann, Martin KaFai Lau,
    Eduard Zingerman, "Paul E. McKenney", Tejun Heo, Barret Rhoden,
    Josh Don, Dohyun Kim, linux-arm-kernel@lists.infradead.org,
    kernel-team@meta.com
Subject: [PATCH bpf-next v2 12/26] rqspinlock: Add a test-and-set fallback
Date: Thu, 6 Feb 2025 02:54:20 -0800
Message-ID: <20250206105435.2159977-13-memxor@gmail.com>
In-Reply-To: <20250206105435.2159977-1-memxor@gmail.com>
References: <20250206105435.2159977-1-memxor@gmail.com>

Include a test-and-set fallback when queued spinlock support is not
available. Introduce a rqspinlock type to act as a fallback when
qspinlock support is absent.

Include ifdef guards to ensure the slow path in this file is only
compiled when CONFIG_QUEUED_SPINLOCKS=y. Subsequent patches will add
further logic to ensure fallback to the test-and-set implementation
when queued spinlock support is unavailable on an architecture.
Signed-off-by: Kumar Kartikeya Dwivedi --- include/asm-generic/rqspinlock.h | 17 +++++++++++++++ kernel/locking/rqspinlock.c | 37 ++++++++++++++++++++++++++++++++ 2 files changed, 54 insertions(+) diff --git a/include/asm-generic/rqspinlock.h b/include/asm-generic/rqspinlock.h index c1dbd25287a1..92e53b2aafb9 100644 --- a/include/asm-generic/rqspinlock.h +++ b/include/asm-generic/rqspinlock.h @@ -12,11 +12,28 @@ #include #include #include +#ifdef CONFIG_QUEUED_SPINLOCKS +#include +#endif + +struct rqspinlock { + union { + atomic_t val; + u32 locked; + }; +}; struct qspinlock; +#ifdef CONFIG_QUEUED_SPINLOCKS typedef struct qspinlock rqspinlock_t; +#else +typedef struct rqspinlock rqspinlock_t; +#endif +extern int resilient_tas_spin_lock(rqspinlock_t *lock, u64 timeout); +#ifdef CONFIG_QUEUED_SPINLOCKS extern int resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val, u64 timeout); +#endif /* * Default timeout for waiting loops is 0.5 seconds diff --git a/kernel/locking/rqspinlock.c b/kernel/locking/rqspinlock.c index 42e8a56534b6..ea034e80f855 100644 --- a/kernel/locking/rqspinlock.c +++ b/kernel/locking/rqspinlock.c @@ -21,7 +21,9 @@ #include #include #include +#ifdef CONFIG_QUEUED_SPINLOCKS #include +#endif #include #include #include @@ -29,8 +31,10 @@ /* * Include queued spinlock definitions and statistics code */ +#ifdef CONFIG_QUEUED_SPINLOCKS #include "qspinlock.h" #include "rqspinlock.h" +#endif #include "qspinlock_stat.h" /* @@ -252,6 +256,37 @@ static noinline int check_timeout(rqspinlock_t *lock, u32 mask, */ #define RES_RESET_TIMEOUT(ts) ({ (ts).timeout_end = 0; }) +/* + * Provide a test-and-set fallback for cases when queued spin lock support is + * absent from the architecture. + */ +int __lockfunc resilient_tas_spin_lock(rqspinlock_t *lock, u64 timeout) +{ + struct rqspinlock_timeout ts; + int val, ret = 0; + + RES_INIT_TIMEOUT(ts, timeout); + grab_held_lock_entry(lock); +retry: + val = atomic_read(&lock->val); + + if (val || !atomic_try_cmpxchg(&lock->val, &val, 1)) { + if (RES_CHECK_TIMEOUT(ts, ret, ~0u)) { + lockevent_inc(rqspinlock_lock_timeout); + goto out; + } + cpu_relax(); + goto retry; + } + + return 0; +out: + release_held_lock_entry(); + return ret; +} + +#ifdef CONFIG_QUEUED_SPINLOCKS + /* * Per-CPU queue node structures; we can never have more than 4 nested * contexts: task, softirq, hardirq, nmi. 
@@ -581,3 +616,5 @@ int __lockfunc resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val, return ret; } EXPORT_SYMBOL(resilient_queued_spin_lock_slowpath); + +#endif /* CONFIG_QUEUED_SPINLOCKS */
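[Editorial note] As a side illustration of what the fallback degenerates to, here is a hedged userspace sketch of a plain test-and-set acquire/release protocol using C11 atomics; the names are made up, and the kernel code above additionally layers the timeout and held-locks bookkeeping on top of this basic shape:

#include <stdatomic.h>
#include <stdbool.h>

struct tas_lock {
	atomic_uint val;	/* 0 = unlocked, 1 = locked */
};

static bool tas_trylock(struct tas_lock *l)
{
	unsigned int old = 0;

	/* Acquire on success (the kernel helper uses a full-barrier cmpxchg). */
	return atomic_compare_exchange_strong_explicit(&l->val, &old, 1,
						       memory_order_acquire,
						       memory_order_relaxed);
}

static void tas_unlock(struct tas_lock *l)
{
	/* Release store, mirroring smp_store_release(&lock->locked, 0). */
	atomic_store_explicit(&l->val, 0, memory_order_release);
}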
From patchwork Thu Feb 6 10:54:21 2025
X-Patchwork-Submitter: Kumar Kartikeya Dwivedi
X-Patchwork-Id: 13962819
X-Patchwork-Delegate: bpf@iogearbox.net
From: Kumar Kartikeya Dwivedi
To: bpf@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: Linus Torvalds, Peter Zijlstra, Will Deacon, Waiman Long,
    Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann, Martin KaFai Lau,
    Eduard Zingerman, "Paul E. McKenney", Tejun Heo, Barret Rhoden,
    Josh Don, Dohyun Kim, linux-arm-kernel@lists.infradead.org,
    kernel-team@meta.com
Subject: [PATCH bpf-next v2 13/26] rqspinlock: Add basic support for CONFIG_PARAVIRT
Date: Thu, 6 Feb 2025 02:54:21 -0800
Message-ID: <20250206105435.2159977-14-memxor@gmail.com>
In-Reply-To: <20250206105435.2159977-1-memxor@gmail.com>
References: <20250206105435.2159977-1-memxor@gmail.com>

We ripped out PV and virtualization related bits from rqspinlock in an
earlier commit; however, a fair lock performs poorly within a virtual
machine when the lock holder is preempted. As such, retain the
virt_spin_lock fallback to a test-and-set lock, but with timeout and
deadlock detection. We can do this by simply depending on the
resilient_tas_spin_lock implementation from the previous patch.
We don't integrate support for CONFIG_PARAVIRT_SPINLOCKS yet, as that
requires more involved algorithmic changes and introduces more
complexity. It can be done when the need arises in the future.

Signed-off-by: Kumar Kartikeya Dwivedi
---
 arch/x86/include/asm/rqspinlock.h | 29 +++++++++++++++++++++++++++++
 include/asm-generic/rqspinlock.h  | 14 ++++++++++++++
 kernel/locking/rqspinlock.c       |  3 +++
 3 files changed, 46 insertions(+)
 create mode 100644 arch/x86/include/asm/rqspinlock.h

diff --git a/arch/x86/include/asm/rqspinlock.h b/arch/x86/include/asm/rqspinlock.h new file mode 100644 index 000000000000..cbd65212c177 --- /dev/null +++ b/arch/x86/include/asm/rqspinlock.h @@ -0,0 +1,29 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef _ASM_X86_RQSPINLOCK_H +#define _ASM_X86_RQSPINLOCK_H + +#include + +#ifdef CONFIG_PARAVIRT +DECLARE_STATIC_KEY_FALSE(virt_spin_lock_key); + +#define resilient_virt_spin_lock_enabled resilient_virt_spin_lock_enabled +static __always_inline bool resilient_virt_spin_lock_enabled(void) +{ + return static_branch_likely(&virt_spin_lock_key); +} + +struct qspinlock; +extern int resilient_tas_spin_lock(struct qspinlock *lock, u64 timeout); + +#define resilient_virt_spin_lock resilient_virt_spin_lock +static inline int resilient_virt_spin_lock(struct qspinlock *lock, u64 timeout) +{ + return resilient_tas_spin_lock(lock, timeout); +} + +#endif /* CONFIG_PARAVIRT */ + +#include + +#endif /* _ASM_X86_RQSPINLOCK_H */ diff --git a/include/asm-generic/rqspinlock.h b/include/asm-generic/rqspinlock.h index 92e53b2aafb9..bbe049dcf70d 100644 --- a/include/asm-generic/rqspinlock.h +++ b/include/asm-generic/rqspinlock.h @@ -35,6 +35,20 @@ extern int resilient_tas_spin_lock(rqspinlock_t *lock, u64 timeout); extern int resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val, u64 timeout); #endif +#ifndef resilient_virt_spin_lock_enabled +static __always_inline bool resilient_virt_spin_lock_enabled(void) +{ + return false; +} +#endif + +#ifndef resilient_virt_spin_lock +static __always_inline int resilient_virt_spin_lock(struct qspinlock *lock, u64 timeout) +{ + return 0; +} +#endif + /* * Default timeout for waiting loops is 0.5 seconds diff --git a/kernel/locking/rqspinlock.c b/kernel/locking/rqspinlock.c index ea034e80f855..13d1759c9353 100644 --- a/kernel/locking/rqspinlock.c +++ b/kernel/locking/rqspinlock.c @@ -325,6 +325,9 @@ int __lockfunc resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val, BUILD_BUG_ON(CONFIG_NR_CPUS >= (1U << _Q_TAIL_CPU_BITS)); + if (resilient_virt_spin_lock_enabled()) + return resilient_virt_spin_lock(lock, timeout); + RES_INIT_TIMEOUT(ts, timeout); /*
From patchwork Thu Feb 6 10:54:22 2025
X-Patchwork-Submitter: Kumar Kartikeya Dwivedi
X-Patchwork-Id: 13962820
X-Patchwork-Delegate: bpf@iogearbox.net
From: Kumar Kartikeya Dwivedi
To: bpf@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: Linus Torvalds, Peter Zijlstra, Will Deacon, Waiman Long,
    Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann, Martin KaFai Lau,
    Eduard Zingerman, "Paul E. McKenney", Tejun Heo, Barret Rhoden,
    Josh Don, Dohyun Kim, linux-arm-kernel@lists.infradead.org,
    kernel-team@meta.com
Subject: [PATCH bpf-next v2 14/26] rqspinlock: Add helper to print a splat on timeout or deadlock
Date: Thu, 6 Feb 2025 02:54:22 -0800
Message-ID: <20250206105435.2159977-15-memxor@gmail.com>
In-Reply-To: <20250206105435.2159977-1-memxor@gmail.com>
References: <20250206105435.2159977-1-memxor@gmail.com>

Whenever a timeout or a deadlock occurs, we would want to print a
message to the dmesg console, including the CPU where the event
occurred, the list of locks in the held locks table, and the stack
trace of the caller, which allows determining where exactly in the slow
path the waiter timed out or detected a deadlock.

Splats are limited to at most one per CPU during machine uptime, and a
lock is acquired to ensure that no interleaving occurs when a
concurrent set of CPUs conflict and enter a deadlock situation and
start printing data.

Later patches will use this to inspect the return value of the
rqspinlock API and then report a violation if necessary.
Signed-off-by: Kumar Kartikeya Dwivedi
---
 kernel/locking/rqspinlock.c | 29 +++++++++++++++++++++++++++++
 1 file changed, 29 insertions(+)

diff --git a/kernel/locking/rqspinlock.c b/kernel/locking/rqspinlock.c index 13d1759c9353..93f928bc4e9c 100644 --- a/kernel/locking/rqspinlock.c +++ b/kernel/locking/rqspinlock.c @@ -196,6 +196,35 @@ static noinline int check_deadlock_ABBA(rqspinlock_t *lock, u32 mask, return 0; } +static DEFINE_PER_CPU(int, report_nest_cnt); +static DEFINE_PER_CPU(bool, report_flag); +static arch_spinlock_t report_lock; + +static void rqspinlock_report_violation(const char *s, void *lock) +{ + struct rqspinlock_held *rqh = this_cpu_ptr(&rqspinlock_held_locks); + + if (this_cpu_inc_return(report_nest_cnt) != 1) { + this_cpu_dec(report_nest_cnt); + return; + } + if (this_cpu_read(report_flag)) + goto end; + this_cpu_write(report_flag, true); + arch_spin_lock(&report_lock); + + pr_err("CPU %d: %s", smp_processor_id(), s); + pr_info("Held locks: %d\n", rqh->cnt + 1); + pr_info("Held lock[%2d] = 0x%px\n", 0, lock); + for (int i = 0; i < min(RES_NR_HELD, rqh->cnt); i++) + pr_info("Held lock[%2d] = 0x%px\n", i + 1, rqh->locks[i]); + dump_stack(); + + arch_spin_unlock(&report_lock); +end: + this_cpu_dec(report_nest_cnt); +} + static noinline int check_deadlock(rqspinlock_t *lock, u32 mask, struct rqspinlock_timeout *ts) {
From patchwork Thu Feb 6 10:54:23 2025
X-Patchwork-Submitter: Kumar Kartikeya Dwivedi
X-Patchwork-Id: 13962821
X-Patchwork-Delegate: bpf@iogearbox.net
From: Kumar Kartikeya Dwivedi
To: bpf@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: Linus Torvalds, Peter Zijlstra, Will Deacon, Waiman Long,
    Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann, Martin KaFai Lau,
    Eduard Zingerman, "Paul E. McKenney", Tejun Heo, Barret Rhoden,
    Josh Don, Dohyun Kim, linux-arm-kernel@lists.infradead.org,
    kernel-team@meta.com
Subject: [PATCH bpf-next v2 15/26] rqspinlock: Add macros for rqspinlock usage
Date: Thu, 6 Feb 2025 02:54:23 -0800
Message-ID: <20250206105435.2159977-16-memxor@gmail.com>
In-Reply-To: <20250206105435.2159977-1-memxor@gmail.com>
References: <20250206105435.2159977-1-memxor@gmail.com>

Introduce helper macros that wrap around the rqspinlock slow path and
provide an interface analogous to the raw_spin_lock API. Note that in
case of error conditions, preemption and IRQ disabling are
automatically unrolled before returning the error back to the caller.

Ensure that in the absence of CONFIG_QUEUED_SPINLOCKS support, we fall
back to the test-and-set implementation.

Signed-off-by: Kumar Kartikeya Dwivedi
---
 include/asm-generic/rqspinlock.h | 71 ++++++++++++++++++++++++++++++++
 1 file changed, 71 insertions(+)

diff --git a/include/asm-generic/rqspinlock.h b/include/asm-generic/rqspinlock.h index bbe049dcf70d..46119fc768b8 100644 --- a/include/asm-generic/rqspinlock.h +++ b/include/asm-generic/rqspinlock.h @@ -134,4 +134,75 @@ static __always_inline void release_held_lock_entry(void) smp_wmb(); } +#ifdef CONFIG_QUEUED_SPINLOCKS + +/** + * res_spin_lock - acquire a queued spinlock + * @lock: Pointer to queued spinlock structure + */ +static __always_inline int res_spin_lock(rqspinlock_t *lock) +{ + int val = 0; + + if (likely(atomic_try_cmpxchg_acquire(&lock->val, &val, _Q_LOCKED_VAL))) { + grab_held_lock_entry(lock); + return 0; + } + return resilient_queued_spin_lock_slowpath(lock, val, RES_DEF_TIMEOUT); +} + +#else + +#define res_spin_lock(lock) resilient_tas_spin_lock(lock, RES_DEF_TIMEOUT) + +#endif /* CONFIG_QUEUED_SPINLOCKS */ + +static __always_inline void res_spin_unlock(rqspinlock_t *lock) +{ + struct rqspinlock_held *rqh = this_cpu_ptr(&rqspinlock_held_locks); + + if (unlikely(rqh->cnt > RES_NR_HELD)) + goto unlock; + WRITE_ONCE(rqh->locks[rqh->cnt - 1], NULL); +unlock: + this_cpu_dec(rqspinlock_held_locks.cnt); + /* + * Release barrier, ensures correct ordering. See release_held_lock_entry + * for details. Perform release store instead of queued_spin_unlock, + * since we use this function for test-and-set fallback as well.
When we + * have CONFIG_QUEUED_SPINLOCKS=n, we clear the full 4-byte lockword. + */ + smp_store_release(&lock->locked, 0); +} + +#ifdef CONFIG_QUEUED_SPINLOCKS +#define raw_res_spin_lock_init(lock) ({ *(lock) = (rqspinlock_t)__ARCH_SPIN_LOCK_UNLOCKED; }) +#else +#define raw_res_spin_lock_init(lock) ({ *(lock) = (rqspinlock_t){0}; }) +#endif + +#define raw_res_spin_lock(lock) \ + ({ \ + int __ret; \ + preempt_disable(); \ + __ret = res_spin_lock(lock); \ + if (__ret) \ + preempt_enable(); \ + __ret; \ + }) + +#define raw_res_spin_unlock(lock) ({ res_spin_unlock(lock); preempt_enable(); }) + +#define raw_res_spin_lock_irqsave(lock, flags) \ + ({ \ + int __ret; \ + local_irq_save(flags); \ + __ret = raw_res_spin_lock(lock); \ + if (__ret) \ + local_irq_restore(flags); \ + __ret; \ + }) + +#define raw_res_spin_unlock_irqrestore(lock, flags) ({ raw_res_spin_unlock(lock); local_irq_restore(flags); }) + #endif /* __ASM_GENERIC_RQSPINLOCK_H */ From patchwork Thu Feb 6 10:54:24 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Kumar Kartikeya Dwivedi X-Patchwork-Id: 13962822 X-Patchwork-Delegate: bpf@iogearbox.net Received: from mail-wr1-f65.google.com (mail-wr1-f65.google.com [209.85.221.65]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 66691230270; Thu, 6 Feb 2025 10:55:01 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.221.65 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1738839303; cv=none; b=kBY7wG1wrMnM9heoBEBrW8EwAUd8xShafnkGTCFSFv3c0zdqWs4ysUFx6IYa0HqwDu/bTVUtQ8L6HzsmtYEWiOrF0d35kwLOuRtVTB5dI00bPqEUZ91HnL+FDpXJyBx+SAQ+EJ1mOB1sx7Ngr0K/HMmgdhttd4gk0tlimmNXR2U= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1738839303; c=relaxed/simple; bh=lkeMxyDGnEWVT4+0Paf/jS31tLo0YsbXyTnTYAfMTz8=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=kAZ+NFgiQ4HPUFkUqVCcj49el2+uOli9pKgZmRa1w+0WG6nXVDazyKzkhMIesII5HFU48yRDghU6HQgg3yHRHaTgMbpjtl+BNWZhwGpAGV1gIMyEOTmZuQpLDZPwvX4Noy+3th8FkdZ6Ma1hHhAibD+8kKtSX+vBIZs2uJwj75g= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=AUwiQWGK; arc=none smtp.client-ip=209.85.221.65 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="AUwiQWGK" Received: by mail-wr1-f65.google.com with SMTP id ffacd0b85a97d-38dbaae68a2so598307f8f.3; Thu, 06 Feb 2025 02:55:01 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1738839299; x=1739444099; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=v/27xKgFdSZVO2eLH/O6emznfiz3pieZcJIn5kKU89M=; b=AUwiQWGKHYnXNO5BoV419wqq5G+Viauj5tWbYjWWnXKiDsndyh7M/CrPbWQf0a8abP x++4EnNktkQ3l/zvc4oqVOLpdSZsDZ2ByjlV+8G9yJ7Y1D0CQVuBXgjCTWxBVKntO+j4 mSpWEoZ9tmCUgw7NFVBZI29dP9xchkrhV7R/IX3EdXe2HOvHfBS8eHOKCOR3ZqqbQq9P 
HREWXRPOAGmAmNHl1cY95rYuFNljKFd8YZLBblVHtKhnQkgSsncoEH8mWZtGV0qtOJSr X2SWAEefwZCNya70k+Zihbxdvzoe/AhbOIOZ+RAkMz08WQ3QVBfFoqSDgx2AiwtDGI/D 3LwQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1738839299; x=1739444099; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=v/27xKgFdSZVO2eLH/O6emznfiz3pieZcJIn5kKU89M=; b=Hjw8G6LQ6Ut/ONWr37xZQejIrHcU2PCc9XQvZSYknmxpEFrRN8o4ybJqiHnZkmk9Y6 XgfjZQShwoaQoS/3C8XmJSncRXp3itGN8zZm0QcEIAVOtH0pULexky+TVZ728ZY1Uo5T VdUhbbz/BSHu/JHskmFzW2Gwe/Wd9kBgeX813Pkx0/ZfmxH4yoFafiXcsuvzViSMRWpq 6FBUxyOeR9L5uyN1R5/6lEm3iAL11EjGRk+6fSRbIYxE0FnBOrKWW98baNFskZgVmBQn GEHKFgwKgvkuylg/970iZE/Jx7emImEWpXJUBstGghvf0s0ysxGRz6UzDSnLGFgcrOpo Aeug== X-Forwarded-Encrypted: i=1; AJvYcCUIT0G6mH/dmX2TMq2lOpkO/UI3xe8ysBX3ar05hu6yBxCm5/dR4ZyoigR5/H+H7CPIRXNLBEEI7akVnhA=@vger.kernel.org X-Gm-Message-State: AOJu0Yw0hRxakGQL3BG3donNbrbzrCJz+v2NRNAbSK8WNe/6ky/ZEZIw VYly6vJkExmMTuKUtrS2uiMrf0jrKr1o7YkxeuTzY8qnWFYkUienvZD6GoTGChQ= X-Gm-Gg: ASbGncsFHa7USPN7082XEKa1V5Xkn0/ut8LTd1Yp5A2aLxh6YtZDspEortmeS006KMf djykXZV6ka9xvfJM0h7mqz1db2sDX4e1HlXAlaEd8JuuAxhUV9NhFSNDJbHohero3lJ6EaWxv7Q KKdCq9VxavFMpanw4aVBH4bZRR78H7weJtL97IqHFTq1iTqfsUMrJMZ0TT041JEmMsx1Ya0Nmsl euNVmtx7A6X0XuYfkwbjJs+mV7lPuyaE3G+wbpevrwGgPS8rN0gT7hKKQg15chdC74ZXDb13hcl e0Oc X-Google-Smtp-Source: AGHT+IEaxBrKjdya80s5GTFvJ53ssNGQMjtcN2JFoVjdVKyNSB8TPhmiwUVzhaT6kIG4/mzrT4zfQg== X-Received: by 2002:adf:f9ce:0:b0:386:3835:9fec with SMTP id ffacd0b85a97d-38db492a155mr4449175f8f.44.1738839299377; Thu, 06 Feb 2025 02:54:59 -0800 (PST) Received: from localhost ([2a03:2880:31ff:4::]) by smtp.gmail.com with ESMTPSA id ffacd0b85a97d-38dbdc30fbbsm1419486f8f.0.2025.02.06.02.54.58 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 06 Feb 2025 02:54:58 -0800 (PST) From: Kumar Kartikeya Dwivedi To: bpf@vger.kernel.org, linux-kernel@vger.kernel.org Cc: Linus Torvalds , Peter Zijlstra , Will Deacon , Waiman Long , Alexei Starovoitov , Andrii Nakryiko , Daniel Borkmann , Martin KaFai Lau , Eduard Zingerman , "Paul E. 
McKenney" , Tejun Heo , Barret Rhoden , Josh Don , Dohyun Kim , linux-arm-kernel@lists.infradead.org, kernel-team@meta.com Subject: [PATCH bpf-next v2 16/26] rqspinlock: Add locktorture support Date: Thu, 6 Feb 2025 02:54:24 -0800 Message-ID: <20250206105435.2159977-17-memxor@gmail.com> X-Mailer: git-send-email 2.43.5 In-Reply-To: <20250206105435.2159977-1-memxor@gmail.com> References: <20250206105435.2159977-1-memxor@gmail.com> Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Developer-Signature: v=1; a=openpgp-sha256; l=2978; h=from:subject; bh=lkeMxyDGnEWVT4+0Paf/jS31tLo0YsbXyTnTYAfMTz8=; b=owEBbQKS/ZANAwAIAUzgyIZIvxHKAcsmYgBnpJRmHQ2YyiZ2nANvO6JQtZoHC62f1PKXDNjawAxK mKA8GLiJAjMEAAEIAB0WIQRLvip+Buz51YI8YRFM4MiGSL8RygUCZ6SUZgAKCRBM4MiGSL8RyjR0D/ 9UxpQJ303fRMoYR6fqKrESf9af9KoWduUW16ytQ8cl8LUmJvwPsQA2RBgUKdWqIjIitbNFRK0ahuL2 G3ZxiZ7lVlv6kVhyOprX9Rp9/1KXJb9AssTbXNvRGsblLw0VJWdbOxKA4CzRsqXHpyhAZw+crjWaAi e76bz8XD/UpVFVgPBoDM6XHvEU2CQz9W0Mujyrr8fIKId/Ly9v+BzlYwvJ5Dwiq0pKOjO1wJSs2GSe xJNqUgPJbtFFLsbyNGsWDMGgSm+PQL9V4lY/frjPS48Pxdu+urR1lOEEV933OLvUjCojgSyeFrQFi4 C7ca0HI9k7+q8frWCz+Je2IqiCXf7yXlCIM2WkxiNzINim/wLLzaPdOeueZFq4MdMjMZBhCcUB1Rbe mRdSVBLs7X7Zpm/3jue63s7r1/T5Zdd+FnKnX9Mx1pbvmw8d+u28kL4LxsLFp3fBS3zeo5fKA1b74I aPnMh8oLweieJCNzHBtJmoA+8bROXjIbWCktLpZ6wtoxPmkgKNC65kw8ylP9OD9mtdz/bYC6djzAq0 WjHL34MT7pIHjgbJe86Kbe4rqIKg2p+NIDSL06kM7XS11ju9ee0SrcJYHqw/rvOlMJqzWxf4Nv/Lcw tVGq7vh2z1O7XmMCOh+d4cORBxDlJ4pqO/LdJnm3CeCaaMqsWEXlmgSV6ZRA== X-Developer-Key: i=memxor@gmail.com; a=openpgp; fpr=4BBE2A7E06ECF9D5823C61114CE0C88648BF11CA X-Patchwork-Delegate: bpf@iogearbox.net Introduce locktorture support for rqspinlock using the newly added macros as the first in-kernel user and consumer. Signed-off-by: Kumar Kartikeya Dwivedi --- kernel/locking/locktorture.c | 51 ++++++++++++++++++++++++++++++++++++ kernel/locking/rqspinlock.c | 1 + 2 files changed, 52 insertions(+) diff --git a/kernel/locking/locktorture.c b/kernel/locking/locktorture.c index cc33470f4de9..a055ff38d1f5 100644 --- a/kernel/locking/locktorture.c +++ b/kernel/locking/locktorture.c @@ -362,6 +362,56 @@ static struct lock_torture_ops raw_spin_lock_irq_ops = { .name = "raw_spin_lock_irq" }; +#include +static rqspinlock_t rqspinlock; + +static int torture_raw_res_spin_write_lock(int tid __maybe_unused) +{ + raw_res_spin_lock(&rqspinlock); + return 0; +} + +static void torture_raw_res_spin_write_unlock(int tid __maybe_unused) +{ + raw_res_spin_unlock(&rqspinlock); +} + +static struct lock_torture_ops raw_res_spin_lock_ops = { + .writelock = torture_raw_res_spin_write_lock, + .write_delay = torture_spin_lock_write_delay, + .task_boost = torture_rt_boost, + .writeunlock = torture_raw_res_spin_write_unlock, + .readlock = NULL, + .read_delay = NULL, + .readunlock = NULL, + .name = "raw_res_spin_lock" +}; + +static int torture_raw_res_spin_write_lock_irq(int tid __maybe_unused) +{ + unsigned long flags; + + raw_res_spin_lock_irqsave(&rqspinlock, flags); + cxt.cur_ops->flags = flags; + return 0; +} + +static void torture_raw_res_spin_write_unlock_irq(int tid __maybe_unused) +{ + raw_res_spin_unlock_irqrestore(&rqspinlock, cxt.cur_ops->flags); +} + +static struct lock_torture_ops raw_res_spin_lock_irq_ops = { + .writelock = torture_raw_res_spin_write_lock_irq, + .write_delay = torture_spin_lock_write_delay, + .task_boost = torture_rt_boost, + .writeunlock = torture_raw_res_spin_write_unlock_irq, + .readlock = NULL, + .read_delay = NULL, + .readunlock = NULL, + .name = 
"raw_res_spin_lock_irq" +}; + static DEFINE_RWLOCK(torture_rwlock); static int torture_rwlock_write_lock(int tid __maybe_unused) @@ -1168,6 +1218,7 @@ static int __init lock_torture_init(void) &lock_busted_ops, &spin_lock_ops, &spin_lock_irq_ops, &raw_spin_lock_ops, &raw_spin_lock_irq_ops, + &raw_res_spin_lock_ops, &raw_res_spin_lock_irq_ops, &rw_lock_ops, &rw_lock_irq_ops, &mutex_lock_ops, &ww_mutex_lock_ops, diff --git a/kernel/locking/rqspinlock.c b/kernel/locking/rqspinlock.c index 93f928bc4e9c..49b4f3c75a3e 100644 --- a/kernel/locking/rqspinlock.c +++ b/kernel/locking/rqspinlock.c @@ -86,6 +86,7 @@ struct rqspinlock_timeout { #define RES_TIMEOUT_VAL 2 DEFINE_PER_CPU_ALIGNED(struct rqspinlock_held, rqspinlock_held_locks); +EXPORT_SYMBOL_GPL(rqspinlock_held_locks); static bool is_lock_released(rqspinlock_t *lock, u32 mask, struct rqspinlock_timeout *ts) { From patchwork Thu Feb 6 10:54:25 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Kumar Kartikeya Dwivedi X-Patchwork-Id: 13962823 X-Patchwork-Delegate: bpf@iogearbox.net Received: from mail-wm1-f65.google.com (mail-wm1-f65.google.com [209.85.128.65]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id C386223099D; Thu, 6 Feb 2025 10:55:02 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.128.65 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1738839304; cv=none; b=fYRcsZFPioB9fzlT31WzHx+A78oNeg15E5JOOCG1PW5Nzy1D8zKyyKGce9Nh15n37O+8Zfp+vW0Zj+ZOoolwUcdBWMSUjzJDUM7ianiO1+pdCNlz6RNF+kBiSgESbPINzVafNOLYQwPSOOdZUc/PYu/Qd4IQ0zO7eTOnnQ4CNy4= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1738839304; c=relaxed/simple; bh=qYphScagVacMZI8xAzQRuNOSyn9/aS/2qYdw+lGxq8I=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=HNwOCAOzRGT1NqcFPZvC7drXZOXozIuzaS0kN+Cu8CuGP4ktinorw7CtmC4PA4VtbjYmwwdZmpPFQfnmI0WVXsEdnqPaiVd0RNIlZM0k/p+Sl1Dvk5cC8dVffb+7hDIHjVnL/ZTim/MGDqwFDisE1vXY3eK2oB3l0lhvVKI1G3Q= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=IJHyOU5e; arc=none smtp.client-ip=209.85.128.65 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="IJHyOU5e" Received: by mail-wm1-f65.google.com with SMTP id 5b1f17b1804b1-4361e89b6daso4508905e9.3; Thu, 06 Feb 2025 02:55:02 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1738839301; x=1739444101; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=qz+NS5T4GU+VbAH0BNt16FcBYOnl5PXsvLXnbwAi+Qc=; b=IJHyOU5exJy6tBg+2Iq1wln2wZiqd1j3p1J/XnqH8azqafHIpjOi8mGKmq6kZgQb17 swxHz+UsNucM3EZvyqR5b1O6t/Ew63d6N3Ai9+v+neUXt2MBDsdap3GAnWbOdQdMNcOC vSLDFiyzRt+jKKs2LXyrFvh5STUe39jOYLqoLyzjwCQ0y9w7cSsSdGp7pgUl18eRT3hS M+08d2pPGuZ2APCyE8C0HHDfii0QTEc+FkJ5vNhZScNy/r9/mQimaeyX1TQCZ5bKPVts QzCWB3Nt4QCkn+c1/zFtA3FcFgsnhOukr0okLYnpXJEMCb2jfQANfanQzplpdE5aObJD 
Gc/Q== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1738839301; x=1739444101; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=qz+NS5T4GU+VbAH0BNt16FcBYOnl5PXsvLXnbwAi+Qc=; b=uyG6HCG/tgZLrgVGAOBDKfvLBOB/WdVcCPwYhsytb0+q+J2y0hXPb6dXNZi9A+Lied Jt/1I54fB0jTgieL5PZW7ErUp7QaVLctqzIW1ap4qiLud3dXqJlDswwNt8k+jEWGn8jV XiDcPUEoZz/025n7qOtnkyo0yWQZRQK9NkuEsoQothahQ3WPFZccXedgEtW+jJvgJs0d 1oUVk1yGYfVn6z70RGscQQJG3s3SPH5LoZnuU+Or8vzMEXVqa2i9uAHlpcVWdFxbVSca 0/mqYAzFHihChPTRGxpV0fYXZYLo3Bm0F1zy5IsCqY4VOw7Q2ZwEyjZgAhSZdNv8+uJp Ly+g== X-Forwarded-Encrypted: i=1; AJvYcCVhtiI9jap7WMnlMVLzOtDWGgTr3aRDBnILcyw/GP4XygVtT7r/Yx9Yaxr9m8hLfpfJfyM7rH76h2I2GX8=@vger.kernel.org X-Gm-Message-State: AOJu0YxIBjLSpI3snGz0a9WDsEXjCRQ8ukn2tQkBjbjVc3vPgP7ku9rX LIhC57RD+5asvwj4YOdpmaGqPfkI5vissPl6h4OSxc7FBJn9x208/37uCtnbLJM= X-Gm-Gg: ASbGncshDurRLAC5rcB8uzK5CZfepiE5HpCTJ98PBSnMk6yl3bM0ZNeiLhNETmUBYTT fpPGuWzJiGG6OQNv6uTyLNXYRDOvMckVEa0Ge4V4ocVBrPdDhcWZaj6T9LGWoDXjh2/G3NWZVr9 DaaFE50HTSgQQwLFXa6HKvZHIUjObuO/Ljc/623y09MUBxPH6qeYwFLRrUc1iQviVTNPV77/BLE hXbV3O0MuEQrGmTmWV4YHBndPaFgTljovJTav2YDoMEZm6ayo5rBH0eHF/g9w1S52ApFSiQUyMH fbifww== X-Google-Smtp-Source: AGHT+IG3YUH0oYkY2WcU9nkb8nClyB5AjNsFI6WIkNWS0fGK3jGoBgR9RRDJA0dBgbN9twa+WCc61A== X-Received: by 2002:a05:600c:1c87:b0:434:f7e3:bfbd with SMTP id 5b1f17b1804b1-4390d5611fcmr49163655e9.23.1738839300786; Thu, 06 Feb 2025 02:55:00 -0800 (PST) Received: from localhost ([2a03:2880:31ff:25::]) by smtp.gmail.com with ESMTPSA id 5b1f17b1804b1-4390d94d7c7sm50627245e9.14.2025.02.06.02.55.00 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 06 Feb 2025 02:55:00 -0800 (PST) From: Kumar Kartikeya Dwivedi To: bpf@vger.kernel.org, linux-kernel@vger.kernel.org Cc: Ankur Arora , Linus Torvalds , Peter Zijlstra , Will Deacon , Waiman Long , Alexei Starovoitov , Andrii Nakryiko , Daniel Borkmann , Martin KaFai Lau , Eduard Zingerman , "Paul E. 
McKenney" , Tejun Heo , Barret Rhoden , Josh Don , Dohyun Kim , linux-arm-kernel@lists.infradead.org, kernel-team@meta.com Subject: [PATCH bpf-next v2 17/26] rqspinlock: Hardcode cond_acquire loops to asm-generic implementation Date: Thu, 6 Feb 2025 02:54:25 -0800 Message-ID: <20250206105435.2159977-18-memxor@gmail.com> X-Mailer: git-send-email 2.43.5 In-Reply-To: <20250206105435.2159977-1-memxor@gmail.com> References: <20250206105435.2159977-1-memxor@gmail.com> Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Developer-Signature: v=1; a=openpgp-sha256; l=3789; h=from:subject; bh=qYphScagVacMZI8xAzQRuNOSyn9/aS/2qYdw+lGxq8I=; b=owEBbQKS/ZANAwAIAUzgyIZIvxHKAcsmYgBnpJRmEQyiG868iJLdTJeV5rNF45XJkM3pzaj93Qpc DMdltmaJAjMEAAEIAB0WIQRLvip+Buz51YI8YRFM4MiGSL8RygUCZ6SUZgAKCRBM4MiGSL8RyjhAD/ 94L7gwFOANgho2OHK/jFtxh/dQWudROmLlLNyaJh+qGxZ9/Y6D95iWxtze9d8A8JWUeVYuNcNzGuEN q2gb1bS2T8Pk+VCqcFRqsioDWHNnqM+BwLGq26+gTj0p3zop+cgAPklesV00m68IlSy1LhMcIzIF2E slsAxGCoT6w+U1KGge2lPNChfW8Kf4kMiHr5s4V3lqgjh5jDkHofe2E0f88YBI6NEK9X3fm/gk1V+J VIaptHRDqsUJmlPLmvdU51DFEwmGDVtxxc8mvgA6F08d9d87r+yOm0Oos298QtBiQcb6IqHR1aka9N ITML8aqrOeF2zgN99ZZVlzpqrJWuvSp/jTdyiXKWVRaHjVEtHs0pTh7ybdJy+0D0bEzNBE9I5MQHe+ WHend498nQc8gE/neaaMboKfxVnAocxfFtQoQknslS76Z48WT/MN4FdUGVjS9jgF8kAQB/iTqc+2A2 jb0b9ivFQZN1YJgTlFTR4zPHbYRw/NOF7HTisjbVAjun3ow+FP3VgcwZPkruEl+31v44pUzsmk7WZm PBqeRGaWYJACjXqHGBEueZu13cMlJCXDoarHrZqbMc5frds4B2dEW+0qLeFUe/LZJqdKgR+36oA8nE hBOileJ273cIjtBiGXGnEU1vWDHPgN8zSEOKxP4Z77q0bv5ucidRdFRkNFZA== X-Developer-Key: i=memxor@gmail.com; a=openpgp; fpr=4BBE2A7E06ECF9D5823C61114CE0C88648BF11CA X-Patchwork-Delegate: bpf@iogearbox.net Currently, for rqspinlock usage, the implementation of smp_cond_load_acquire (and thus, atomic_cond_read_acquire) are susceptible to stalls on arm64, because they do not guarantee that the conditional expression will be repeatedly invoked if the address being loaded from is not written to by other CPUs. When support for event-streams is absent (which unblocks stuck WFE-based loops every ~100us), we may end up being stuck forever. This causes a problem for us, as we need to repeatedly invoke the RES_CHECK_TIMEOUT in the spin loop to break out when the timeout expires. Hardcode the implementation to the asm-generic version in rqspinlock.c until support for smp_cond_load_acquire_timewait [0] lands upstream. [0]: https://lore.kernel.org/lkml/20250203214911.898276-1-ankur.a.arora@oracle.com Cc: Ankur Arora Signed-off-by: Kumar Kartikeya Dwivedi --- kernel/locking/rqspinlock.c | 41 ++++++++++++++++++++++++++++++++++--- 1 file changed, 38 insertions(+), 3 deletions(-) diff --git a/kernel/locking/rqspinlock.c b/kernel/locking/rqspinlock.c index 49b4f3c75a3e..b4cceeecf29c 100644 --- a/kernel/locking/rqspinlock.c +++ b/kernel/locking/rqspinlock.c @@ -325,6 +325,41 @@ int __lockfunc resilient_tas_spin_lock(rqspinlock_t *lock, u64 timeout) */ static DEFINE_PER_CPU_ALIGNED(struct qnode, qnodes[_Q_MAX_NODES]); +/* + * Hardcode smp_cond_load_acquire and atomic_cond_read_acquire implementations + * to the asm-generic implementation. In rqspinlock code, our conditional + * expression involves checking the value _and_ additionally a timeout. However, + * on arm64, the WFE-based implementation may never spin again if no stores + * occur to the locked byte in the lock word. As such, we may be stuck forever + * if event-stream based unblocking is not available on the platform for WFE + * spin loops (arch_timer_evtstrm_available). 
+ * + * Once support for smp_cond_load_acquire_timewait [0] lands, we can drop this + * workaround. + * + * [0]: https://lore.kernel.org/lkml/20250203214911.898276-1-ankur.a.arora@oracle.com + */ +#define res_smp_cond_load_relaxed(ptr, cond_expr) ({ \ + typeof(ptr) __PTR = (ptr); \ + __unqual_scalar_typeof(*ptr) VAL; \ + for (;;) { \ + VAL = READ_ONCE(*__PTR); \ + if (cond_expr) \ + break; \ + cpu_relax(); \ + } \ + (typeof(*ptr))VAL; \ +}) + +#define res_smp_cond_load_acquire(ptr, cond_expr) ({ \ + __unqual_scalar_typeof(*ptr) _val; \ + _val = res_smp_cond_load_relaxed(ptr, cond_expr); \ + smp_acquire__after_ctrl_dep(); \ + (typeof(*ptr))_val; \ +}) + +#define res_atomic_cond_read_acquire(v, c) res_smp_cond_load_acquire(&(v)->counter, (c)) + /** * resilient_queued_spin_lock_slowpath - acquire the queued spinlock * @lock: Pointer to queued spinlock structure @@ -419,7 +454,7 @@ int __lockfunc resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val, */ if (val & _Q_LOCKED_MASK) { RES_RESET_TIMEOUT(ts); - smp_cond_load_acquire(&lock->locked, !VAL || RES_CHECK_TIMEOUT(ts, ret, _Q_LOCKED_MASK)); + res_smp_cond_load_acquire(&lock->locked, !VAL || RES_CHECK_TIMEOUT(ts, ret, _Q_LOCKED_MASK)); } if (ret) { @@ -568,8 +603,8 @@ int __lockfunc resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val, * does not imply a full barrier. */ RES_RESET_TIMEOUT(ts); - val = atomic_cond_read_acquire(&lock->val, !(VAL & _Q_LOCKED_PENDING_MASK) || - RES_CHECK_TIMEOUT(ts, ret, _Q_LOCKED_PENDING_MASK)); + val = res_atomic_cond_read_acquire(&lock->val, !(VAL & _Q_LOCKED_PENDING_MASK) || + RES_CHECK_TIMEOUT(ts, ret, _Q_LOCKED_PENDING_MASK)); waitq_timeout: if (ret) { From patchwork Thu Feb 6 10:54:26 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Kumar Kartikeya Dwivedi X-Patchwork-Id: 13962824 X-Patchwork-Delegate: bpf@iogearbox.net Received: from mail-wr1-f65.google.com (mail-wr1-f65.google.com [209.85.221.65]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 3EB1A22F3BF; Thu, 6 Feb 2025 10:55:04 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.221.65 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1738839306; cv=none; b=U9JnDoQ7dL/F+og+a5z2kBpp3qwfyZvDHTvFKMpTC7Wkyin40LGAgGEh8aGdo6iUf+KnEdk/uRTV8/SZvaDgzMchZuyIDTDQ54yrCEPQCUvN+dtcI75J1lP8z79L62Ms3Hz7XNo6acdqL/G6fH1FYIZ+X5xqULexx6myl7GARrY= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1738839306; c=relaxed/simple; bh=E+oadE0wJWqBDHpLclZmcj/fTawgs/RCGY18+MKVeJ0=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=A0fWWk7692ZXe/hbv+6IH0qY0JNtODIdpiDe5INlTOA+FpI11aRnEd2Fra+5VJLub1Ww6ZDj8z4fL7UKNdOMl/5/5jJ7+LWLNNN6+5pRcpql3ayXdOgJUX2bxeKP0bLQcvYeNHETeo+8lwtpefBFAg/cVs2f1n1tgT4eF9DYELw= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=U/NCpswX; arc=none smtp.client-ip=209.85.221.65 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com 
header.b="U/NCpswX" Received: by mail-wr1-f65.google.com with SMTP id ffacd0b85a97d-38dba1cc632so429644f8f.0; Thu, 06 Feb 2025 02:55:03 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1738839302; x=1739444102; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=JSwd0Ic/zADHNPGibPzB3vfQZwdDu0cIcYcU2UTlBE0=; b=U/NCpswXPsupXzvY8PX2azXWLYEfA5Hp4IznwBrq1WDvH+qXR/KSpL3rH7hnVqnnhv t2wEB1H8vwcw44LZgJ+/hNKh8frM591lyWGoZvi95PwW8Zs2BBZCpVW3tDKPWIWVK/xc JryOAG2M/yv6Y8M8o3XbOrka6mOfaURw1ebBmqNWM0lSbZUEzE+XVr1J60wwklbuH4Pn LpziQ+OsMB1sOC04uO7eNQ99qWop5csvVS9iFxmuqKj92Q6ZKfeshpqOAem8pDKcVVIp XqHVrtZXqDga+iWWvKYSpXT0Xwfia6HunFObSx1SgEt6XTdOOVlJjaHZVdhRXPnqd8YY banQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1738839302; x=1739444102; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=JSwd0Ic/zADHNPGibPzB3vfQZwdDu0cIcYcU2UTlBE0=; b=YVnWGDh5VEjC6ylHJLrAENv3390qp8F1iMTnQA18tEhGiQPTeKHEaXh8x81Jub9GrE 8JNd2fxxgT+XRyoybIjijXQrbx5pBJUDe5FRRYXXicOwJ/9WMZZz39hf7Xs2nkwd1GuR AAPhEwrFeIAu/lM+NcRkUA8QFM7MRJj9FXXLsrbu4hDyhJ2TTdbvivLop9CWzL6i3CgK mdrNF6x+9Iu2GvLfeE8S2vwrmOx4hZ/4yYyRobw8ScUKmirp7FdWuxojMsSoj8gbCNlP Mv+oMgL3buH1WmUH3klY7UdJCP8ZDZSU5kaJ7QE34hwt2BfFV1Y341UaL9P6vjI/1Nbg T5Yw== X-Forwarded-Encrypted: i=1; AJvYcCXlC8zuuURW5G5ETj5/3xcrpvZ85iU+i5/FnFm+v9jGPqfBgwsNQ6GDwbJX4zvYsJ7oq7Q9oZeWxB3e3lg=@vger.kernel.org X-Gm-Message-State: AOJu0Yym5VnowTW8+fmgaIrWMBgo+KSC6TliICgf4Siy4nP+mwRAplgu S1UZnYOfp46bOF6gPc7zhzcPnr7st4jEANr9t7ExpX1vr5exRoUHV91OWx47T8U= X-Gm-Gg: ASbGncvKIvD75OOXphGust36koGHdGRccGtDM/Wwcoeh4yRobUEfXUXeiHjqxv4Fth3 p3b2OZO/wuGRboL3fCgDkU80PJr84UVXzUm3FAnFZcOC5jK7ZRyq3BGxUghZcqcAWfIVXG6JipS NxXyMLgbR9xxHZddJB5UCErSnUfT6OZ1NpL2FfHCbcts0D53o70u7vT+x4LZni6gU4oVJZGIhFI KMolx1y+99oc6GC/ZIx9u8A+RmZeaK6Uo49bCEJoDCulgBJNVvctGSdlhYKhe9EgXktKos6B9up Hrel X-Google-Smtp-Source: AGHT+IHKdNt/2i2niit9x9Q/R1mWsETc05ExweAVXJWi6Fe7g9aArrjNh6K5B0b8g58nbAsrLJyEOg== X-Received: by 2002:a5d:59ac:0:b0:38d:bf6e:adca with SMTP id ffacd0b85a97d-38dbf6eae30mr869218f8f.48.1738839302123; Thu, 06 Feb 2025 02:55:02 -0800 (PST) Received: from localhost ([2a03:2880:31ff:2::]) by smtp.gmail.com with ESMTPSA id ffacd0b85a97d-38dc31b9394sm473848f8f.11.2025.02.06.02.55.01 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 06 Feb 2025 02:55:01 -0800 (PST) From: Kumar Kartikeya Dwivedi To: bpf@vger.kernel.org, linux-kernel@vger.kernel.org Cc: Linus Torvalds , Peter Zijlstra , Will Deacon , Waiman Long , Alexei Starovoitov , Andrii Nakryiko , Daniel Borkmann , Martin KaFai Lau , Eduard Zingerman , "Paul E. 
McKenney" , Tejun Heo , Barret Rhoden , Josh Don , Dohyun Kim , linux-arm-kernel@lists.infradead.org, kernel-team@meta.com Subject: [PATCH bpf-next v2 18/26] rqspinlock: Add entry to Makefile, MAINTAINERS Date: Thu, 6 Feb 2025 02:54:26 -0800 Message-ID: <20250206105435.2159977-19-memxor@gmail.com> X-Mailer: git-send-email 2.43.5 In-Reply-To: <20250206105435.2159977-1-memxor@gmail.com> References: <20250206105435.2159977-1-memxor@gmail.com> Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Developer-Signature: v=1; a=openpgp-sha256; l=2083; h=from:subject; bh=E+oadE0wJWqBDHpLclZmcj/fTawgs/RCGY18+MKVeJ0=; b=owEBbQKS/ZANAwAIAUzgyIZIvxHKAcsmYgBnpJRmodQ4/VhA2GYFRoJhLQZknuv2U5hRh8VIgaN5 gjWRV/OJAjMEAAEIAB0WIQRLvip+Buz51YI8YRFM4MiGSL8RygUCZ6SUZgAKCRBM4MiGSL8Ryq+SD/ 94tOt0FPzLcovxJ8PvSGMSImDYXVh0wToZsIW+lhBs3jfHW5GETrWPgvHrSUxGhML9no+RKLfirkp4 E1cPXorlRj2ki4xsWzLSlHk+EH2PWoH8jCjs6jMrtK+SMqCA7Oj4Ice6DQmaMIN6/hB/xu6chupbLc 1lXBQZPtFxk1eXA75g9LS8ZVQ3WXOnPj+t6OewmPFulHGjMCCRR9/1M8sw89gkNsM3aB5E5I7YGAoV DP5513oBCDAn8tJPZqTzddkWfZoi+q6onyMLx6WoRPWwd6fk/EKwhICv1Ikg+RI9MkK5qeIiUkhtsh FX/jfVeTi3znO3Ae+dZqpxdGdwDobONfyOditGEf38ImSjAUAeUfyJe4K8xYbvfWnk1WOMn8G3jHmE c9muj11DGHiIVfw/ynoQqr4v5vyuNYzu5FWY81NdA3Evh0nXgKo18riB8GKo1moM5Ga4Cc9rhmReue T30PWz9HdqaEAV4ligZ3OeXxNDjK8YNM2P3ccL8sziaBme2U8G5uVWyFh3wSuGZextoI1/7fUBWFY7 QuxUlV8b1+apMH6nrCBDVCO4LpBI+7ZQ9qDWn1MUzT7DTYAHwa19++ndXSIA4NCiycMOqEvocmJYpe V+70XrThPEC4JgdWBMXps/15vB9J+B+z1zka8ksy3fY9si4XlMK2GSNw1VjA== X-Developer-Key: i=memxor@gmail.com; a=openpgp; fpr=4BBE2A7E06ECF9D5823C61114CE0C88648BF11CA X-Patchwork-Delegate: bpf@iogearbox.net Ensure that rqspinlock is built when qspinlock support and BPF subsystem is enabled. Also, add the file under the BPF MAINTAINERS entry so that all patches changing code in the file end up Cc'ing bpf@vger and the maintainers/reviewers. Ensure that the rqspinlock code is only built when the BPF subsystem is compiled in. Depending on queued spinlock support, we may or may not end up building the queued spinlock slowpath, and instead fallback to the test-and-set implementation. 
Signed-off-by: Kumar Kartikeya Dwivedi --- MAINTAINERS | 3 +++ include/asm-generic/Kbuild | 1 + kernel/locking/Makefile | 1 + 3 files changed, 5 insertions(+) diff --git a/MAINTAINERS b/MAINTAINERS index 896a307fa065..4d81f3303c79 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -4305,6 +4305,9 @@ F: include/uapi/linux/filter.h F: kernel/bpf/ F: kernel/trace/bpf_trace.c F: lib/buildid.c +F: arch/*/include/asm/rqspinlock.h +F: include/asm-generic/rqspinlock.h +F: kernel/locking/rqspinlock.c F: lib/test_bpf.c F: net/bpf/ F: net/core/filter.c diff --git a/include/asm-generic/Kbuild b/include/asm-generic/Kbuild index 1b43c3a77012..8675b7b4ad23 100644 --- a/include/asm-generic/Kbuild +++ b/include/asm-generic/Kbuild @@ -45,6 +45,7 @@ mandatory-y += pci.h mandatory-y += percpu.h mandatory-y += pgalloc.h mandatory-y += preempt.h +mandatory-y += rqspinlock.h mandatory-y += runtime-const.h mandatory-y += rwonce.h mandatory-y += sections.h diff --git a/kernel/locking/Makefile b/kernel/locking/Makefile index 0db4093d17b8..5645e9029bc0 100644 --- a/kernel/locking/Makefile +++ b/kernel/locking/Makefile @@ -24,6 +24,7 @@ obj-$(CONFIG_SMP) += spinlock.o obj-$(CONFIG_LOCK_SPIN_ON_OWNER) += osq_lock.o obj-$(CONFIG_PROVE_LOCKING) += spinlock.o obj-$(CONFIG_QUEUED_SPINLOCKS) += qspinlock.o +obj-$(CONFIG_BPF_SYSCALL) += rqspinlock.o obj-$(CONFIG_RT_MUTEXES) += rtmutex_api.o obj-$(CONFIG_PREEMPT_RT) += spinlock_rt.o ww_rt_mutex.o obj-$(CONFIG_DEBUG_SPINLOCK) += spinlock.o From patchwork Thu Feb 6 10:54:27 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Kumar Kartikeya Dwivedi X-Patchwork-Id: 13962825 X-Patchwork-Delegate: bpf@iogearbox.net Received: from mail-wm1-f66.google.com (mail-wm1-f66.google.com [209.85.128.66]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 9D4D5231A21; Thu, 6 Feb 2025 10:55:05 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.128.66 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1738839307; cv=none; b=EZtXKKlI23AETpa5UmO96Z8BEDZlKhYA55oS+rZjEOVGH+AMelGXwaIavO/0Zu/3guF2W6vwJD8+7H7Um5+rbs4+FVx9q0MNtatXzDjsYPW5cm0vOLj3F0DajTCrHeBZGZmqkuy5ctl3PCj0i+7YUKqjLLUbFkXFJKPJNncyFnw= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1738839307; c=relaxed/simple; bh=PXV2g5jkXA6M5CNQXPC4dpfD1jgCcit6rNK1L4ARIdA=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=RT6PxBLsIdE6Yg/y8nVvb4NMes9NcEob0oplxI4qUQuaBRfw3VjzgrUp/WrT3SVclNlfOnv5tnXW0OAR6LxLsY8suHfZjqRj+LOjJJS28Vi8PVXVxGm5OqOdUVL58n1zrjofuPHEqvN18D3OcQZLSiiCl0WT3oMAkvPU0efq0Ck= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=Kt9zc2jb; arc=none smtp.client-ip=209.85.128.66 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="Kt9zc2jb" Received: by mail-wm1-f66.google.com with SMTP id 5b1f17b1804b1-436202dd730so4840235e9.2; Thu, 06 Feb 2025 02:55:05 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; 
s=20230601; t=1738839303; x=1739444103; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=pV8Sn3/Wzf2u35L7Uf0fdtRf3raefbZ3DCYUjKqcPwY=; b=Kt9zc2jb1gjx/QbQgZgfRSDsl9Vp/PWVXZ+Lw/Dqidrdf4iGnl3sYtnzlSA0TLiRuL t1/Xa2CNeUzNiMGhs+yphpu65S/0QYZKonJudr0pDN7yl0u4OSOe14tfJDsIXggrSRih PbzB2/B4oH/8LhbrIJe1lcyvI1OKKR9t8cZ5BM7OYwtRlQKt36RP/PnTmjvWrNqz4N7o Cme5RmSmG82mxKwmkMTQ6HQkmwcYMcRD65liMX1oG/f2UTr1YdkI45N30EcN2YlFvYuz Zs+B6jYDNoQQ+qqlWHgWiKCQk0NI/3ialqgBRobwnDhtjF6LzmSt27u3j2u/OwHxrSXv rZjQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1738839303; x=1739444103; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=pV8Sn3/Wzf2u35L7Uf0fdtRf3raefbZ3DCYUjKqcPwY=; b=a0qiSyZqApKloUDdS8DTNOWX7HfkQl3qAl6lexytF2Sq9SyXOR9Lb50VguyCYPCTS/ GDLoSqWkvnlqYwoFbuqYe7GQZWcmLvZJJMWRIVeAWu6vSveX64v2IIJRr0svwxC2kNRX gBRhMyUlKcCcuad96MO2BZw1gpI6oxgkSTvh4mxXxeXH28pgHoJZlHu7EQ6+ccIBy3v9 08iFYZ/ovHH/JghzZhaqIajVRgFGYebuH9KmYxvaZh8lpxpfJwwIf2XKHkpzZviwiqG6 6lOIv+VGoRBigh95h7rYkQAMWGC0KSMrJ2wajwa8txfGSt/T+gCehcqNexeSGiTI+mZQ jBXA== X-Forwarded-Encrypted: i=1; AJvYcCU9QNE8eHKP7SI2ccG+aLXtWXdIjBQfIwXeiNCunOSBbLo7Mpdpe1OuxhjPpfNCE3UsTPzKEBhV4c3VQ18=@vger.kernel.org X-Gm-Message-State: AOJu0YyfTm+HzkY3KzHwDhazBAw359wWdbZjlMmGYHHHf0QmPVl7jyPj 68AIXrX6/rxfw9hIgVB/MUKASwrP0xxCppU/5mXPoCNUPUYKq6yfqaV6hICuLaU= X-Gm-Gg: ASbGnctyCtNi2yzBhRnNstCWUmVM80MTrKQ9gmrRDE0iSkcQXJ+aTQxtCVdXESOob/5 +v8rrzZlwxxlQj5+5/p3JTd5iaMJHey64BwZru2mdQEnGvpJxk+NWCZm5eRbf7WYqOqUNtApVkv LpQ6uDK3GS3K97Fs0FpRKgqhvili4rMgs7OTTGXj08fTKAKVGshi/SAs1eAKfGwr0fNALai3701 3Inuc+2y3XUAepFTDOlOkhDZ1AJr+uJcLm9OhhLnWoooQG45JlltJ5SVYY2WUA2dQCskqaaOVZj Rb+Qmw== X-Google-Smtp-Source: AGHT+IF1B76+1ynE7+9dQEBX3L/iUxR6t+WoOTyxAKlHsKCxtnZ617dC5+DYUZ8gZKTRoQgB1NYv6Q== X-Received: by 2002:a05:600c:310b:b0:434:eb86:aeca with SMTP id 5b1f17b1804b1-4390d43401bmr53989535e9.10.1738839303346; Thu, 06 Feb 2025 02:55:03 -0800 (PST) Received: from localhost ([2a03:2880:31ff:73::]) by smtp.gmail.com with ESMTPSA id 5b1f17b1804b1-4391da965a6sm15547985e9.6.2025.02.06.02.55.02 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 06 Feb 2025 02:55:02 -0800 (PST) From: Kumar Kartikeya Dwivedi To: bpf@vger.kernel.org, linux-kernel@vger.kernel.org Cc: Linus Torvalds , Peter Zijlstra , Will Deacon , Waiman Long , Alexei Starovoitov , Andrii Nakryiko , Daniel Borkmann , Martin KaFai Lau , Eduard Zingerman , "Paul E. 
McKenney" , Tejun Heo , Barret Rhoden , Josh Don , Dohyun Kim , linux-arm-kernel@lists.infradead.org, kernel-team@meta.com Subject: [PATCH bpf-next v2 19/26] bpf: Convert hashtab.c to rqspinlock Date: Thu, 6 Feb 2025 02:54:27 -0800 Message-ID: <20250206105435.2159977-20-memxor@gmail.com> X-Mailer: git-send-email 2.43.5 In-Reply-To: <20250206105435.2159977-1-memxor@gmail.com> References: <20250206105435.2159977-1-memxor@gmail.com> Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Developer-Signature: v=1; a=openpgp-sha256; l=10973; h=from:subject; bh=PXV2g5jkXA6M5CNQXPC4dpfD1jgCcit6rNK1L4ARIdA=; b=owEBbQKS/ZANAwAIAUzgyIZIvxHKAcsmYgBnpJRmPzcSKNTT0EqqnnkfrvBHcqC+gjliccdRPRLE gq5UTVWJAjMEAAEIAB0WIQRLvip+Buz51YI8YRFM4MiGSL8RygUCZ6SUZgAKCRBM4MiGSL8Ryk5KD/ 9ctjNzhAOgMhJSDZVpv3j4of82O3Xfa/GeQWEsteAu7YzSYq9AIIE3HFCAgpILNT3t5B88CX2Ed1lP x7dqTeC9Uk0p5aenpsjKIaVczLP6xpR/qmVo4I6h57OW1szo7kfpJPwNod+b01Lsjd/4XhfiP0XMTP G76JN02zA8AGTVzvJ8vGj1QdFNU9+tF03KJXh+Ai+Hy7bM/zZZnTDzBA7JGwE5Z3ySRB16mRXmSVLz CLGdo2Vx6qHCLVgZYsnC2f8JFgFXNNkkEcIDSP7o4OaZzrAd8KLs4rj30sgypf6XlKOaZP0cMAh97R rv9OgrXJGSWzBydhJ2Yo/U9qlaChNMWunq075WSaiww/hRbinOWydGQIl0BjHkhOsKVhBVaUhJT7mH 11W6qKAuXa0BEPoRunMVTqW01R4v/bj+YfT4PH+kIRjtYpePc7WvEnxYYr3HAhgiEiOzRNYV/nz7i1 6Qxz8MOJnYfqv5lWb482r1sfm0F+njVMg5corGbZpFCvMJMAbVl7JlJWn3KBt/rsn852KjsX0vk+KT Y4I+fnWdHm6XJ+zLmMdJjxipaCf42YtN3jaFQn/AfCmYRCm/oAaqCIpx1s2cTQi5Kpy0J3M4iGCVSm IfffEEuYwarej+1RmRfEc7KsRdw9g1C3XN/mK0+uQ97QaolvQ1fpjm1VZIXg== X-Developer-Key: i=memxor@gmail.com; a=openpgp; fpr=4BBE2A7E06ECF9D5823C61114CE0C88648BF11CA X-Patchwork-Delegate: bpf@iogearbox.net Convert hashtab.c from raw_spinlock to rqspinlock, and drop the hashed per-cpu counter crud from the code base which is no longer necessary. 
Signed-off-by: Kumar Kartikeya Dwivedi --- kernel/bpf/hashtab.c | 102 ++++++++++++++----------------------------- 1 file changed, 32 insertions(+), 70 deletions(-) diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c index 4a9eeb7aef85..9b394e147967 100644 --- a/kernel/bpf/hashtab.c +++ b/kernel/bpf/hashtab.c @@ -16,6 +16,7 @@ #include "bpf_lru_list.h" #include "map_in_map.h" #include +#include #define HTAB_CREATE_FLAG_MASK \ (BPF_F_NO_PREALLOC | BPF_F_NO_COMMON_LRU | BPF_F_NUMA_NODE | \ @@ -78,7 +79,7 @@ */ struct bucket { struct hlist_nulls_head head; - raw_spinlock_t raw_lock; + rqspinlock_t raw_lock; }; #define HASHTAB_MAP_LOCK_COUNT 8 @@ -104,8 +105,6 @@ struct bpf_htab { u32 n_buckets; /* number of hash buckets */ u32 elem_size; /* size of each element in bytes */ u32 hashrnd; - struct lock_class_key lockdep_key; - int __percpu *map_locked[HASHTAB_MAP_LOCK_COUNT]; }; /* each htab element is struct htab_elem + key + value */ @@ -140,45 +139,26 @@ static void htab_init_buckets(struct bpf_htab *htab) for (i = 0; i < htab->n_buckets; i++) { INIT_HLIST_NULLS_HEAD(&htab->buckets[i].head, i); - raw_spin_lock_init(&htab->buckets[i].raw_lock); - lockdep_set_class(&htab->buckets[i].raw_lock, - &htab->lockdep_key); + raw_res_spin_lock_init(&htab->buckets[i].raw_lock); cond_resched(); } } -static inline int htab_lock_bucket(const struct bpf_htab *htab, - struct bucket *b, u32 hash, - unsigned long *pflags) +static inline int htab_lock_bucket(struct bucket *b, unsigned long *pflags) { unsigned long flags; + int ret; - hash = hash & min_t(u32, HASHTAB_MAP_LOCK_MASK, htab->n_buckets - 1); - - preempt_disable(); - local_irq_save(flags); - if (unlikely(__this_cpu_inc_return(*(htab->map_locked[hash])) != 1)) { - __this_cpu_dec(*(htab->map_locked[hash])); - local_irq_restore(flags); - preempt_enable(); - return -EBUSY; - } - - raw_spin_lock(&b->raw_lock); + ret = raw_res_spin_lock_irqsave(&b->raw_lock, flags); + if (ret) + return ret; *pflags = flags; - return 0; } -static inline void htab_unlock_bucket(const struct bpf_htab *htab, - struct bucket *b, u32 hash, - unsigned long flags) +static inline void htab_unlock_bucket(struct bucket *b, unsigned long flags) { - hash = hash & min_t(u32, HASHTAB_MAP_LOCK_MASK, htab->n_buckets - 1); - raw_spin_unlock(&b->raw_lock); - __this_cpu_dec(*(htab->map_locked[hash])); - local_irq_restore(flags); - preempt_enable(); + raw_res_spin_unlock_irqrestore(&b->raw_lock, flags); } static bool htab_lru_map_delete_node(void *arg, struct bpf_lru_node *node); @@ -483,14 +463,12 @@ static struct bpf_map *htab_map_alloc(union bpf_attr *attr) bool percpu_lru = (attr->map_flags & BPF_F_NO_COMMON_LRU); bool prealloc = !(attr->map_flags & BPF_F_NO_PREALLOC); struct bpf_htab *htab; - int err, i; + int err; htab = bpf_map_area_alloc(sizeof(*htab), NUMA_NO_NODE); if (!htab) return ERR_PTR(-ENOMEM); - lockdep_register_key(&htab->lockdep_key); - bpf_map_init_from_attr(&htab->map, attr); if (percpu_lru) { @@ -536,15 +514,6 @@ static struct bpf_map *htab_map_alloc(union bpf_attr *attr) if (!htab->buckets) goto free_elem_count; - for (i = 0; i < HASHTAB_MAP_LOCK_COUNT; i++) { - htab->map_locked[i] = bpf_map_alloc_percpu(&htab->map, - sizeof(int), - sizeof(int), - GFP_USER); - if (!htab->map_locked[i]) - goto free_map_locked; - } - if (htab->map.map_flags & BPF_F_ZERO_SEED) htab->hashrnd = 0; else @@ -607,15 +576,12 @@ static struct bpf_map *htab_map_alloc(union bpf_attr *attr) free_map_locked: if (htab->use_percpu_counter) percpu_counter_destroy(&htab->pcount); - for (i = 0; i < 
HASHTAB_MAP_LOCK_COUNT; i++) - free_percpu(htab->map_locked[i]); bpf_map_area_free(htab->buckets); bpf_mem_alloc_destroy(&htab->pcpu_ma); bpf_mem_alloc_destroy(&htab->ma); free_elem_count: bpf_map_free_elem_count(&htab->map); free_htab: - lockdep_unregister_key(&htab->lockdep_key); bpf_map_area_free(htab); return ERR_PTR(err); } @@ -817,7 +783,7 @@ static bool htab_lru_map_delete_node(void *arg, struct bpf_lru_node *node) b = __select_bucket(htab, tgt_l->hash); head = &b->head; - ret = htab_lock_bucket(htab, b, tgt_l->hash, &flags); + ret = htab_lock_bucket(b, &flags); if (ret) return false; @@ -828,7 +794,7 @@ static bool htab_lru_map_delete_node(void *arg, struct bpf_lru_node *node) break; } - htab_unlock_bucket(htab, b, tgt_l->hash, flags); + htab_unlock_bucket(b, flags); if (l == tgt_l) check_and_free_fields(htab, l); @@ -1147,7 +1113,7 @@ static long htab_map_update_elem(struct bpf_map *map, void *key, void *value, */ } - ret = htab_lock_bucket(htab, b, hash, &flags); + ret = htab_lock_bucket(b, &flags); if (ret) return ret; @@ -1198,7 +1164,7 @@ static long htab_map_update_elem(struct bpf_map *map, void *key, void *value, check_and_free_fields(htab, l_old); } } - htab_unlock_bucket(htab, b, hash, flags); + htab_unlock_bucket(b, flags); if (l_old) { if (old_map_ptr) map->ops->map_fd_put_ptr(map, old_map_ptr, true); @@ -1207,7 +1173,7 @@ static long htab_map_update_elem(struct bpf_map *map, void *key, void *value, } return 0; err: - htab_unlock_bucket(htab, b, hash, flags); + htab_unlock_bucket(b, flags); return ret; } @@ -1254,7 +1220,7 @@ static long htab_lru_map_update_elem(struct bpf_map *map, void *key, void *value copy_map_value(&htab->map, l_new->key + round_up(map->key_size, 8), value); - ret = htab_lock_bucket(htab, b, hash, &flags); + ret = htab_lock_bucket(b, &flags); if (ret) goto err_lock_bucket; @@ -1275,7 +1241,7 @@ static long htab_lru_map_update_elem(struct bpf_map *map, void *key, void *value ret = 0; err: - htab_unlock_bucket(htab, b, hash, flags); + htab_unlock_bucket(b, flags); err_lock_bucket: if (ret) @@ -1312,7 +1278,7 @@ static long __htab_percpu_map_update_elem(struct bpf_map *map, void *key, b = __select_bucket(htab, hash); head = &b->head; - ret = htab_lock_bucket(htab, b, hash, &flags); + ret = htab_lock_bucket(b, &flags); if (ret) return ret; @@ -1337,7 +1303,7 @@ static long __htab_percpu_map_update_elem(struct bpf_map *map, void *key, } ret = 0; err: - htab_unlock_bucket(htab, b, hash, flags); + htab_unlock_bucket(b, flags); return ret; } @@ -1378,7 +1344,7 @@ static long __htab_lru_percpu_map_update_elem(struct bpf_map *map, void *key, return -ENOMEM; } - ret = htab_lock_bucket(htab, b, hash, &flags); + ret = htab_lock_bucket(b, &flags); if (ret) goto err_lock_bucket; @@ -1402,7 +1368,7 @@ static long __htab_lru_percpu_map_update_elem(struct bpf_map *map, void *key, } ret = 0; err: - htab_unlock_bucket(htab, b, hash, flags); + htab_unlock_bucket(b, flags); err_lock_bucket: if (l_new) { bpf_map_dec_elem_count(&htab->map); @@ -1444,7 +1410,7 @@ static long htab_map_delete_elem(struct bpf_map *map, void *key) b = __select_bucket(htab, hash); head = &b->head; - ret = htab_lock_bucket(htab, b, hash, &flags); + ret = htab_lock_bucket(b, &flags); if (ret) return ret; @@ -1454,7 +1420,7 @@ static long htab_map_delete_elem(struct bpf_map *map, void *key) else ret = -ENOENT; - htab_unlock_bucket(htab, b, hash, flags); + htab_unlock_bucket(b, flags); if (l) free_htab_elem(htab, l); @@ -1480,7 +1446,7 @@ static long htab_lru_map_delete_elem(struct bpf_map *map, 
void *key) b = __select_bucket(htab, hash); head = &b->head; - ret = htab_lock_bucket(htab, b, hash, &flags); + ret = htab_lock_bucket(b, &flags); if (ret) return ret; @@ -1491,7 +1457,7 @@ static long htab_lru_map_delete_elem(struct bpf_map *map, void *key) else ret = -ENOENT; - htab_unlock_bucket(htab, b, hash, flags); + htab_unlock_bucket(b, flags); if (l) htab_lru_push_free(htab, l); return ret; @@ -1558,7 +1524,6 @@ static void htab_map_free_timers_and_wq(struct bpf_map *map) static void htab_map_free(struct bpf_map *map) { struct bpf_htab *htab = container_of(map, struct bpf_htab, map); - int i; /* bpf_free_used_maps() or close(map_fd) will trigger this map_free callback. * bpf_free_used_maps() is called after bpf prog is no longer executing. @@ -1583,9 +1548,6 @@ static void htab_map_free(struct bpf_map *map) bpf_mem_alloc_destroy(&htab->ma); if (htab->use_percpu_counter) percpu_counter_destroy(&htab->pcount); - for (i = 0; i < HASHTAB_MAP_LOCK_COUNT; i++) - free_percpu(htab->map_locked[i]); - lockdep_unregister_key(&htab->lockdep_key); bpf_map_area_free(htab); } @@ -1628,7 +1590,7 @@ static int __htab_map_lookup_and_delete_elem(struct bpf_map *map, void *key, b = __select_bucket(htab, hash); head = &b->head; - ret = htab_lock_bucket(htab, b, hash, &bflags); + ret = htab_lock_bucket(b, &bflags); if (ret) return ret; @@ -1665,7 +1627,7 @@ static int __htab_map_lookup_and_delete_elem(struct bpf_map *map, void *key, hlist_nulls_del_rcu(&l->hash_node); out_unlock: - htab_unlock_bucket(htab, b, hash, bflags); + htab_unlock_bucket(b, bflags); if (l) { if (is_lru_map) @@ -1787,7 +1749,7 @@ __htab_map_lookup_and_delete_batch(struct bpf_map *map, head = &b->head; /* do not grab the lock unless need it (bucket_cnt > 0). */ if (locked) { - ret = htab_lock_bucket(htab, b, batch, &flags); + ret = htab_lock_bucket(b, &flags); if (ret) { rcu_read_unlock(); bpf_enable_instrumentation(); @@ -1810,7 +1772,7 @@ __htab_map_lookup_and_delete_batch(struct bpf_map *map, /* Note that since bucket_cnt > 0 here, it is implicit * that the locked was grabbed, so release it. */ - htab_unlock_bucket(htab, b, batch, flags); + htab_unlock_bucket(b, flags); rcu_read_unlock(); bpf_enable_instrumentation(); goto after_loop; @@ -1821,7 +1783,7 @@ __htab_map_lookup_and_delete_batch(struct bpf_map *map, /* Note that since bucket_cnt > 0 here, it is implicit * that the locked was grabbed, so release it. 
*/ - htab_unlock_bucket(htab, b, batch, flags); + htab_unlock_bucket(b, flags); rcu_read_unlock(); bpf_enable_instrumentation(); kvfree(keys); @@ -1884,7 +1846,7 @@ __htab_map_lookup_and_delete_batch(struct bpf_map *map, dst_val += value_size; } - htab_unlock_bucket(htab, b, batch, flags); + htab_unlock_bucket(b, flags); locked = false; while (node_to_free) { From patchwork Thu Feb 6 10:54:28 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Kumar Kartikeya Dwivedi X-Patchwork-Id: 13962826 X-Patchwork-Delegate: bpf@iogearbox.net Received: from mail-wr1-f66.google.com (mail-wr1-f66.google.com [209.85.221.66]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id BBDA7231A3C; Thu, 6 Feb 2025 10:55:06 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.221.66 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1738839308; cv=none; b=OyKZEefAvECh27EI7NCjq+ftxqO+JP/Ab/yeLUwShflucokCfGPXps7WoRY429Mdax58pptouV32uQ8yYXpZNHVfil3ISZeWTz3ydIuZA1jplzgcE6cfv81gsQCUGSLdJyD5XAWOj11nPrCmcPzqd/YGucv4UFlDFNFCRCCuYxA= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1738839308; c=relaxed/simple; bh=b7GJYiFRKmgt+Gua6MyRj5cuN5rmsZksWMg2VkyS3aw=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=BylMcSDHmlxxoR7YTJfoFeYliayELxdo02kyNngfDHpwyNnwYGoMNFf+IdybzWi55pB/FYlnwUB5ub/Bcz41PZrj5+iYYKLCqyD1pPs7TKfka13dvn+cqOI1QzWH8Mr7jmYMCzSHA1Yh+6HVc9i4O3u033bgWjitHyofrrSqv5s= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=XvztQax2; arc=none smtp.client-ip=209.85.221.66 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="XvztQax2" Received: by mail-wr1-f66.google.com with SMTP id ffacd0b85a97d-38db34a5c5fso325938f8f.2; Thu, 06 Feb 2025 02:55:06 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1738839305; x=1739444105; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=/+/W0EApda7zGr5aRBIxeSAAJdBHD2PeVsO8K+dox+k=; b=XvztQax2a0JYqPd6jFXQ6ErTuHnvFDB2Wo328WE/a0oeEePjDKW4az35LMENBbvvpX co0O+ikp6X3I/IizCeEsjZCUWo8FcgDkiyRrG6KzYbGHYT5Y0/hWVhnJ7VRy1KiKDspq hywlpm6CDz02UKIyB9+hOyB53I2v8PYfEQVBXS0wYnpvJOiWmRz3O9i6t3vuBtf9RwZI f2tC5LBKIF17r30rkjT34w3ZjVdpqLEMAXQDkDBA9Nujr5I7JUYSRcdBAd+CTL4Cbdtb bz8B1MNZz3GoPVgzLYXVykRvC2Hd9/HywI3f3fN1ql6aIPlWeBawQCbhoBcF+vGgfPw0 oHjA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1738839305; x=1739444105; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=/+/W0EApda7zGr5aRBIxeSAAJdBHD2PeVsO8K+dox+k=; b=LW/gMGAkypKZC59clkGS07iGMMgRWtPdFvtjjdiQutCK6jJ65O1WpYwFvDzehDdzqg AlLJKSIPunkqZL88fFW8OdAAq+TSzzyksUxXykUfIRjFfRmAfdmrEaU056lp3N9xu4n3 
JOeO8Ku845uCa/PBbWS5UX9B3uNQpNY+slikl4XPQe3m5dvRXGJQTrTUwcUX+k4i8u+T DL9b5l4mSXvSkCIu8t4OwmdFY0ozQBF41NNEU1t8xUtaQME/QKGrOURzDAHV+HkzvBvN PFSF0BLjwMoil5Dl3YbKVfgOgBfffJKp6wRovunJ3DD7wwoJ1Ypht5RW+mL69x5eM1FS IRTw== X-Forwarded-Encrypted: i=1; AJvYcCVHP8xHPhUbFtituZkqfxoSS4nNwPRUdur1yqGeO6VfxgFTpcwRp2H6uLtTkufXNhfsw17NMoh+Z7peClk=@vger.kernel.org X-Gm-Message-State: AOJu0YzDaVGMJvLBfQP5wbQOtEbiM8gjL186xxpR0QttXkCI/tP+w9nl 05dJ5xF+YTKN5/20TF8pETPsWosA8XX9rYtPKEVK1+Ah568h7luoZ7Z7r4xZE8M= X-Gm-Gg: ASbGnctkD1ANRixv7LF1YQEosOibA4MNymnODlyscR84jLkervB4lM42a8FNU84462D 9bkiLOhUOljZzPbi4ERxkiM3G3o6eDnwlrm0ll5FbTFPiRiDAym0g7adKrIa5Kb1HDjARm8mgav +3EQAacLMnLDgUINxEDk9z3uO+O+4alNc8Tolg1zbgkTa8tbUtNeJWi3AgmCClF76oHUFy6AFNS 4MrkcDMugjeoQfJawc64H7IGh9PdzVH/Bu8KqALEU0iDzTKI1D+qQts4fNeoCeY/NhMCThuuxQE d8OY X-Google-Smtp-Source: AGHT+IE1fh4Z/RX0JKmKGEVYdBOLzgVhEKs2AnV1hSi1PhRaogcIsY0ipckf2p7b6ebAKkLpzIqwFg== X-Received: by 2002:a5d:6da3:0:b0:38c:5b52:3a5e with SMTP id ffacd0b85a97d-38db48577fdmr4311094f8f.8.1738839304658; Thu, 06 Feb 2025 02:55:04 -0800 (PST) Received: from localhost ([2a03:2880:31ff:2::]) by smtp.gmail.com with ESMTPSA id ffacd0b85a97d-38dbde0fc25sm1415577f8f.64.2025.02.06.02.55.03 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 06 Feb 2025 02:55:04 -0800 (PST) From: Kumar Kartikeya Dwivedi To: bpf@vger.kernel.org, linux-kernel@vger.kernel.org Cc: Linus Torvalds , Peter Zijlstra , Will Deacon , Waiman Long , Alexei Starovoitov , Andrii Nakryiko , Daniel Borkmann , Martin KaFai Lau , Eduard Zingerman , "Paul E. McKenney" , Tejun Heo , Barret Rhoden , Josh Don , Dohyun Kim , linux-arm-kernel@lists.infradead.org, kernel-team@meta.com Subject: [PATCH bpf-next v2 20/26] bpf: Convert percpu_freelist.c to rqspinlock Date: Thu, 6 Feb 2025 02:54:28 -0800 Message-ID: <20250206105435.2159977-21-memxor@gmail.com> X-Mailer: git-send-email 2.43.5 In-Reply-To: <20250206105435.2159977-1-memxor@gmail.com> References: <20250206105435.2159977-1-memxor@gmail.com> Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Developer-Signature: v=1; a=openpgp-sha256; l=6512; h=from:subject; bh=b7GJYiFRKmgt+Gua6MyRj5cuN5rmsZksWMg2VkyS3aw=; b=owEBbQKS/ZANAwAIAUzgyIZIvxHKAcsmYgBnpJRm4A+EBJCWxvZBf7f2Rwf9fNAzPFNCmT/crlsa NhTJVJuJAjMEAAEIAB0WIQRLvip+Buz51YI8YRFM4MiGSL8RygUCZ6SUZgAKCRBM4MiGSL8RysBTD/ 9cLKt+PR4fSl5mi6p7a7E2k65BK0pYhLW8IoR7EvD1rs1MUM8TYi9Wd0P3u5ARJSKgUZRovio/UJsb MCLHj+03gAh32u7M8XtrbyRGGjWp81sskv3umm0S5W6qW7GNEzdDfhDCVgZGxTaPwghKEcP6GNkC5D 3LWP4b9pp2XSrw5PDT7EN54Ds1FfjGWg6awZXbJcWgVmmS4522IVKIgAgnotntrcI70ccUJoxtdyUD ADWxNVhu3snrVyFYlCSn80qYS6o0ZBYVjqh5K1pU/GnUahNHcT7iZbSHN3HH7/pZhLFbvphUDLtImg bi4WsQVeTVKTqXtm6o/FeA/7+P+pIhRSeOynMeOZT9EqTagpNVaaptRj5HZMuXJWY5qUIHnY+c8P0v o1VHWya9TBJyOJM37cFthbRx9BxyN7uDd3fsaqDwGs+p+NFzUAh5Yyx3n6dkBHQI+zFF2mryGLzXTR h/vu/u/DQSfC2693zDq/2G2Nq8GCT4nwzP2057XK4Fhewd7/9rFPbG2Azm6f192ZUVX2yNhrwmZcFt jjpQ4CaSn5LXmu4ocoOo3rDF0aGwvP5Sb/Jn9vS78QfBYpWYEJxBrVtRMyeKfPjo4jEinqb40wB4vk h5S6Uj1SdTQN/i3IBzw259gce+jCSWN1SQaz6Dgy7OdNHF2kxY9Ea5mBCF8A== X-Developer-Key: i=memxor@gmail.com; a=openpgp; fpr=4BBE2A7E06ECF9D5823C61114CE0C88648BF11CA X-Patchwork-Delegate: bpf@iogearbox.net Convert the percpu_freelist.c code to use rqspinlock, and remove the extralist fallback and trylock-based acquisitions to avoid deadlocks. Key thing to note is the retained while (true) loop to search through other CPUs when failing to push a node due to locking errors. 
This retains the behavior of the old code, where it would keep trying until it would be able to successfully push the node back into the freelist of a CPU. Technically, we should start iteration for this loop from raw_smp_processor_id() + 1, but to avoid hitting the edge of nr_cpus, we skip execution in the loop body instead. Signed-off-by: Kumar Kartikeya Dwivedi --- kernel/bpf/percpu_freelist.c | 113 ++++++++--------------------------- kernel/bpf/percpu_freelist.h | 4 +- 2 files changed, 27 insertions(+), 90 deletions(-) diff --git a/kernel/bpf/percpu_freelist.c b/kernel/bpf/percpu_freelist.c index 034cf87b54e9..632762b57299 100644 --- a/kernel/bpf/percpu_freelist.c +++ b/kernel/bpf/percpu_freelist.c @@ -14,11 +14,9 @@ int pcpu_freelist_init(struct pcpu_freelist *s) for_each_possible_cpu(cpu) { struct pcpu_freelist_head *head = per_cpu_ptr(s->freelist, cpu); - raw_spin_lock_init(&head->lock); + raw_res_spin_lock_init(&head->lock); head->first = NULL; } - raw_spin_lock_init(&s->extralist.lock); - s->extralist.first = NULL; return 0; } @@ -34,58 +32,39 @@ static inline void pcpu_freelist_push_node(struct pcpu_freelist_head *head, WRITE_ONCE(head->first, node); } -static inline void ___pcpu_freelist_push(struct pcpu_freelist_head *head, +static inline bool ___pcpu_freelist_push(struct pcpu_freelist_head *head, struct pcpu_freelist_node *node) { - raw_spin_lock(&head->lock); - pcpu_freelist_push_node(head, node); - raw_spin_unlock(&head->lock); -} - -static inline bool pcpu_freelist_try_push_extra(struct pcpu_freelist *s, - struct pcpu_freelist_node *node) -{ - if (!raw_spin_trylock(&s->extralist.lock)) + if (raw_res_spin_lock(&head->lock)) return false; - - pcpu_freelist_push_node(&s->extralist, node); - raw_spin_unlock(&s->extralist.lock); + pcpu_freelist_push_node(head, node); + raw_res_spin_unlock(&head->lock); return true; } -static inline void ___pcpu_freelist_push_nmi(struct pcpu_freelist *s, - struct pcpu_freelist_node *node) +void __pcpu_freelist_push(struct pcpu_freelist *s, + struct pcpu_freelist_node *node) { - int cpu, orig_cpu; + struct pcpu_freelist_head *head; + int cpu; - orig_cpu = raw_smp_processor_id(); - while (1) { - for_each_cpu_wrap(cpu, cpu_possible_mask, orig_cpu) { - struct pcpu_freelist_head *head; + if (___pcpu_freelist_push(this_cpu_ptr(s->freelist), node)) + return; + while (true) { + for_each_cpu_wrap(cpu, cpu_possible_mask, raw_smp_processor_id()) { + if (cpu == raw_smp_processor_id()) + continue; head = per_cpu_ptr(s->freelist, cpu); - if (raw_spin_trylock(&head->lock)) { - pcpu_freelist_push_node(head, node); - raw_spin_unlock(&head->lock); - return; - } - } - - /* cannot lock any per cpu lock, try extralist */ - if (pcpu_freelist_try_push_extra(s, node)) + if (raw_res_spin_lock(&head->lock)) + continue; + pcpu_freelist_push_node(head, node); + raw_res_spin_unlock(&head->lock); return; + } } } -void __pcpu_freelist_push(struct pcpu_freelist *s, - struct pcpu_freelist_node *node) -{ - if (in_nmi()) - ___pcpu_freelist_push_nmi(s, node); - else - ___pcpu_freelist_push(this_cpu_ptr(s->freelist), node); -} - void pcpu_freelist_push(struct pcpu_freelist *s, struct pcpu_freelist_node *node) { @@ -120,71 +99,29 @@ void pcpu_freelist_populate(struct pcpu_freelist *s, void *buf, u32 elem_size, static struct pcpu_freelist_node *___pcpu_freelist_pop(struct pcpu_freelist *s) { + struct pcpu_freelist_node *node = NULL; struct pcpu_freelist_head *head; - struct pcpu_freelist_node *node; int cpu; for_each_cpu_wrap(cpu, cpu_possible_mask, raw_smp_processor_id()) { head = 
per_cpu_ptr(s->freelist, cpu); if (!READ_ONCE(head->first)) continue; - raw_spin_lock(&head->lock); + if (raw_res_spin_lock(&head->lock)) + continue; node = head->first; if (node) { WRITE_ONCE(head->first, node->next); - raw_spin_unlock(&head->lock); + raw_res_spin_unlock(&head->lock); return node; } - raw_spin_unlock(&head->lock); + raw_res_spin_unlock(&head->lock); } - - /* per cpu lists are all empty, try extralist */ - if (!READ_ONCE(s->extralist.first)) - return NULL; - raw_spin_lock(&s->extralist.lock); - node = s->extralist.first; - if (node) - WRITE_ONCE(s->extralist.first, node->next); - raw_spin_unlock(&s->extralist.lock); - return node; -} - -static struct pcpu_freelist_node * -___pcpu_freelist_pop_nmi(struct pcpu_freelist *s) -{ - struct pcpu_freelist_head *head; - struct pcpu_freelist_node *node; - int cpu; - - for_each_cpu_wrap(cpu, cpu_possible_mask, raw_smp_processor_id()) { - head = per_cpu_ptr(s->freelist, cpu); - if (!READ_ONCE(head->first)) - continue; - if (raw_spin_trylock(&head->lock)) { - node = head->first; - if (node) { - WRITE_ONCE(head->first, node->next); - raw_spin_unlock(&head->lock); - return node; - } - raw_spin_unlock(&head->lock); - } - } - - /* cannot pop from per cpu lists, try extralist */ - if (!READ_ONCE(s->extralist.first) || !raw_spin_trylock(&s->extralist.lock)) - return NULL; - node = s->extralist.first; - if (node) - WRITE_ONCE(s->extralist.first, node->next); - raw_spin_unlock(&s->extralist.lock); return node; } struct pcpu_freelist_node *__pcpu_freelist_pop(struct pcpu_freelist *s) { - if (in_nmi()) - return ___pcpu_freelist_pop_nmi(s); return ___pcpu_freelist_pop(s); } diff --git a/kernel/bpf/percpu_freelist.h b/kernel/bpf/percpu_freelist.h index 3c76553cfe57..914798b74967 100644 --- a/kernel/bpf/percpu_freelist.h +++ b/kernel/bpf/percpu_freelist.h @@ -5,15 +5,15 @@ #define __PERCPU_FREELIST_H__ #include #include +#include struct pcpu_freelist_head { struct pcpu_freelist_node *first; - raw_spinlock_t lock; + rqspinlock_t lock; }; struct pcpu_freelist { struct pcpu_freelist_head __percpu *freelist; - struct pcpu_freelist_head extralist; }; struct pcpu_freelist_node { From patchwork Thu Feb 6 10:54:29 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Kumar Kartikeya Dwivedi X-Patchwork-Id: 13962827 X-Patchwork-Delegate: bpf@iogearbox.net Received: from mail-wr1-f67.google.com (mail-wr1-f67.google.com [209.85.221.67]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 02D59231A54; Thu, 6 Feb 2025 10:55:07 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.221.67 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1738839309; cv=none; b=BJ2YSCkjTYimbZM82Kw6kQx32KwAwggMFTqbRIpyoLe2QsemGZtLdaF6oDl5UlIYIIimHbKNhFbAKZRjUFpp8I/4mvc4dvQ/C9bv2QDI+U4BEAspKRhSPJLP8sMDQyxZnPaeJainzxcPVTOCcIlXB3POxQseF7EPg8SArpUckVs= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1738839309; c=relaxed/simple; bh=7psEOc+yd4qPNsI4KRxI79thdXuYqSsZXOHrdwDSR0g=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=BWYlkXFN8nLAJdeWGMoybQX38Sz+AM1r5H0pzHPgyph6xZncZGtuoXKJcviwMZ62SxbpBlDBUd2TkvhbID9DNZoKU8isLrArHcHbzYtXZ4F5sriBB8RiY9hqyOlyNFjQ9iIqoS6P1C9NAw92SHERHzukOcVhSIedwZLSkLpPNXM= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) 
header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=P3TygDOt; arc=none smtp.client-ip=209.85.221.67 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="P3TygDOt" Received: by mail-wr1-f67.google.com with SMTP id ffacd0b85a97d-38db0c06e96so491207f8f.2; Thu, 06 Feb 2025 02:55:07 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1738839306; x=1739444106; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=eNqgh98W4funKculi1nvK5lPwgfxfi44KHpp2hUPVk4=; b=P3TygDOtWfUL8j5tsxvtl7s58UFN6Pjol6oGkHwCYFdx0S357gmVQCsm0P2T4pyxrD 5R5wa6PueMNQ/gTZWqwp1P/gXF4Fy9yqzDcshWLhLJyjB5Qj2D2tiJQGu2TOURwOcmIP FaKEtIIq7NYVl5MLlLnnzRskR9Ov/0qjEWHmJ8HTjUW6yArP5s05hNlwqqC/kMCzVB+Z o3raM75FjZeCt8E9YdxM/oB3OjHrqM0sKLDtsn2ErhzrxyV8DLRK8PhESP6rM5Tkz3Dy 0RgwtMm/O8lOomwGTw9aP2TSLVRspPZmsB+gde95ZfumbxVQs+ZeyYZ6sNP+Rgtf//m2 YaOA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1738839306; x=1739444106; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=eNqgh98W4funKculi1nvK5lPwgfxfi44KHpp2hUPVk4=; b=IZ0MSTiDfITcqBaQqSXn0srgVCxwcpGHFy7Q0ja0nzkcJ1efX4zJgGNehUVTRo9bKY FKujO4+5Me/yUKpB5ynfSjWBYLNMuPhwRObWhZosE1aPf2I0nOkyKWMDVb+sI2HAzDHg G9CFpHp3X4h6DXw6zaPTP/gPTSRiJ3WV75LfuvimdzcPEr67hFwn24A0+RO4SbCR4DU7 Ruvn7EEjV2KeT3ZcAAsn4rfBybGb5tsFHtzQ7EzQhy3sAfPrgtLIIjOeXBcTnSNkC/7i 5hzIdeIWg4jCl0p/fNCZdTYx5eC4Lgco6CutAAkm47TWrzZIRjpJm4g8Z69GmWcgpkTB SnZQ== X-Forwarded-Encrypted: i=1; AJvYcCWvVuEw3qLWR9FMHLEwXDECrVHIBcsn3JaL4LI4PJifRD6KR9WvgTenJixPttXc7FGt+Kez0DwescRrsSo=@vger.kernel.org X-Gm-Message-State: AOJu0Yw5nH0MJAkQ+lH6efWgyX3pJdztXY5Mi6ggm7X0McRLsSXS2+ou k9rDuebjz03sSGAROCnV9XRWP4yJChhlS541GtJ2dp9GzXFvz3FRWExeMNjjBK0= X-Gm-Gg: ASbGncuwylITTc23oQ5zcxR/FYLu2K89kraO9U4hHXCj5TDfD0HKsz2pJaOx3JkO8Fu 8gZfFb6Oq6x1GnRfaeeSPYXJdYXpwJ7WIaoXncenvLFVH3PJJ+5aDbE/g25UjDJaNXtDLxBnK16 GXn2s+S5Q3/SSAqC+MgQIyWjH1oYvHXq43nnGzb5VtoiFjZPZ3Rkzdsx4cTmnY43NtQ4tbiJI6l VuD+EFUov5GaDT1C4jwMmsNws7xbNnDLvDvn3Nd3omjVQhzXDPlt92QtVzEaVmwqsM+IcZ6jgOY Zbv20w== X-Google-Smtp-Source: AGHT+IGl6SmZaMuRp7oUK50tAaXGFTAXDbbBuYWF+LV4jkLjehRBR+yIALlWuxWr7fqUlT9fVpNuwQ== X-Received: by 2002:a05:6000:154a:b0:38d:b125:3783 with SMTP id ffacd0b85a97d-38db4869738mr5279252f8f.18.1738839305846; Thu, 06 Feb 2025 02:55:05 -0800 (PST) Received: from localhost ([2a03:2880:31ff:1e::]) by smtp.gmail.com with ESMTPSA id 5b1f17b1804b1-4390d96530fsm50029725e9.19.2025.02.06.02.55.05 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 06 Feb 2025 02:55:05 -0800 (PST) From: Kumar Kartikeya Dwivedi To: bpf@vger.kernel.org, linux-kernel@vger.kernel.org Cc: Linus Torvalds , Peter Zijlstra , Will Deacon , Waiman Long , Alexei Starovoitov , Andrii Nakryiko , Daniel Borkmann , Martin KaFai Lau , Eduard Zingerman , "Paul E. 
McKenney" , Tejun Heo , Barret Rhoden , Josh Don , Dohyun Kim , linux-arm-kernel@lists.infradead.org, kernel-team@meta.com Subject: [PATCH bpf-next v2 21/26] bpf: Convert lpm_trie.c to rqspinlock Date: Thu, 6 Feb 2025 02:54:29 -0800 Message-ID: <20250206105435.2159977-22-memxor@gmail.com> X-Mailer: git-send-email 2.43.5 In-Reply-To: <20250206105435.2159977-1-memxor@gmail.com> References: <20250206105435.2159977-1-memxor@gmail.com> Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Developer-Signature: v=1; a=openpgp-sha256; l=3797; h=from:subject; bh=7psEOc+yd4qPNsI4KRxI79thdXuYqSsZXOHrdwDSR0g=; b=owEBbQKS/ZANAwAIAUzgyIZIvxHKAcsmYgBnpJRnw6DiRxO330wCUMflYYf1S6rgXtusJs7sBci+ PMosJk6JAjMEAAEIAB0WIQRLvip+Buz51YI8YRFM4MiGSL8RygUCZ6SUZwAKCRBM4MiGSL8Rypr6D/ 0XU5AM+X7xrBbdXvcrdW6JR+XH2p7xDA8af4vJJme6W9rY9nqzM+aeJbS2fsqkaBOjvGhsrm6KSEvU h5nXKuyn5/iiNQyeA2eBNyNmynnhzv6V3d0A7NJwIL2gaLY06/KM6un/G5Zg7met6nyIaESxte16Zt Tj5PkEGuw2YWZIcBVnQHV1DCr9Cc2YERSgY31ol47Dox5OeIfV2Y+ur+/UmtHmdvZ5+g/plLlYxItA 3Mzw1TkFtwsYfMPR4b35Lo1IhYbORKAi5irpZ7F2x9vPeQ2b3wSFo+UjysbmMLgCp4k+cpHnPU+p16 ETzQsMxtY4lYN51QY3p+/yat8sRLixm842KUr+YDb385pgKtUe5qoiH+F47vXvN87HX4t1ptQ2/H7R gu1PEJN4dsT8I3PQzCGdlQoj17l0rZMaNn3acsxv41JTaPrIZgwx7fpOoB8x903xvDrzFJXqGVrwsG ibywFOJqfclF2qvVUO3ygHfXLFXMLus0EHad+VZzDPc0+wy53OQPcURCkt5UiaTENbTuGscfNR9GHT PtQksRTWgfDQgzrDx4EEkFvZ8uKGPHwtH3SAxFXLdTMysqVjm14CT2if0p9yCYswzH/yX3dA7ImI7q xLGCvaXy82YcKX7auIkK8zCMUWSP5gJB30nAtkG1jSBLXHv9M0FKHlbziHLA== X-Developer-Key: i=memxor@gmail.com; a=openpgp; fpr=4BBE2A7E06ECF9D5823C61114CE0C88648BF11CA X-Patchwork-Delegate: bpf@iogearbox.net Convert all LPM trie usage of raw_spinlock to rqspinlock. Note that rcu_dereference_protected in trie_delete_elem is switched over to plain rcu_dereference, the RCU read lock should be held from BPF program side or eBPF syscall path, and the trie->lock is just acquired before the dereference. It is not clear the reason the protected variant was used from the commit history, but the above reasoning makes sense so switch over. 
Signed-off-by: Kumar Kartikeya Dwivedi --- kernel/bpf/lpm_trie.c | 25 ++++++++++++++----------- 1 file changed, 14 insertions(+), 11 deletions(-) diff --git a/kernel/bpf/lpm_trie.c b/kernel/bpf/lpm_trie.c index e8a772e64324..be66d7e520e0 100644 --- a/kernel/bpf/lpm_trie.c +++ b/kernel/bpf/lpm_trie.c @@ -15,6 +15,7 @@ #include #include #include +#include #include /* Intermediate node */ @@ -36,7 +37,7 @@ struct lpm_trie { size_t n_entries; size_t max_prefixlen; size_t data_size; - raw_spinlock_t lock; + rqspinlock_t lock; }; /* This trie implements a longest prefix match algorithm that can be used to @@ -342,7 +343,9 @@ static long trie_update_elem(struct bpf_map *map, if (!new_node) return -ENOMEM; - raw_spin_lock_irqsave(&trie->lock, irq_flags); + ret = raw_res_spin_lock_irqsave(&trie->lock, irq_flags); + if (ret) + goto out_free; new_node->prefixlen = key->prefixlen; RCU_INIT_POINTER(new_node->child[0], NULL); @@ -356,8 +359,7 @@ static long trie_update_elem(struct bpf_map *map, */ slot = &trie->root; - while ((node = rcu_dereference_protected(*slot, - lockdep_is_held(&trie->lock)))) { + while ((node = rcu_dereference(*slot))) { matchlen = longest_prefix_match(trie, node, key); if (node->prefixlen != matchlen || @@ -442,8 +444,8 @@ static long trie_update_elem(struct bpf_map *map, rcu_assign_pointer(*slot, im_node); out: - raw_spin_unlock_irqrestore(&trie->lock, irq_flags); - + raw_res_spin_unlock_irqrestore(&trie->lock, irq_flags); +out_free: if (ret) bpf_mem_cache_free(&trie->ma, new_node); bpf_mem_cache_free_rcu(&trie->ma, free_node); @@ -467,7 +469,9 @@ static long trie_delete_elem(struct bpf_map *map, void *_key) if (key->prefixlen > trie->max_prefixlen) return -EINVAL; - raw_spin_lock_irqsave(&trie->lock, irq_flags); + ret = raw_res_spin_lock_irqsave(&trie->lock, irq_flags); + if (ret) + return ret; /* Walk the tree looking for an exact key/length match and keeping * track of the path we traverse. 
We will need to know the node @@ -478,8 +482,7 @@ static long trie_delete_elem(struct bpf_map *map, void *_key) trim = &trie->root; trim2 = trim; parent = NULL; - while ((node = rcu_dereference_protected( - *trim, lockdep_is_held(&trie->lock)))) { + while ((node = rcu_dereference(*trim))) { matchlen = longest_prefix_match(trie, node, key); if (node->prefixlen != matchlen || @@ -543,7 +546,7 @@ static long trie_delete_elem(struct bpf_map *map, void *_key) free_node = node; out: - raw_spin_unlock_irqrestore(&trie->lock, irq_flags); + raw_res_spin_unlock_irqrestore(&trie->lock, irq_flags); bpf_mem_cache_free_rcu(&trie->ma, free_parent); bpf_mem_cache_free_rcu(&trie->ma, free_node); @@ -592,7 +595,7 @@ static struct bpf_map *trie_alloc(union bpf_attr *attr) offsetof(struct bpf_lpm_trie_key_u8, data); trie->max_prefixlen = trie->data_size * 8; - raw_spin_lock_init(&trie->lock); + raw_res_spin_lock_init(&trie->lock); /* Allocate intermediate and leaf nodes from the same allocator */ leaf_size = sizeof(struct lpm_trie_node) + trie->data_size + From patchwork Thu Feb 6 10:54:30 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Kumar Kartikeya Dwivedi X-Patchwork-Id: 13962828 X-Patchwork-Delegate: bpf@iogearbox.net Received: from mail-wr1-f66.google.com (mail-wr1-f66.google.com [209.85.221.66]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 4B645231CA4; Thu, 6 Feb 2025 10:55:09 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.221.66 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1738839311; cv=none; b=YvpHI5v6klqUqvcOb4D8TKLm+9Y/5LeC0dW3EJc8GjcGgV/iCObwXogBorvTa5i3mZq7Yilw9df9uWeZJ/TkD/NnpBn3C5BT6ZzH2Ts1sYxg+HIlg/k5a11+77+nAZ3VOypqV7Bh4N7l8zGVakyFSiUgRc1x0R091jX/f3PaPcw= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1738839311; c=relaxed/simple; bh=U/xq8I6EBeEZJrIOy3nlwnF6weOW6zwHeOpg6R+a+Dg=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=VeD/3r6MhnLE1QpOnR4vZXRCYEItul5H22zy5FW2lvYhmmG/Kdk0Jmy3SsC3oNZqncWLbUEYPlCruSYIrGxAcOQjextzoztJpy0hYtXFr80saUvTJktVx7k0hJmsXnRCWY8OL+yqypnLHPHdv38f15jFaerjynpItgwdBR6JPU8= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=iIFuvR+D; arc=none smtp.client-ip=209.85.221.66 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="iIFuvR+D" Received: by mail-wr1-f66.google.com with SMTP id ffacd0b85a97d-38daf156e97so381910f8f.0; Thu, 06 Feb 2025 02:55:09 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1738839307; x=1739444107; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=WY5xY00eKUhPVaK5CMP9UKPxhBmvTRnKdfB0Ehrzj4c=; b=iIFuvR+DEP18HjqK9dm7BqCGZxLhHCCmi3eZaAvdDnjQ3Mx1vzBOyv47zjXrtIMtBK a116vPC1kzm73eO0Z/65rCC420wDEJ4WqDurT/pCAycacoBs75Lmr4pyWrbuN9GLhPl+ 
nJdUUG7lqIsuRnk4MnkbBWkVxm73WSGdyc0Lnnn5I3OF1l7JrMO+fJeeHjx45XlRPWud Ztys+6itWc8RnKqhdHqeG5NuHZWqqJp3SH8H/7rJRx1BNL5F2plwhmsN/Rh7rRMa1GWu S70YTAxOZPnnovhnK9JxL3rJzSxvUkuSpRFJfZ1mdH6WpeeCTYJTnVdtzXJ6/T7FPeeH AIlw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1738839307; x=1739444107; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=WY5xY00eKUhPVaK5CMP9UKPxhBmvTRnKdfB0Ehrzj4c=; b=DvD/4+pHalowZivgcnJqeLAL+ioyqLR48V4folV59HDMN02i9WfD5Ii+zfV6V6Db1l V+EVqK+tx8Qi+wsyDVWYrzkyW/OA8lA1D1IT+8QODYK54kEfp5sOGTDODXMw05qX7nQA qgZc3/utC+WCfPsSLyq1poqNbYJKSvJ9/tUw5mu/mOELO454WFLG0D6uxtA2J6TbWmZX IS437l3nhX23v6xUyJ2BfWXYoVjhRlZXhAu42E1lJDARIMAIs0lwyX0dX2Y7Ry9bgKXO 2iBOrKm1qlgbi+pTSG6M3kl+clEYcZc1Eerv/8gesAxq+cKgQpnFVUTl3IS9bLBhk865 f4NQ== X-Forwarded-Encrypted: i=1; AJvYcCWJM5A818OAv5i9Pgp5vh7ZpGSG8knkQywXoTNpHE6p4ENdaiB1Ky50DujaqZDfEtzpr5Xgn8HKQ2gcHKA=@vger.kernel.org X-Gm-Message-State: AOJu0YxC8fEoR2FJhseytPFGJ41HQxjNMk3C6BArs/8KdWLRFdcNmX3l jmhhnlnobX+fpbzwJ8XEWyb3TyjyxhVNbP4uhnC+Hstw94g7npEpGcnTx9filR0= X-Gm-Gg: ASbGnctWwVrB4fvOviLNFJVUgX4VqWn6qG6tyuOayNo951DWc5ldmSvgjWkxOJ2jyUh gQOuKIeFkWFO1Gqggl6OlfFZ8/YsBCOtPvsRiOC7PIRnsojSU7jA6s69SZkcuEXZOfhAWbRYsxA oC03Zm4ftVV6yeKqlmmcuy07S3CO/HoysYi2hiRtMn/dco9J1o+ud6w/5sorg3xD/EMDMVbx5TY GwWiVCQ8bZ3tCbz38P9rrvjbuCcxUj+ZOV0AHXgjzJHuqBW4YjkmE75o3/0LA3VeN5kX6EGXvYZ 6nTJ X-Google-Smtp-Source: AGHT+IG9zXGoXyJKKHaq7JsfFOdulH8qStWXMMx7WVjIkbMNF6mgU4TITTZXCi7em7sIZ7mcLjLv5Q== X-Received: by 2002:a05:6000:1567:b0:38a:5ce8:df51 with SMTP id ffacd0b85a97d-38db4857bb6mr4346951f8f.2.1738839307068; Thu, 06 Feb 2025 02:55:07 -0800 (PST) Received: from localhost ([2a03:2880:31ff:1::]) by smtp.gmail.com with ESMTPSA id ffacd0b85a97d-38dbde1ddaesm1381571f8f.85.2025.02.06.02.55.06 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 06 Feb 2025 02:55:06 -0800 (PST) From: Kumar Kartikeya Dwivedi To: bpf@vger.kernel.org, linux-kernel@vger.kernel.org Cc: Linus Torvalds , Peter Zijlstra , Will Deacon , Waiman Long , Alexei Starovoitov , Andrii Nakryiko , Daniel Borkmann , Martin KaFai Lau , Eduard Zingerman , "Paul E. 
McKenney" , Tejun Heo , Barret Rhoden , Josh Don , Dohyun Kim , linux-arm-kernel@lists.infradead.org, kernel-team@meta.com Subject: [PATCH bpf-next v2 22/26] bpf: Introduce rqspinlock kfuncs Date: Thu, 6 Feb 2025 02:54:30 -0800 Message-ID: <20250206105435.2159977-23-memxor@gmail.com> X-Mailer: git-send-email 2.43.5 In-Reply-To: <20250206105435.2159977-1-memxor@gmail.com> References: <20250206105435.2159977-1-memxor@gmail.com> Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Developer-Signature: v=1; a=openpgp-sha256; l=5071; h=from:subject; bh=U/xq8I6EBeEZJrIOy3nlwnF6weOW6zwHeOpg6R+a+Dg=; b=owEBbQKS/ZANAwAIAUzgyIZIvxHKAcsmYgBnpJRnGE3AZgPoVzGUtERg5LFCbq+DcCX9yF59ua8N n7+cbDKJAjMEAAEIAB0WIQRLvip+Buz51YI8YRFM4MiGSL8RygUCZ6SUZwAKCRBM4MiGSL8RylLaD/ 0VpOorKWab0lqsTn/JbqVIfhX2mJDl4AszKROjW7ZbWS/ibRaNsagLSEBcUQD0xf70owiN9Yu/4znW OJaUpWXS7tNbAxPV2AbNa13K/5M2I9XFuZ6Ma44gw7XUBL2+eLtpDnsloEntdH23CIdCBlFgVoMhZi /9B64BlcRvucuNxfRyundNxbRTbW+WL+gdtObdVpvurkEZPU7XSKLpbhrzZvQ4wxTGIf/25YvUVXE4 S4KSqSp0B49pkPN4G9xW/jIyjgX3WLAwBlhBHZ6f15+/NQ+pg/hwN9hKNNehLoCE2vircPueBHstEE KegTCjgg9BggHBOklhXRKvmGFOY2CVPkx96cbQhQZG615Mp2ODEKab08GpR6au3L0Lg1QT3JQbi+tU DKBdLQMl1MjDsazZBZ1VwoZL4CT5etgBB13PQWNVWlid9dl1osLUMQE9UJ/QAqUtZVfz88GpkskFmi ol2YvLHaQVsp/3n/N56wQok42wK/y+P/xvxYD1rz0ExOWlNuLbEBjbjgFZsEQ1KAlLg2XUnKJ4yMrB aBVOWnLHU2q3mFDWBNz/iUcF0U3KO0efHZsiu7NYItC7HDIgvosgX8QFMNIjDG1EfiEFvUcisgaxYH JcTYydI991Jqy6AeM9QpCB99Wg4k4tVqvmj18ICTymjWeNb7929I4r8679rA== X-Developer-Key: i=memxor@gmail.com; a=openpgp; fpr=4BBE2A7E06ECF9D5823C61114CE0C88648BF11CA X-Patchwork-Delegate: bpf@iogearbox.net Introduce four new kfuncs, bpf_res_spin_lock, and bpf_res_spin_unlock, and their irqsave/irqrestore variants, which wrap the rqspinlock APIs. bpf_res_spin_lock returns a conditional result, depending on whether the lock was acquired (NULL is returned when lock acquisition succeeds, non-NULL upon failure). The memory pointed to by the returned pointer upon failure can be dereferenced after the NULL check to obtain the error code. Instead of using the old bpf_spin_lock type, introduce a new type with the same layout, and the same alignment, but a different name to avoid type confusion. Preemption is disabled upon successful lock acquisition, however IRQs are not. Special kfuncs can be introduced later to allow disabling IRQs when taking a spin lock. Resilient locks are safe against AA deadlocks, hence not disabling IRQs currently does not allow violation of kernel safety. __irq_flag annotation is used to accept IRQ flags for the IRQ-variants, with the same semantics as existing bpf_local_irq_{save, restore}. These kfuncs will require additional verifier-side support in subsequent commits, to allow programs to hold multiple locks at the same time. Signed-off-by: Kumar Kartikeya Dwivedi --- include/asm-generic/rqspinlock.h | 7 +++ include/linux/bpf.h | 1 + kernel/locking/rqspinlock.c | 78 ++++++++++++++++++++++++++++++++ 3 files changed, 86 insertions(+) diff --git a/include/asm-generic/rqspinlock.h b/include/asm-generic/rqspinlock.h index 46119fc768b8..8249c2da09ad 100644 --- a/include/asm-generic/rqspinlock.h +++ b/include/asm-generic/rqspinlock.h @@ -23,6 +23,13 @@ struct rqspinlock { }; }; +/* Even though this is same as struct rqspinlock, we need to emit a distinct + * type in BTF for BPF programs. 
+ */ +struct bpf_res_spin_lock { + u32 val; +}; + struct qspinlock; #ifdef CONFIG_QUEUED_SPINLOCKS typedef struct qspinlock rqspinlock_t; diff --git a/include/linux/bpf.h b/include/linux/bpf.h index f3f50e29d639..35af09ee6a2c 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -30,6 +30,7 @@ #include #include #include +#include struct bpf_verifier_env; struct bpf_verifier_log; diff --git a/kernel/locking/rqspinlock.c b/kernel/locking/rqspinlock.c index b4cceeecf29c..d05333203671 100644 --- a/kernel/locking/rqspinlock.c +++ b/kernel/locking/rqspinlock.c @@ -15,6 +15,8 @@ #include #include +#include +#include #include #include #include @@ -686,3 +688,79 @@ int __lockfunc resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val, EXPORT_SYMBOL(resilient_queued_spin_lock_slowpath); #endif /* CONFIG_QUEUED_SPINLOCKS */ + +__bpf_kfunc_start_defs(); + +#define REPORT_STR(ret) ({ ret == -ETIMEDOUT ? "Timeout detected" : "AA or ABBA deadlock detected"; }) + +__bpf_kfunc int bpf_res_spin_lock(struct bpf_res_spin_lock *lock) +{ + int ret; + + BUILD_BUG_ON(sizeof(rqspinlock_t) != sizeof(struct bpf_res_spin_lock)); + BUILD_BUG_ON(__alignof__(rqspinlock_t) != __alignof__(struct bpf_res_spin_lock)); + + preempt_disable(); + ret = res_spin_lock((rqspinlock_t *)lock); + if (unlikely(ret)) { + preempt_enable(); + rqspinlock_report_violation(REPORT_STR(ret), lock); + return ret; + } + return 0; +} + +__bpf_kfunc void bpf_res_spin_unlock(struct bpf_res_spin_lock *lock) +{ + res_spin_unlock((rqspinlock_t *)lock); + preempt_enable(); +} + +__bpf_kfunc int bpf_res_spin_lock_irqsave(struct bpf_res_spin_lock *lock, unsigned long *flags__irq_flag) +{ + u64 *ptr = (u64 *)flags__irq_flag; + unsigned long flags; + int ret; + + preempt_disable(); + local_irq_save(flags); + ret = res_spin_lock((rqspinlock_t *)lock); + if (unlikely(ret)) { + local_irq_restore(flags); + preempt_enable(); + rqspinlock_report_violation(REPORT_STR(ret), lock); + return ret; + } + *ptr = flags; + return 0; +} + +__bpf_kfunc void bpf_res_spin_unlock_irqrestore(struct bpf_res_spin_lock *lock, unsigned long *flags__irq_flag) +{ + u64 *ptr = (u64 *)flags__irq_flag; + unsigned long flags = *ptr; + + res_spin_unlock((rqspinlock_t *)lock); + local_irq_restore(flags); + preempt_enable(); +} + +__bpf_kfunc_end_defs(); + +BTF_KFUNCS_START(rqspinlock_kfunc_ids) +BTF_ID_FLAGS(func, bpf_res_spin_lock, KF_RET_NULL) +BTF_ID_FLAGS(func, bpf_res_spin_unlock) +BTF_ID_FLAGS(func, bpf_res_spin_lock_irqsave, KF_RET_NULL) +BTF_ID_FLAGS(func, bpf_res_spin_unlock_irqrestore) +BTF_KFUNCS_END(rqspinlock_kfunc_ids) + +static const struct btf_kfunc_id_set rqspinlock_kfunc_set = { + .owner = THIS_MODULE, + .set = &rqspinlock_kfunc_ids, +}; + +static __init int rqspinlock_register_kfuncs(void) +{ + return register_btf_kfunc_id_set(BPF_PROG_TYPE_UNSPEC, &rqspinlock_kfunc_set); +} +late_initcall(rqspinlock_register_kfuncs); From patchwork Thu Feb 6 10:54:31 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Kumar Kartikeya Dwivedi X-Patchwork-Id: 13962829 X-Patchwork-Delegate: bpf@iogearbox.net Received: from mail-wm1-f65.google.com (mail-wm1-f65.google.com [209.85.128.65]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 9A58923237C; Thu, 6 Feb 2025 10:55:10 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.128.65 ARC-Seal: i=1; 
a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1738839312; cv=none; b=UfAck2Yp2DlIoCfjVFAkWzhLDGgtFBiH7f7pzdURZUgUesdAwTGu6c4EmbOqDTbzj4I2T5xkAQvQpAHMhxsLN5OITdK5v3vnB7W3CAM1suTpbin3pAeZO7hfDONxluuhja4famDVtG0+C4txc1UpdXDwIKfu9UkA8Vz4hXqHmvA= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1738839312; c=relaxed/simple; bh=a6nXDtyfaaaN5/sxSs0OZmYxp3ViUzHxtEF2nqEtxP4=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=sC0VQz+fcFDGueEFZ26fZweKhP9GIlJA5Bu0PguduQb/aV+9qIcFJ5QTuoSp/6KM/P/kHzCWMoB/5SeEgv5YLONNYJ7bH2aJ2zb5CcK9TAw/oFA1h80ehOAd23E11siFZQpTEzDIGB/UaiaMheNve/msaf3z+svlovXJcJ8We4E= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=QCRLAbMD; arc=none smtp.client-ip=209.85.128.65 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="QCRLAbMD" Received: by mail-wm1-f65.google.com with SMTP id 5b1f17b1804b1-436281c8a38so4827195e9.3; Thu, 06 Feb 2025 02:55:10 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1738839308; x=1739444108; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=MMJzjmm9Ssn/2tshx5tAZDCI9Z1engNm3Cw+RIUnYAs=; b=QCRLAbMDF0WIGBXbFvFpMtx20YKXbAV4Ipo4ZuE3A+nIQJ5nIp9m9KBiBqbXLvD2iP NePB9IjcKfu1x0lZnlc3B1QXxd77wAhYKTgSXLcQC2Hl0I8IxYIYtRhXPteKT9SZnjyK XcYd9tAe6IFOn94C2H3Sd88CrVTV6bn4nTmw2aaUCY3bvRiSYQgerNze1VvxX0uoqCDE grcPiILwKbI3Zfsu3AOubE3bOfsElBmLRdQ79Ww2bxe6XtHMWOhzlSgWB0ZtoylAbsRh 5plogVqwpbX+MbhF1JAYQehGbKdfTamjGmB7gUlVMxw25T0w4qN/N7z7PiReRa7spGSq XxRw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1738839308; x=1739444108; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=MMJzjmm9Ssn/2tshx5tAZDCI9Z1engNm3Cw+RIUnYAs=; b=RAsh573amU4BOvZrP+1onlkCN0Pk/9ktrH3gjM5NZLYmPA8oD/vAxaAmmxfMygBL+A Rmi4VP1lcv48/kDRdpgNWjIQ0cvMLcxbqgSVpRPxwppTATvY7BL81WVrlP3uh+iUrZk1 OIgHeTpibqii5mJFh3NU3K4H1/f9Ardutgym77tg/FfFmgBjWfUGeyt3A+zEcFVTkQsT dj++x4pCB6/xnOgAP6piBF0ydxf4odx+E6jL72AtPd5tf81l7OQifeKvvM7Oiv8WZE7q 8IzSgO83VYU1j8xwIGkE3joMc7nQ+vKlg0O0WORPxEJJSaiM/+slHs70KUqAojnxWHK7 TJOA== X-Forwarded-Encrypted: i=1; AJvYcCVkt6dawl4p+TGUfEbwDcB25aRaU9nS+lvJzxydGZ/GH6Ks5Tu3LM77H2mAoGJr+DJETXaye5UPLKl9x/E=@vger.kernel.org X-Gm-Message-State: AOJu0Yyu3tFtyq8OtPagdblmTwjqvGgJXvLxuNCZhqu7Fe+qDZi8Cd6W bYsFvA2HHPZmTP4uFFCnRx7STphZCjDssAI6JYFswX5Nd5SZ0d23JVVhZOCWFrg= X-Gm-Gg: ASbGncufgv/2IDqYhHLoHK0DdVK1yS0yA6YpQwjF7NqWFrsm7HJA40cSUnCuks38brH Uzs4CHtGsHWNXiBelsD7yoYiTtlWn/lDHQIFOlFo1g4wix1uAq82bFtuPUB1GmHYqYPHWlZHcQB yUUC7cwCEBiPgkVG49XxkDxOHJz60g3Yya743ldlKw0U7hVUzcyMYcOPl+1ESYMFojJC0EpV9L5 9vdCtRukQj9XeAKA56+g5WAGclYeAvs/kq8oemADTfKy44ULCQxGuucm5QLqFShYNeriDhPE8pU Pg== X-Google-Smtp-Source: AGHT+IEmR2/TmkoFR9idBNEPfQquu8zyJPeOqwMAgJpW3/scT3Z5dWwlgX9AEyJpUkJHqzG2L9+0eQ== X-Received: by 2002:a05:600c:1e1c:b0:434:effb:9f8a with SMTP id 5b1f17b1804b1-43912e54246mr23938815e9.15.1738839308376; Thu, 06 Feb 
2025 02:55:08 -0800 (PST) Received: from localhost ([2a03:2880:31ff::]) by smtp.gmail.com with ESMTPSA id ffacd0b85a97d-38dbde1dc1dsm1375270f8f.87.2025.02.06.02.55.07 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 06 Feb 2025 02:55:07 -0800 (PST) From: Kumar Kartikeya Dwivedi To: bpf@vger.kernel.org, linux-kernel@vger.kernel.org Cc: Linus Torvalds , Peter Zijlstra , Will Deacon , Waiman Long , Alexei Starovoitov , Andrii Nakryiko , Daniel Borkmann , Martin KaFai Lau , Eduard Zingerman , "Paul E. McKenney" , Tejun Heo , Barret Rhoden , Josh Don , Dohyun Kim , linux-arm-kernel@lists.infradead.org, kernel-team@meta.com Subject: [PATCH bpf-next v2 23/26] bpf: Handle allocation failure in acquire_lock_state Date: Thu, 6 Feb 2025 02:54:31 -0800 Message-ID: <20250206105435.2159977-24-memxor@gmail.com> X-Mailer: git-send-email 2.43.5 In-Reply-To: <20250206105435.2159977-1-memxor@gmail.com> References: <20250206105435.2159977-1-memxor@gmail.com> Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Developer-Signature: v=1; a=openpgp-sha256; l=776; h=from:subject; bh=a6nXDtyfaaaN5/sxSs0OZmYxp3ViUzHxtEF2nqEtxP4=; b=owEBbQKS/ZANAwAIAUzgyIZIvxHKAcsmYgBnpJRndCMDTSB5KQHWglo8M0qqlVd8f6toksUk+2oC Ok/Xw+iJAjMEAAEIAB0WIQRLvip+Buz51YI8YRFM4MiGSL8RygUCZ6SUZwAKCRBM4MiGSL8RypokD/ 9GP0/dyt61QvEQ+p/5NGWAGtLJXjdj00R6BxemQXfhh7MvCRx41BKujZPuPCBt02Xog2UeoMlXt04D RrFCxgOD/1K7vHK54wFCXnJU1den0sUUyDwfsc7Gj/Z0ifdqhrqr90k3WL27q6L5Ur7ohUWpuvAnjx gB0+Y/CJisy/oVjoFzaGqCN1CzvRRmkSSNLrV8Uz5Tf6Op2AvJ0nuER2FdlG/+j0t9YBcFkR3pothS zfXRfrBvws9/BRL/R8Z15OeRzU1lgVUPjSL6C9RU+hwgIhYDCQ00BoTuMlZh511ojo4sTo3/Zvtf16 9+lkVBS8sPO5bap8JYLBHDl0nLnFTfxJMfDyUryqKVYV3GSM7Y86ST04cPqZ8EhWprpNw+iKmOmCKd gjHx0Q4VcPzA5uy+jmFPpNNMXyT9dmIjrev7NGi5DbvK/DNdjUYWQxJjzN02iC7/EvRTjngcxoWyPe rUBoOvzWn7iv0hRVbYWfJBakloSkfB7eIVP2e7rSgqjtytPi/PjYxdl90c9TqChCBlyDqVB8XPQ1gq y0eDqUv8LC+trmtM6Wr6odaXyc9aKb9guEkmSooa05h3a0/ZFg3aOhTfFbDmyB8nOJC/rMKNjw4sQ7 8dmYH22uTb9/O8oVVtMMomG3b8H8m0yE9UTVXLKxcqjDa8wUn5Y2vIIHFqcQ== X-Developer-Key: i=memxor@gmail.com; a=openpgp; fpr=4BBE2A7E06ECF9D5823C61114CE0C88648BF11CA X-Patchwork-Delegate: bpf@iogearbox.net The acquire_lock_state function needs to handle possible NULL values returned by acquire_reference_state, and return -ENOMEM. 
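Without this check, a failed allocation inside acquire_reference_state() would be dereferenced immediately afterwards, when the reference state's fields are initialized. The resulting flow is simply the following (a sketch of the two added lines in context, matching the diff below):

	s = acquire_reference_state(env, insn_idx);
	if (!s)
		return -ENOMEM;	/* allocation of the reference state failed */
	s->type = type;
	s->id = id;
	s->ptr = ptr;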
Fixes: 769b0f1c8214 ("bpf: Refactor {acquire,release}_reference_state") Signed-off-by: Kumar Kartikeya Dwivedi --- kernel/bpf/verifier.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c index 9971c03adfd5..d6999d085c7d 100644 --- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -1501,6 +1501,8 @@ static int acquire_lock_state(struct bpf_verifier_env *env, int insn_idx, enum r struct bpf_reference_state *s; s = acquire_reference_state(env, insn_idx); + if (!s) + return -ENOMEM; s->type = type; s->id = id; s->ptr = ptr; From patchwork Thu Feb 6 10:54:32 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Kumar Kartikeya Dwivedi X-Patchwork-Id: 13962831 X-Patchwork-Delegate: bpf@iogearbox.net Received: from mail-wm1-f67.google.com (mail-wm1-f67.google.com [209.85.128.67]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 92A6B233135; Thu, 6 Feb 2025 10:55:12 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.128.67 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1738839315; cv=none; b=MVRUEfKzHRTz3/GT4c/AXOrCd0F8MRbXH5zL2dpX1TWbnlipr6AMmwpWPKJEZZEpNsOnOgpIYoRnavllSIkZC4XHLkm9VxjH1ma/fnUdd0tTW1fHrptBim/9utuBu+gXrmz2o02K5bm7vqmzoKbM5baJmHt7rss6tRJFyaLSsdc= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1738839315; c=relaxed/simple; bh=5HGNbFTsma4FLf8lx8r/wItLdfJjrA3dk1lFblbB38k=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=lBtZu72inXXla+2hc+U8jPvSmeYEE5N5S4Gv8096mvspx7IjE2ylpdwtYhxZGBGVuNtqM8O460UqXQOVwCn+4OcVnEral7tBaBCmyhUNASb2PyKJRex14zfsUxp77YjLNXKySxNHzW19CAs7qCBHViS2GsNJBrx/+g9nMYa+ARQ= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=hJz2sWzl; arc=none smtp.client-ip=209.85.128.67 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="hJz2sWzl" Received: by mail-wm1-f67.google.com with SMTP id 5b1f17b1804b1-43622267b2eso7537335e9.0; Thu, 06 Feb 2025 02:55:12 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1738839310; x=1739444110; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=RabZcLwVRQNGP0BmugyphEg76jVqJJBXW2DB0rYNNAY=; b=hJz2sWzlwcy+vIcpiBmTS/ZV8l89f09JmaOMfT0qbjbEMtNnIGkwCtSHX0n7kLCObH IwOuyXYb11kS4ZeXSqle7AcUGtogUC6/mZcERdFqGg4GhyzJcnyLFEkjJfdLWZdLZv6E SR8Bgfk95X3P0TQQ0LqwC5uSoExY29cCT4mW+qV6SSXwtzh0bW/+QFzfFGRsSrsk+IEy RcQLA8uczp++aQFgtPKBYRA5lwOsxTJOiappp1HPIxaOkz/Xs0rb9TTscqvnundfm+9r +L3T5+t+kXV7UbwGnmGFx53CY/nxyFbp1voCB+rpVEX/gDS7QL02o5c8W1wxtk5OKYcA Vzcw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1738839310; x=1739444110; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; 
bh=RabZcLwVRQNGP0BmugyphEg76jVqJJBXW2DB0rYNNAY=; b=qk8o+sT7MknuxIEXq8/A82cbEfw5feBkaGi3jNOZzcSTszj6yDxiN/dV5WwfXBcWgN SQbHO5I6RqkEPqkzBkkLqYzoJOtcPuTaNODsQepY7BswKwaG/AasRn/vVOAfayQusptM 9n9BrMEuLn1Y6Zg4RcQDxFz3FOoS7ddM8tMuvGsS8jcLOBodwsfSebszm7+HhrpJXVEi V2nOXz2vWr0zMVQPxw24xxVqseRaaUx8MeV9qOmMh33oTjT5RzeGkp/M2PseGOlvqCWp gYRlXxlrTQxekPoZZDarBPWuaCNfb54ZfzfFUPIv6gKhnKTuPPAB5u2ZHRlPiPGiBlHI Go0Q== X-Forwarded-Encrypted: i=1; AJvYcCWK0j7DNuDM8RzE7MjAaMWAQIBjDbiUAS8UTGxgeMCCSi9FPdJ7p15IuF4iklI0GXtVz25KMncRMepSEJo=@vger.kernel.org X-Gm-Message-State: AOJu0YzaAYaBOkNgozaEwwppxNu+kJZ2+gb74mfqwvZ8PFC2yQcAlLFB KOl1EatW28z6lwPFi/8DogGCcKNmC2s1vBJqmfwmaq+9RyhkNmX7TzzOOLoewpY= X-Gm-Gg: ASbGncvfFzSrsUiR32ivtreZzYyIWl2nsKK5rByttZ2kRMJkwVMzVk+mnhlKur1AyTt Ct5xY2XfqhVZVUNf+bpFZNfvB88UqWaCKN7h8BWnvZWzdT2w+cHhoWMpVq7UM1J1vi4oUyqZYrj Vvwez05LvhltOvdetianqrKn21V4Dkm5+HukMPpHVaQm6WPlHj/KW4ZGG387MSgCb2wWfYU6S4+ kJ62BoDaMxThUUPpwmDRdJPSR5MIMm7cMDntIli7cRRouc9LFtoF6zvnCf1a021qVFY2uRq8a3A hqs3 X-Google-Smtp-Source: AGHT+IFIYCB/oajw2Qvxf5XpXM6y+pT2gnKN3xko42QpRO/uNhJ36s5dySnRLck/L268tkP6cbHLvg== X-Received: by 2002:a05:600c:1f8f:b0:434:ff30:a159 with SMTP id 5b1f17b1804b1-4390d34b326mr53742055e9.0.1738839310094; Thu, 06 Feb 2025 02:55:10 -0800 (PST) Received: from localhost ([2a03:2880:31ff:7::]) by smtp.gmail.com with ESMTPSA id 5b1f17b1804b1-4390d94d7d4sm52313755e9.10.2025.02.06.02.55.08 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 06 Feb 2025 02:55:09 -0800 (PST) From: Kumar Kartikeya Dwivedi To: bpf@vger.kernel.org, linux-kernel@vger.kernel.org Cc: Linus Torvalds , Peter Zijlstra , Will Deacon , Waiman Long , Alexei Starovoitov , Andrii Nakryiko , Daniel Borkmann , Martin KaFai Lau , Eduard Zingerman , "Paul E. McKenney" , Tejun Heo , Barret Rhoden , Josh Don , Dohyun Kim , linux-arm-kernel@lists.infradead.org, kernel-team@meta.com Subject: [PATCH bpf-next v2 24/26] bpf: Implement verifier support for rqspinlock Date: Thu, 6 Feb 2025 02:54:32 -0800 Message-ID: <20250206105435.2159977-25-memxor@gmail.com> X-Mailer: git-send-email 2.43.5 In-Reply-To: <20250206105435.2159977-1-memxor@gmail.com> References: <20250206105435.2159977-1-memxor@gmail.com> Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Developer-Signature: v=1; a=openpgp-sha256; l=28449; h=from:subject; bh=5HGNbFTsma4FLf8lx8r/wItLdfJjrA3dk1lFblbB38k=; b=owEBbQKS/ZANAwAIAUzgyIZIvxHKAcsmYgBnpJRnui/XZ+ItES1wv9fM/NvhMZxRq5RhAoodvGS2 ZRuffZCJAjMEAAEIAB0WIQRLvip+Buz51YI8YRFM4MiGSL8RygUCZ6SUZwAKCRBM4MiGSL8RysFCD/ 4gVuhWUETLd0qMcRO+R4F5+AN7eVpScQItfzUD/5wd0Q+cj3KteCL9DvZqOyGrY3QAIR9RZYqSaNDL O/mIcF2Pvh6Cd4XW0HW9DX1kZTMtaNAX5y3Gf4MN5ec+3TRpze/Akrti+WPYsH6c2uLrYBQWRxTn1n 2nmXOnx8l0z9/hL0IiTo9+NU8hXMfYKyowF3WlbLb+qXzVAGbBgY8ujOAxHz7ChloIpJJycv6bt2px X+0gww/ScR2tiGeGccRHwjsv9A5aZwmf1rEgR+JxNWUDVmw5Q1bxLfcENHtdkmaY2LUSJRhbC2azIk Mhp1Qxtm4cKs6uDuXxuD6BC31t9Wh1sjAXDUHkkVzvmozKjq+iJaJfmq9lBIKIxm+F4VQ0IJeIu+ht ITxR8QbwqKVAaSvF4dtUZVj/rt0YUiXr3qhkIJkeJhVFHEIGyPLT8vAn1UxElhe76/uoOf0cEi/Fap Mu9etskGVBz22A+IZ3TPuAW/fi1WDtm/ORrB0Zppv7MWJt7QtJCxeRUdjMZUW19IozwqeH8oSDnccR Z6lXr96lJ79xOgOwGOPUFv9vqUOPLF2PHh7pe2TML+07+Q6rcQrxCQ51c2qz/+mhe7r7mGe3DzoB24 Y2voqsR/rq+xaXg/ZTLcx0T4rvjTs+U7b+M/9ldEAImGeezuk/XkRjwJxhSA== X-Developer-Key: i=memxor@gmail.com; a=openpgp; fpr=4BBE2A7E06ECF9D5823C61114CE0C88648BF11CA X-Patchwork-Delegate: bpf@iogearbox.net Introduce verifier-side support for rqspinlock kfuncs. 
The first step is allowing bpf_res_spin_lock type to be defined in map values and allocated objects, so BTF-side is updated with a new BPF_RES_SPIN_LOCK field to recognize and validate. Any object cannot have both bpf_spin_lock and bpf_res_spin_lock, only one of them (and at most one of them per-object, like before) must be present. The bpf_res_spin_lock can also be used to protect objects that require lock protection for their kfuncs, like BPF rbtree and linked list. The verifier plumbing to simulate success and failure cases when calling the kfuncs is done by pushing a new verifier state to the verifier state stack which will verify the failure case upon calling the kfunc. The path where success is indicated creates all lock reference state and IRQ state (if necessary for irqsave variants). In the case of failure, the state clears the registers r0-r5, sets the return value, and skips kfunc processing, proceeding to the next instruction. When marking the return value for success case, the value is marked as 0, and for the failure case as [-MAX_ERRNO, -1]. Then, in the program, whenever user checks the return value as 'if (ret)' or 'if (ret < 0)' the verifier never traverses such branches for success cases, and would be aware that the lock is not held in such cases. We push the kfunc state in check_kfunc_call whenever rqspinlock kfuncs are invoked. We introduce a kfunc_class state to avoid mixing lock irqrestore kfuncs with IRQ state created by bpf_local_irq_save. With all this infrastructure, these kfuncs become usable in programs while satisfying all safety properties required by the kernel. Signed-off-by: Kumar Kartikeya Dwivedi Acked-by: Eduard Zingerman --- include/linux/bpf.h | 9 ++ include/linux/bpf_verifier.h | 17 ++- kernel/bpf/btf.c | 26 ++++- kernel/bpf/syscall.c | 6 +- kernel/bpf/verifier.c | 219 ++++++++++++++++++++++++++++------- 5 files changed, 232 insertions(+), 45 deletions(-) diff --git a/include/linux/bpf.h b/include/linux/bpf.h index 35af09ee6a2c..91dddf7396f9 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -205,6 +205,7 @@ enum btf_field_type { BPF_REFCOUNT = (1 << 9), BPF_WORKQUEUE = (1 << 10), BPF_UPTR = (1 << 11), + BPF_RES_SPIN_LOCK = (1 << 12), }; typedef void (*btf_dtor_kfunc_t)(void *); @@ -240,6 +241,7 @@ struct btf_record { u32 cnt; u32 field_mask; int spin_lock_off; + int res_spin_lock_off; int timer_off; int wq_off; int refcount_off; @@ -315,6 +317,8 @@ static inline const char *btf_field_type_name(enum btf_field_type type) switch (type) { case BPF_SPIN_LOCK: return "bpf_spin_lock"; + case BPF_RES_SPIN_LOCK: + return "bpf_res_spin_lock"; case BPF_TIMER: return "bpf_timer"; case BPF_WORKQUEUE: @@ -347,6 +351,8 @@ static inline u32 btf_field_type_size(enum btf_field_type type) switch (type) { case BPF_SPIN_LOCK: return sizeof(struct bpf_spin_lock); + case BPF_RES_SPIN_LOCK: + return sizeof(struct bpf_res_spin_lock); case BPF_TIMER: return sizeof(struct bpf_timer); case BPF_WORKQUEUE: @@ -377,6 +383,8 @@ static inline u32 btf_field_type_align(enum btf_field_type type) switch (type) { case BPF_SPIN_LOCK: return __alignof__(struct bpf_spin_lock); + case BPF_RES_SPIN_LOCK: + return __alignof__(struct bpf_res_spin_lock); case BPF_TIMER: return __alignof__(struct bpf_timer); case BPF_WORKQUEUE: @@ -420,6 +428,7 @@ static inline void bpf_obj_init_field(const struct btf_field *field, void *addr) case BPF_RB_ROOT: /* RB_ROOT_CACHED 0-inits, no need to do anything after memset */ case BPF_SPIN_LOCK: + case BPF_RES_SPIN_LOCK: case BPF_TIMER: case 
BPF_WORKQUEUE: case BPF_KPTR_UNREF: diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h index 32c23f2a3086..ed444e44f524 100644 --- a/include/linux/bpf_verifier.h +++ b/include/linux/bpf_verifier.h @@ -115,6 +115,15 @@ struct bpf_reg_state { int depth:30; } iter; + /* For irq stack slots */ + struct { + enum { + IRQ_KFUNC_IGNORE, + IRQ_NATIVE_KFUNC, + IRQ_LOCK_KFUNC, + } kfunc_class; + } irq; + /* Max size from any of the above. */ struct { unsigned long raw1; @@ -255,9 +264,11 @@ struct bpf_reference_state { * default to pointer reference on zero initialization of a state. */ enum ref_state_type { - REF_TYPE_PTR = 1, - REF_TYPE_IRQ = 2, - REF_TYPE_LOCK = 3, + REF_TYPE_PTR = (1 << 1), + REF_TYPE_IRQ = (1 << 2), + REF_TYPE_LOCK = (1 << 3), + REF_TYPE_RES_LOCK = (1 << 4), + REF_TYPE_RES_LOCK_IRQ = (1 << 5), } type; /* Track each reference created with a unique id, even if the same * instruction creates the reference multiple times (eg, via CALL). diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c index 9433b6467bbe..aba6183253ea 100644 --- a/kernel/bpf/btf.c +++ b/kernel/bpf/btf.c @@ -3480,6 +3480,15 @@ static int btf_get_field_type(const struct btf *btf, const struct btf_type *var_ goto end; } } + if (field_mask & BPF_RES_SPIN_LOCK) { + if (!strcmp(name, "bpf_res_spin_lock")) { + if (*seen_mask & BPF_RES_SPIN_LOCK) + return -E2BIG; + *seen_mask |= BPF_RES_SPIN_LOCK; + type = BPF_RES_SPIN_LOCK; + goto end; + } + } if (field_mask & BPF_TIMER) { if (!strcmp(name, "bpf_timer")) { if (*seen_mask & BPF_TIMER) @@ -3658,6 +3667,7 @@ static int btf_find_field_one(const struct btf *btf, switch (field_type) { case BPF_SPIN_LOCK: + case BPF_RES_SPIN_LOCK: case BPF_TIMER: case BPF_WORKQUEUE: case BPF_LIST_NODE: @@ -3951,6 +3961,7 @@ struct btf_record *btf_parse_fields(const struct btf *btf, const struct btf_type return ERR_PTR(-ENOMEM); rec->spin_lock_off = -EINVAL; + rec->res_spin_lock_off = -EINVAL; rec->timer_off = -EINVAL; rec->wq_off = -EINVAL; rec->refcount_off = -EINVAL; @@ -3978,6 +3989,11 @@ struct btf_record *btf_parse_fields(const struct btf *btf, const struct btf_type /* Cache offset for faster lookup at runtime */ rec->spin_lock_off = rec->fields[i].offset; break; + case BPF_RES_SPIN_LOCK: + WARN_ON_ONCE(rec->spin_lock_off >= 0); + /* Cache offset for faster lookup at runtime */ + rec->res_spin_lock_off = rec->fields[i].offset; + break; case BPF_TIMER: WARN_ON_ONCE(rec->timer_off >= 0); /* Cache offset for faster lookup at runtime */ @@ -4021,9 +4037,15 @@ struct btf_record *btf_parse_fields(const struct btf *btf, const struct btf_type rec->cnt++; } + if (rec->spin_lock_off >= 0 && rec->res_spin_lock_off >= 0) { + ret = -EINVAL; + goto end; + } + /* bpf_{list_head, rb_node} require bpf_spin_lock */ if ((btf_record_has_field(rec, BPF_LIST_HEAD) || - btf_record_has_field(rec, BPF_RB_ROOT)) && rec->spin_lock_off < 0) { + btf_record_has_field(rec, BPF_RB_ROOT)) && + (rec->spin_lock_off < 0 && rec->res_spin_lock_off < 0)) { ret = -EINVAL; goto end; } @@ -5636,7 +5658,7 @@ btf_parse_struct_metas(struct bpf_verifier_log *log, struct btf *btf) type = &tab->types[tab->cnt]; type->btf_id = i; - record = btf_parse_fields(btf, t, BPF_SPIN_LOCK | BPF_LIST_HEAD | BPF_LIST_NODE | + record = btf_parse_fields(btf, t, BPF_SPIN_LOCK | BPF_RES_SPIN_LOCK | BPF_LIST_HEAD | BPF_LIST_NODE | BPF_RB_ROOT | BPF_RB_NODE | BPF_REFCOUNT | BPF_KPTR, t->size); /* The record cannot be unset, treat it as an error if so */ diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c index 
c420edbfb7c8..054707215d28 100644 --- a/kernel/bpf/syscall.c +++ b/kernel/bpf/syscall.c @@ -648,6 +648,7 @@ void btf_record_free(struct btf_record *rec) case BPF_RB_ROOT: case BPF_RB_NODE: case BPF_SPIN_LOCK: + case BPF_RES_SPIN_LOCK: case BPF_TIMER: case BPF_REFCOUNT: case BPF_WORKQUEUE: @@ -700,6 +701,7 @@ struct btf_record *btf_record_dup(const struct btf_record *rec) case BPF_RB_ROOT: case BPF_RB_NODE: case BPF_SPIN_LOCK: + case BPF_RES_SPIN_LOCK: case BPF_TIMER: case BPF_REFCOUNT: case BPF_WORKQUEUE: @@ -777,6 +779,7 @@ void bpf_obj_free_fields(const struct btf_record *rec, void *obj) switch (fields[i].type) { case BPF_SPIN_LOCK: + case BPF_RES_SPIN_LOCK: break; case BPF_TIMER: bpf_timer_cancel_and_free(field_ptr); @@ -1203,7 +1206,7 @@ static int map_check_btf(struct bpf_map *map, struct bpf_token *token, return -EINVAL; map->record = btf_parse_fields(btf, value_type, - BPF_SPIN_LOCK | BPF_TIMER | BPF_KPTR | BPF_LIST_HEAD | + BPF_SPIN_LOCK | BPF_RES_SPIN_LOCK | BPF_TIMER | BPF_KPTR | BPF_LIST_HEAD | BPF_RB_ROOT | BPF_REFCOUNT | BPF_WORKQUEUE | BPF_UPTR, map->value_size); if (!IS_ERR_OR_NULL(map->record)) { @@ -1222,6 +1225,7 @@ static int map_check_btf(struct bpf_map *map, struct bpf_token *token, case 0: continue; case BPF_SPIN_LOCK: + case BPF_RES_SPIN_LOCK: if (map->map_type != BPF_MAP_TYPE_HASH && map->map_type != BPF_MAP_TYPE_ARRAY && map->map_type != BPF_MAP_TYPE_CGROUP_STORAGE && diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c index d6999d085c7d..294761dd0072 100644 --- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -456,7 +456,7 @@ static bool subprog_is_exc_cb(struct bpf_verifier_env *env, int subprog) static bool reg_may_point_to_spin_lock(const struct bpf_reg_state *reg) { - return btf_record_has_field(reg_btf_record(reg), BPF_SPIN_LOCK); + return btf_record_has_field(reg_btf_record(reg), BPF_SPIN_LOCK | BPF_RES_SPIN_LOCK); } static bool type_is_rdonly_mem(u32 type) @@ -1148,7 +1148,8 @@ static int release_irq_state(struct bpf_verifier_state *state, int id); static int mark_stack_slot_irq_flag(struct bpf_verifier_env *env, struct bpf_kfunc_call_arg_meta *meta, - struct bpf_reg_state *reg, int insn_idx) + struct bpf_reg_state *reg, int insn_idx, + int kfunc_class) { struct bpf_func_state *state = func(env, reg); struct bpf_stack_state *slot; @@ -1170,6 +1171,7 @@ static int mark_stack_slot_irq_flag(struct bpf_verifier_env *env, st->type = PTR_TO_STACK; /* we don't have dedicated reg type */ st->live |= REG_LIVE_WRITTEN; st->ref_obj_id = id; + st->irq.kfunc_class = kfunc_class; for (i = 0; i < BPF_REG_SIZE; i++) slot->slot_type[i] = STACK_IRQ_FLAG; @@ -1178,7 +1180,8 @@ static int mark_stack_slot_irq_flag(struct bpf_verifier_env *env, return 0; } -static int unmark_stack_slot_irq_flag(struct bpf_verifier_env *env, struct bpf_reg_state *reg) +static int unmark_stack_slot_irq_flag(struct bpf_verifier_env *env, struct bpf_reg_state *reg, + int kfunc_class) { struct bpf_func_state *state = func(env, reg); struct bpf_stack_state *slot; @@ -1192,6 +1195,15 @@ static int unmark_stack_slot_irq_flag(struct bpf_verifier_env *env, struct bpf_r slot = &state->stack[spi]; st = &slot->spilled_ptr; + if (kfunc_class != IRQ_KFUNC_IGNORE && st->irq.kfunc_class != kfunc_class) { + const char *flag_kfunc = st->irq.kfunc_class == IRQ_NATIVE_KFUNC ? "native" : "lock"; + const char *used_kfunc = kfunc_class == IRQ_NATIVE_KFUNC ? 
"native" : "lock"; + + verbose(env, "irq flag acquired by %s kfuncs cannot be restored with %s kfuncs\n", + flag_kfunc, used_kfunc); + return -EINVAL; + } + err = release_irq_state(env->cur_state, st->ref_obj_id); WARN_ON_ONCE(err && err != -EACCES); if (err) { @@ -1591,7 +1603,7 @@ static struct bpf_reference_state *find_lock_state(struct bpf_verifier_state *st for (i = 0; i < state->acquired_refs; i++) { struct bpf_reference_state *s = &state->refs[i]; - if (s->type != type) + if (!(s->type & type)) continue; if (s->id == id && s->ptr == ptr) @@ -7985,6 +7997,12 @@ static int check_kfunc_mem_size_reg(struct bpf_verifier_env *env, struct bpf_reg return err; } +enum { + PROCESS_SPIN_LOCK = (1 << 0), + PROCESS_RES_LOCK = (1 << 1), + PROCESS_LOCK_IRQ = (1 << 2), +}; + /* Implementation details: * bpf_map_lookup returns PTR_TO_MAP_VALUE_OR_NULL. * bpf_obj_new returns PTR_TO_BTF_ID | MEM_ALLOC | PTR_MAYBE_NULL. @@ -8007,30 +8025,33 @@ static int check_kfunc_mem_size_reg(struct bpf_verifier_env *env, struct bpf_reg * env->cur_state->active_locks remembers which map value element or allocated * object got locked and clears it after bpf_spin_unlock. */ -static int process_spin_lock(struct bpf_verifier_env *env, int regno, - bool is_lock) +static int process_spin_lock(struct bpf_verifier_env *env, int regno, int flags) { + bool is_lock = flags & PROCESS_SPIN_LOCK, is_res_lock = flags & PROCESS_RES_LOCK; + const char *lock_str = is_res_lock ? "bpf_res_spin" : "bpf_spin"; struct bpf_reg_state *regs = cur_regs(env), *reg = ®s[regno]; struct bpf_verifier_state *cur = env->cur_state; bool is_const = tnum_is_const(reg->var_off); + bool is_irq = flags & PROCESS_LOCK_IRQ; u64 val = reg->var_off.value; struct bpf_map *map = NULL; struct btf *btf = NULL; struct btf_record *rec; + u32 spin_lock_off; int err; if (!is_const) { verbose(env, - "R%d doesn't have constant offset. bpf_spin_lock has to be at the constant offset\n", - regno); + "R%d doesn't have constant offset. %s_lock has to be at the constant offset\n", + regno, lock_str); return -EINVAL; } if (reg->type == PTR_TO_MAP_VALUE) { map = reg->map_ptr; if (!map->btf) { verbose(env, - "map '%s' has to have BTF in order to use bpf_spin_lock\n", - map->name); + "map '%s' has to have BTF in order to use %s_lock\n", + map->name, lock_str); return -EINVAL; } } else { @@ -8038,36 +8059,53 @@ static int process_spin_lock(struct bpf_verifier_env *env, int regno, } rec = reg_btf_record(reg); - if (!btf_record_has_field(rec, BPF_SPIN_LOCK)) { - verbose(env, "%s '%s' has no valid bpf_spin_lock\n", map ? "map" : "local", - map ? map->name : "kptr"); + if (!btf_record_has_field(rec, is_res_lock ? BPF_RES_SPIN_LOCK : BPF_SPIN_LOCK)) { + verbose(env, "%s '%s' has no valid %s_lock\n", map ? "map" : "local", + map ? map->name : "kptr", lock_str); return -EINVAL; } - if (rec->spin_lock_off != val + reg->off) { - verbose(env, "off %lld doesn't point to 'struct bpf_spin_lock' that is at %d\n", - val + reg->off, rec->spin_lock_off); + spin_lock_off = is_res_lock ? 
rec->res_spin_lock_off : rec->spin_lock_off; + if (spin_lock_off != val + reg->off) { + verbose(env, "off %lld doesn't point to 'struct %s_lock' that is at %d\n", + val + reg->off, lock_str, spin_lock_off); return -EINVAL; } if (is_lock) { void *ptr; + int type; if (map) ptr = map; else ptr = btf; - if (cur->active_locks) { - verbose(env, - "Locking two bpf_spin_locks are not allowed\n"); - return -EINVAL; + if (!is_res_lock && cur->active_locks) { + if (find_lock_state(env->cur_state, REF_TYPE_LOCK, 0, NULL)) { + verbose(env, + "Locking two bpf_spin_locks are not allowed\n"); + return -EINVAL; + } + } else if (is_res_lock) { + if (find_lock_state(env->cur_state, REF_TYPE_RES_LOCK, reg->id, ptr)) { + verbose(env, "Acquiring the same lock again, AA deadlock detected\n"); + return -EINVAL; + } } - err = acquire_lock_state(env, env->insn_idx, REF_TYPE_LOCK, reg->id, ptr); + + if (is_res_lock && is_irq) + type = REF_TYPE_RES_LOCK_IRQ; + else if (is_res_lock) + type = REF_TYPE_RES_LOCK; + else + type = REF_TYPE_LOCK; + err = acquire_lock_state(env, env->insn_idx, type, reg->id, ptr); if (err < 0) { verbose(env, "Failed to acquire lock state\n"); return err; } } else { void *ptr; + int type; if (map) ptr = map; @@ -8075,12 +8113,18 @@ static int process_spin_lock(struct bpf_verifier_env *env, int regno, ptr = btf; if (!cur->active_locks) { - verbose(env, "bpf_spin_unlock without taking a lock\n"); + verbose(env, "%s_unlock without taking a lock\n", lock_str); return -EINVAL; } - if (release_lock_state(env->cur_state, REF_TYPE_LOCK, reg->id, ptr)) { - verbose(env, "bpf_spin_unlock of different lock\n"); + if (is_res_lock && is_irq) + type = REF_TYPE_RES_LOCK_IRQ; + else if (is_res_lock) + type = REF_TYPE_RES_LOCK; + else + type = REF_TYPE_LOCK; + if (release_lock_state(cur, type, reg->id, ptr)) { + verbose(env, "%s_unlock of different lock\n", lock_str); return -EINVAL; } @@ -9391,11 +9435,11 @@ static int check_func_arg(struct bpf_verifier_env *env, u32 arg, return -EACCES; } if (meta->func_id == BPF_FUNC_spin_lock) { - err = process_spin_lock(env, regno, true); + err = process_spin_lock(env, regno, PROCESS_SPIN_LOCK); if (err) return err; } else if (meta->func_id == BPF_FUNC_spin_unlock) { - err = process_spin_lock(env, regno, false); + err = process_spin_lock(env, regno, 0); if (err) return err; } else { @@ -11274,7 +11318,7 @@ static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn regs[BPF_REG_0].map_uid = meta.map_uid; regs[BPF_REG_0].type = PTR_TO_MAP_VALUE | ret_flag; if (!type_may_be_null(ret_flag) && - btf_record_has_field(meta.map_ptr->record, BPF_SPIN_LOCK)) { + btf_record_has_field(meta.map_ptr->record, BPF_SPIN_LOCK | BPF_RES_SPIN_LOCK)) { regs[BPF_REG_0].id = ++env->id_gen; } break; @@ -11446,10 +11490,10 @@ static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn /* mark_btf_func_reg_size() is used when the reg size is determined by * the BTF func_proto's return value size and argument. 
*/ -static void mark_btf_func_reg_size(struct bpf_verifier_env *env, u32 regno, - size_t reg_size) +static void __mark_btf_func_reg_size(struct bpf_verifier_env *env, struct bpf_reg_state *regs, + u32 regno, size_t reg_size) { - struct bpf_reg_state *reg = &cur_regs(env)[regno]; + struct bpf_reg_state *reg = ®s[regno]; if (regno == BPF_REG_0) { /* Function return value */ @@ -11467,6 +11511,12 @@ static void mark_btf_func_reg_size(struct bpf_verifier_env *env, u32 regno, } } +static void mark_btf_func_reg_size(struct bpf_verifier_env *env, u32 regno, + size_t reg_size) +{ + return __mark_btf_func_reg_size(env, cur_regs(env), regno, reg_size); +} + static bool is_kfunc_acquire(struct bpf_kfunc_call_arg_meta *meta) { return meta->kfunc_flags & KF_ACQUIRE; @@ -11604,6 +11654,7 @@ enum { KF_ARG_RB_ROOT_ID, KF_ARG_RB_NODE_ID, KF_ARG_WORKQUEUE_ID, + KF_ARG_RES_SPIN_LOCK_ID, }; BTF_ID_LIST(kf_arg_btf_ids) @@ -11613,6 +11664,7 @@ BTF_ID(struct, bpf_list_node) BTF_ID(struct, bpf_rb_root) BTF_ID(struct, bpf_rb_node) BTF_ID(struct, bpf_wq) +BTF_ID(struct, bpf_res_spin_lock) static bool __is_kfunc_ptr_arg_type(const struct btf *btf, const struct btf_param *arg, int type) @@ -11661,6 +11713,11 @@ static bool is_kfunc_arg_wq(const struct btf *btf, const struct btf_param *arg) return __is_kfunc_ptr_arg_type(btf, arg, KF_ARG_WORKQUEUE_ID); } +static bool is_kfunc_arg_res_spin_lock(const struct btf *btf, const struct btf_param *arg) +{ + return __is_kfunc_ptr_arg_type(btf, arg, KF_ARG_RES_SPIN_LOCK_ID); +} + static bool is_kfunc_arg_callback(struct bpf_verifier_env *env, const struct btf *btf, const struct btf_param *arg) { @@ -11732,6 +11789,7 @@ enum kfunc_ptr_arg_type { KF_ARG_PTR_TO_MAP, KF_ARG_PTR_TO_WORKQUEUE, KF_ARG_PTR_TO_IRQ_FLAG, + KF_ARG_PTR_TO_RES_SPIN_LOCK, }; enum special_kfunc_type { @@ -11768,6 +11826,10 @@ enum special_kfunc_type { KF_bpf_iter_num_new, KF_bpf_iter_num_next, KF_bpf_iter_num_destroy, + KF_bpf_res_spin_lock, + KF_bpf_res_spin_unlock, + KF_bpf_res_spin_lock_irqsave, + KF_bpf_res_spin_unlock_irqrestore, }; BTF_SET_START(special_kfunc_set) @@ -11846,6 +11908,10 @@ BTF_ID(func, bpf_local_irq_restore) BTF_ID(func, bpf_iter_num_new) BTF_ID(func, bpf_iter_num_next) BTF_ID(func, bpf_iter_num_destroy) +BTF_ID(func, bpf_res_spin_lock) +BTF_ID(func, bpf_res_spin_unlock) +BTF_ID(func, bpf_res_spin_lock_irqsave) +BTF_ID(func, bpf_res_spin_unlock_irqrestore) static bool is_kfunc_ret_null(struct bpf_kfunc_call_arg_meta *meta) { @@ -11939,6 +12005,9 @@ get_kfunc_ptr_arg_type(struct bpf_verifier_env *env, if (is_kfunc_arg_irq_flag(meta->btf, &args[argno])) return KF_ARG_PTR_TO_IRQ_FLAG; + if (is_kfunc_arg_res_spin_lock(meta->btf, &args[argno])) + return KF_ARG_PTR_TO_RES_SPIN_LOCK; + if ((base_type(reg->type) == PTR_TO_BTF_ID || reg2btf_ids[base_type(reg->type)])) { if (!btf_type_is_struct(ref_t)) { verbose(env, "kernel function %s args#%d pointer type %s %s is not supported\n", @@ -12046,13 +12115,19 @@ static int process_irq_flag(struct bpf_verifier_env *env, int regno, struct bpf_kfunc_call_arg_meta *meta) { struct bpf_reg_state *regs = cur_regs(env), *reg = ®s[regno]; + int err, kfunc_class = IRQ_NATIVE_KFUNC; bool irq_save; - int err; - if (meta->func_id == special_kfunc_list[KF_bpf_local_irq_save]) { + if (meta->func_id == special_kfunc_list[KF_bpf_local_irq_save] || + meta->func_id == special_kfunc_list[KF_bpf_res_spin_lock_irqsave]) { irq_save = true; - } else if (meta->func_id == special_kfunc_list[KF_bpf_local_irq_restore]) { + if (meta->func_id == 
special_kfunc_list[KF_bpf_res_spin_lock_irqsave]) + kfunc_class = IRQ_LOCK_KFUNC; + } else if (meta->func_id == special_kfunc_list[KF_bpf_local_irq_restore] || + meta->func_id == special_kfunc_list[KF_bpf_res_spin_unlock_irqrestore]) { irq_save = false; + if (meta->func_id == special_kfunc_list[KF_bpf_res_spin_unlock_irqrestore]) + kfunc_class = IRQ_LOCK_KFUNC; } else { verbose(env, "verifier internal error: unknown irq flags kfunc\n"); return -EFAULT; @@ -12068,7 +12143,7 @@ static int process_irq_flag(struct bpf_verifier_env *env, int regno, if (err) return err; - err = mark_stack_slot_irq_flag(env, meta, reg, env->insn_idx); + err = mark_stack_slot_irq_flag(env, meta, reg, env->insn_idx, kfunc_class); if (err) return err; } else { @@ -12082,7 +12157,7 @@ static int process_irq_flag(struct bpf_verifier_env *env, int regno, if (err) return err; - err = unmark_stack_slot_irq_flag(env, reg); + err = unmark_stack_slot_irq_flag(env, reg, kfunc_class); if (err) return err; } @@ -12209,7 +12284,8 @@ static int check_reg_allocation_locked(struct bpf_verifier_env *env, struct bpf_ if (!env->cur_state->active_locks) return -EINVAL; - s = find_lock_state(env->cur_state, REF_TYPE_LOCK, id, ptr); + s = find_lock_state(env->cur_state, REF_TYPE_LOCK | REF_TYPE_RES_LOCK | REF_TYPE_RES_LOCK_IRQ, + id, ptr); if (!s) { verbose(env, "held lock and object are not in the same allocation\n"); return -EINVAL; @@ -12245,9 +12321,18 @@ static bool is_bpf_graph_api_kfunc(u32 btf_id) btf_id == special_kfunc_list[KF_bpf_refcount_acquire_impl]; } +static bool is_bpf_res_spin_lock_kfunc(u32 btf_id) +{ + return btf_id == special_kfunc_list[KF_bpf_res_spin_lock] || + btf_id == special_kfunc_list[KF_bpf_res_spin_unlock] || + btf_id == special_kfunc_list[KF_bpf_res_spin_lock_irqsave] || + btf_id == special_kfunc_list[KF_bpf_res_spin_unlock_irqrestore]; +} + static bool kfunc_spin_allowed(u32 btf_id) { - return is_bpf_graph_api_kfunc(btf_id) || is_bpf_iter_num_api_kfunc(btf_id); + return is_bpf_graph_api_kfunc(btf_id) || is_bpf_iter_num_api_kfunc(btf_id) || + is_bpf_res_spin_lock_kfunc(btf_id); } static bool is_sync_callback_calling_kfunc(u32 btf_id) @@ -12679,6 +12764,7 @@ static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_ case KF_ARG_PTR_TO_CONST_STR: case KF_ARG_PTR_TO_WORKQUEUE: case KF_ARG_PTR_TO_IRQ_FLAG: + case KF_ARG_PTR_TO_RES_SPIN_LOCK: break; default: WARN_ON_ONCE(1); @@ -12977,6 +13063,28 @@ static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_ if (ret < 0) return ret; break; + case KF_ARG_PTR_TO_RES_SPIN_LOCK: + { + int flags = PROCESS_RES_LOCK; + + if (reg->type != PTR_TO_MAP_VALUE && reg->type != (PTR_TO_BTF_ID | MEM_ALLOC)) { + verbose(env, "arg#%d doesn't point to map value or allocated object\n", i); + return -EINVAL; + } + + if (!is_bpf_res_spin_lock_kfunc(meta->func_id)) + return -EFAULT; + if (meta->func_id == special_kfunc_list[KF_bpf_res_spin_lock] || + meta->func_id == special_kfunc_list[KF_bpf_res_spin_lock_irqsave]) + flags |= PROCESS_SPIN_LOCK; + if (meta->func_id == special_kfunc_list[KF_bpf_res_spin_lock_irqsave] || + meta->func_id == special_kfunc_list[KF_bpf_res_spin_unlock_irqrestore]) + flags |= PROCESS_LOCK_IRQ; + ret = process_spin_lock(env, regno, flags); + if (ret < 0) + return ret; + break; + } } } @@ -13062,6 +13170,33 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn, insn_aux->is_iter_next = is_iter_next_kfunc(&meta); + if (!insn->off && + (insn->imm == special_kfunc_list[KF_bpf_res_spin_lock] 
|| + insn->imm == special_kfunc_list[KF_bpf_res_spin_lock_irqsave])) { + struct bpf_verifier_state *branch; + struct bpf_reg_state *regs; + + branch = push_stack(env, env->insn_idx + 1, env->insn_idx, false); + if (!branch) { + verbose(env, "failed to push state for failed lock acquisition\n"); + return -ENOMEM; + } + + regs = branch->frame[branch->curframe]->regs; + + /* Clear r0-r5 registers in forked state */ + for (i = 0; i < CALLER_SAVED_REGS; i++) + mark_reg_not_init(env, regs, caller_saved[i]); + + mark_reg_unknown(env, regs, BPF_REG_0); + err = __mark_reg_s32_range(env, regs, BPF_REG_0, -MAX_ERRNO, -1); + if (err) { + verbose(env, "failed to mark s32 range for retval in forked state for lock\n"); + return err; + } + __mark_btf_func_reg_size(env, regs, BPF_REG_0, sizeof(u32)); + } + if (is_kfunc_destructive(&meta) && !capable(CAP_SYS_BOOT)) { verbose(env, "destructive kfunc calls require CAP_SYS_BOOT capability\n"); return -EACCES; @@ -13232,6 +13367,9 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn, if (btf_type_is_scalar(t)) { mark_reg_unknown(env, regs, BPF_REG_0); + if (meta.btf == btf_vmlinux && (meta.func_id == special_kfunc_list[KF_bpf_res_spin_lock] || + meta.func_id == special_kfunc_list[KF_bpf_res_spin_lock_irqsave])) + __mark_reg_const_zero(env, ®s[BPF_REG_0]); mark_btf_func_reg_size(env, BPF_REG_0, t->size); } else if (btf_type_is_ptr(t)) { ptr_type = btf_type_skip_modifiers(desc_btf, t->type, &ptr_type_id); @@ -18114,7 +18252,8 @@ static bool stacksafe(struct bpf_verifier_env *env, struct bpf_func_state *old, case STACK_IRQ_FLAG: old_reg = &old->stack[spi].spilled_ptr; cur_reg = &cur->stack[spi].spilled_ptr; - if (!check_ids(old_reg->ref_obj_id, cur_reg->ref_obj_id, idmap)) + if (!check_ids(old_reg->ref_obj_id, cur_reg->ref_obj_id, idmap) || + old_reg->irq.kfunc_class != cur_reg->irq.kfunc_class) return false; break; case STACK_MISC: @@ -18158,6 +18297,8 @@ static bool refsafe(struct bpf_verifier_state *old, struct bpf_verifier_state *c case REF_TYPE_IRQ: break; case REF_TYPE_LOCK: + case REF_TYPE_RES_LOCK: + case REF_TYPE_RES_LOCK_IRQ: if (old->refs[i].ptr != cur->refs[i].ptr) return false; break; @@ -19491,7 +19632,7 @@ static int check_map_prog_compatibility(struct bpf_verifier_env *env, } } - if (btf_record_has_field(map->record, BPF_SPIN_LOCK)) { + if (btf_record_has_field(map->record, BPF_SPIN_LOCK | BPF_RES_SPIN_LOCK)) { if (prog_type == BPF_PROG_TYPE_SOCKET_FILTER) { verbose(env, "socket filter progs cannot use bpf_spin_lock yet\n"); return -EINVAL; From patchwork Thu Feb 6 10:54:33 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Kumar Kartikeya Dwivedi X-Patchwork-Id: 13962830 X-Patchwork-Delegate: bpf@iogearbox.net Received: from mail-wm1-f66.google.com (mail-wm1-f66.google.com [209.85.128.66]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 5C86B227560; Thu, 6 Feb 2025 10:55:13 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.128.66 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1738839315; cv=none; b=hWK5G6+b1/X2Hs5nl+nYA3QwhNZP4UeBJJd+cFyxKgEWAi+Nj47uvD09bYk3BBbYnTy6GdaVXiUjrzuX5JjoYdIoFixi7T+CAHYI9x3TUoRBDWFP9r/2RA4CxqRjsMvgpNBRLDbJznLjgx/bwaYZIHgZCighZxRkmeil+zV0ipI= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1738839315; 
From patchwork Thu Feb 6 10:54:33 2025
From: Kumar Kartikeya Dwivedi To: bpf@vger.kernel.org,
linux-kernel@vger.kernel.org Cc: Linus Torvalds , Peter Zijlstra , Will Deacon , Waiman Long , Alexei Starovoitov , Andrii Nakryiko , Daniel Borkmann , Martin KaFai Lau , Eduard Zingerman , "Paul E. McKenney" , Tejun Heo , Barret Rhoden , Josh Don , Dohyun Kim , linux-arm-kernel@lists.infradead.org, kernel-team@meta.com Subject: [PATCH bpf-next v2 25/26] bpf: Maintain FIFO property for rqspinlock unlock Date: Thu, 6 Feb 2025 02:54:33 -0800 Message-ID: <20250206105435.2159977-26-memxor@gmail.com> In-Reply-To: <20250206105435.2159977-1-memxor@gmail.com> References: <20250206105435.2159977-1-memxor@gmail.com>
Since out-of-order unlocks are unsupported for rqspinlock, and the irqsave variants already enforce strict FIFO ordering, enforce the same FIFO ordering for the normal non-irqsave variants as well. Two new verifier state fields (active_lock_id, active_lock_ptr) denote the top of the lock stack; whenever the topmost entry is popped through an unlock, the preceding entry's (prev_id, prev_ptr) become the new top. Take special care to make these fields part of the state comparison in refsafe. (A short sketch of the resulting ordering rule follows the diff below.)
Signed-off-by: Kumar Kartikeya Dwivedi --- include/linux/bpf_verifier.h | 3 +++ kernel/bpf/verifier.c | 33 ++++++++++++++++++++++++++++----- 2 files changed, 31 insertions(+), 5 deletions(-) diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h index ed444e44f524..92cd2289b743 100644 --- a/include/linux/bpf_verifier.h +++ b/include/linux/bpf_verifier.h @@ -269,6 +269,7 @@ struct bpf_reference_state { REF_TYPE_LOCK = (1 << 3), REF_TYPE_RES_LOCK = (1 << 4), REF_TYPE_RES_LOCK_IRQ = (1 << 5), + REF_TYPE_LOCK_MASK = REF_TYPE_LOCK | REF_TYPE_RES_LOCK | REF_TYPE_RES_LOCK_IRQ, } type; /* Track each reference created with a unique id, even if the same * instruction creates the reference multiple times (eg, via CALL).
@@ -435,6 +436,8 @@ struct bpf_verifier_state { u32 active_locks; u32 active_preempt_locks; u32 active_irq_id; + u32 active_lock_id; + void *active_lock_ptr; bool active_rcu_lock; bool speculative; diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c index 294761dd0072..9cac6ea4f844 100644 --- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -1421,6 +1421,8 @@ static int copy_reference_state(struct bpf_verifier_state *dst, const struct bpf dst->active_preempt_locks = src->active_preempt_locks; dst->active_rcu_lock = src->active_rcu_lock; dst->active_irq_id = src->active_irq_id; + dst->active_lock_id = src->active_lock_id; + dst->active_lock_ptr = src->active_lock_ptr; return 0; } @@ -1520,6 +1522,8 @@ static int acquire_lock_state(struct bpf_verifier_env *env, int insn_idx, enum r s->ptr = ptr; state->active_locks++; + state->active_lock_id = id; + state->active_lock_ptr = ptr; return 0; } @@ -1559,16 +1563,24 @@ static void release_reference_state(struct bpf_verifier_state *state, int idx) static int release_lock_state(struct bpf_verifier_state *state, int type, int id, void *ptr) { + void *prev_ptr = NULL; + u32 prev_id = 0; int i; for (i = 0; i < state->acquired_refs; i++) { - if (state->refs[i].type != type) - continue; - if (state->refs[i].id == id && state->refs[i].ptr == ptr) { + if (state->refs[i].type == type && state->refs[i].id == id && + state->refs[i].ptr == ptr) { release_reference_state(state, i); state->active_locks--; + /* Reassign active lock (id, ptr). */ + state->active_lock_id = prev_id; + state->active_lock_ptr = prev_ptr; return 0; } + if (state->refs[i].type & REF_TYPE_LOCK_MASK) { + prev_id = state->refs[i].id; + prev_ptr = state->refs[i].ptr; + } } return -EINVAL; } @@ -8123,6 +8135,14 @@ static int process_spin_lock(struct bpf_verifier_env *env, int regno, int flags) type = REF_TYPE_RES_LOCK; else type = REF_TYPE_LOCK; + if (!find_lock_state(cur, type, reg->id, ptr)) { + verbose(env, "%s_unlock of different lock\n", lock_str); + return -EINVAL; + } + if (reg->id != cur->active_lock_id || ptr != cur->active_lock_ptr) { + verbose(env, "%s_unlock cannot be out of order\n", lock_str); + return -EINVAL; + } if (release_lock_state(cur, type, reg->id, ptr)) { verbose(env, "%s_unlock of different lock\n", lock_str); return -EINVAL; } @@ -12284,8 +12304,7 @@ static int check_reg_allocation_locked(struct bpf_verifier_env *env, struct bpf_ if (!env->cur_state->active_locks) return -EINVAL; - s = find_lock_state(env->cur_state, REF_TYPE_LOCK | REF_TYPE_RES_LOCK | REF_TYPE_RES_LOCK_IRQ, - id, ptr); + s = find_lock_state(env->cur_state, REF_TYPE_LOCK_MASK, id, ptr); if (!s) { verbose(env, "held lock and object are not in the same allocation\n"); return -EINVAL; } @@ -18288,6 +18307,10 @@ static bool refsafe(struct bpf_verifier_state *old, struct bpf_verifier_state *c if (!check_ids(old->active_irq_id, cur->active_irq_id, idmap)) return false; + if (!check_ids(old->active_lock_id, cur->active_lock_id, idmap) || + old->active_lock_ptr != cur->active_lock_ptr) + return false; + for (i = 0; i < old->acquired_refs; i++) { if (!check_ids(old->refs[i].id, cur->refs[i].id, idmap) || old->refs[i].type != cur->refs[i].type)
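With the change above, resilient spin locks must be released in the reverse order of acquisition: only the lock recorded in (active_lock_id, active_lock_ptr) may be unlocked next. A short sketch of the rule, assuming two global bpf_res_spin_lock objects as in the selftests added by the next patch (names illustrative):

struct bpf_res_spin_lock lockA __hidden SEC(".data.A");
struct bpf_res_spin_lock lockB __hidden SEC(".data.B");

SEC("tc")
int ordered_unlock(struct __sk_buff *ctx)
{
	if (bpf_res_spin_lock(&lockA))
		return 0;
	if (bpf_res_spin_lock(&lockB)) {
		/* B was not acquired, so A is still the top of the lock
		 * stack and may be released directly.
		 */
		bpf_res_spin_unlock(&lockA);
		return 0;
	}
	/* Unlocks mirror acquisition order: B, then A. Releasing A while
	 * B is still held is now rejected with "bpf_res_spin_unlock
	 * cannot be out of order".
	 */
	bpf_res_spin_unlock(&lockB);
	bpf_res_spin_unlock(&lockA);
	return 0;
}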
From patchwork Thu Feb 6 10:54:34 2025
From: Kumar Kartikeya Dwivedi To: bpf@vger.kernel.org, linux-kernel@vger.kernel.org Cc: Linus Torvalds , Peter Zijlstra , Will Deacon , Waiman Long , Alexei Starovoitov , Andrii Nakryiko , Daniel Borkmann , Martin KaFai Lau , Eduard Zingerman , "Paul E. McKenney" , Tejun Heo , Barret Rhoden , Josh Don , Dohyun Kim , linux-arm-kernel@lists.infradead.org, kernel-team@meta.com Subject: [PATCH bpf-next v2 26/26] selftests/bpf: Add tests for rqspinlock Date: Thu, 6 Feb 2025 02:54:34 -0800 Message-ID: <20250206105435.2159977-27-memxor@gmail.com> In-Reply-To: <20250206105435.2159977-1-memxor@gmail.com> References: <20250206105435.2159977-1-memxor@gmail.com>
Introduce selftests that trigger AA and ABBA deadlocks, and test the edge case where the held locks table runs out of entries, in which case we fall back to the timeout as the final line of defense. Also exercise the verifier's AA detection where applicable.
Signed-off-by: Kumar Kartikeya Dwivedi --- .../selftests/bpf/prog_tests/res_spin_lock.c | 99 +++++++ tools/testing/selftests/bpf/progs/irq.c | 53 ++++ .../selftests/bpf/progs/res_spin_lock.c | 143 ++++++++++ .../selftests/bpf/progs/res_spin_lock_fail.c | 244 ++++++++++++++++++ 4 files changed, 539 insertions(+) create mode 100644 tools/testing/selftests/bpf/prog_tests/res_spin_lock.c create mode 100644 tools/testing/selftests/bpf/progs/res_spin_lock.c create mode 100644 tools/testing/selftests/bpf/progs/res_spin_lock_fail.c diff --git a/tools/testing/selftests/bpf/prog_tests/res_spin_lock.c b/tools/testing/selftests/bpf/prog_tests/res_spin_lock.c new file mode 100644 index 000000000000..5a46b3e4a842 --- /dev/null +++ b/tools/testing/selftests/bpf/prog_tests/res_spin_lock.c @@ -0,0 +1,99 @@ +// SPDX-License-Identifier: GPL-2.0 +/* Copyright (c) 2024 Meta Platforms, Inc. and affiliates. */ +#include +#include + +#include "res_spin_lock.skel.h" +#include "res_spin_lock_fail.skel.h" + +static void test_res_spin_lock_failure(void) +{ + RUN_TESTS(res_spin_lock_fail); +} + +static volatile int skip; + +static void *spin_lock_thread(void *arg) +{ + int err, prog_fd = *(u32 *) arg; + LIBBPF_OPTS(bpf_test_run_opts, topts, + .data_in = &pkt_v4, + .data_size_in = sizeof(pkt_v4), + .repeat = 10000, + ); + + while (!READ_ONCE(skip)) { + err = bpf_prog_test_run_opts(prog_fd, &topts); + ASSERT_OK(err, "test_run"); + ASSERT_OK(topts.retval, "test_run retval"); + } + pthread_exit(arg); +} + +static void test_res_spin_lock_success(void) +{ + LIBBPF_OPTS(bpf_test_run_opts, topts, + .data_in = &pkt_v4, + .data_size_in = sizeof(pkt_v4), + .repeat = 1, + ); + struct res_spin_lock *skel; + pthread_t thread_id[16]; + int prog_fd, i, err; + void *ret; + + skel = res_spin_lock__open_and_load(); + if (!ASSERT_OK_PTR(skel, "res_spin_lock__open_and_load")) + return; + /* AA deadlock */ + prog_fd = bpf_program__fd(skel->progs.res_spin_lock_test); + err = bpf_prog_test_run_opts(prog_fd, &topts); + ASSERT_OK(err, "error"); + ASSERT_OK(topts.retval, "retval"); + + prog_fd = bpf_program__fd(skel->progs.res_spin_lock_test_held_lock_max); + err = bpf_prog_test_run_opts(prog_fd, &topts); + ASSERT_OK(err, "error"); + ASSERT_OK(topts.retval, "retval"); + + /* Multi-threaded ABBA deadlock. 
*/ + + prog_fd = bpf_program__fd(skel->progs.res_spin_lock_test_AB); + for (i = 0; i < 16; i++) { + int err; + + err = pthread_create(&thread_id[i], NULL, &spin_lock_thread, &prog_fd); + if (!ASSERT_OK(err, "pthread_create")) + goto end; + } + + topts.repeat = 1000; + int fd = bpf_program__fd(skel->progs.res_spin_lock_test_BA); + while (!topts.retval && !err && !READ_ONCE(skel->bss->err)) { + err = bpf_prog_test_run_opts(fd, &topts); + } + + WRITE_ONCE(skip, true); + + for (i = 0; i < 16; i++) { + if (!ASSERT_OK(pthread_join(thread_id[i], &ret), "pthread_join")) + goto end; + if (!ASSERT_EQ(ret, &prog_fd, "ret == prog_fd")) + goto end; + } + + ASSERT_EQ(READ_ONCE(skel->bss->err), -EDEADLK, "timeout err"); + ASSERT_OK(err, "err"); + ASSERT_EQ(topts.retval, -EDEADLK, "timeout"); +end: + res_spin_lock__destroy(skel); + return; +} + +void test_res_spin_lock(void) +{ + if (test__start_subtest("res_spin_lock_success")) + test_res_spin_lock_success(); + if (test__start_subtest("res_spin_lock_failure")) + test_res_spin_lock_failure(); +} diff --git a/tools/testing/selftests/bpf/progs/irq.c b/tools/testing/selftests/bpf/progs/irq.c index b0b53d980964..3d4fee83a5be 100644 --- a/tools/testing/selftests/bpf/progs/irq.c +++ b/tools/testing/selftests/bpf/progs/irq.c @@ -11,6 +11,9 @@ extern void bpf_local_irq_save(unsigned long *) __weak __ksym; extern void bpf_local_irq_restore(unsigned long *) __weak __ksym; extern int bpf_copy_from_user_str(void *dst, u32 dst__sz, const void *unsafe_ptr__ign, u64 flags) __weak __ksym; +struct bpf_res_spin_lock lockA __hidden SEC(".data.A"); +struct bpf_res_spin_lock lockB __hidden SEC(".data.B"); + SEC("?tc") __failure __msg("arg#0 doesn't point to an irq flag on stack") int irq_save_bad_arg(struct __sk_buff *ctx) @@ -441,4 +444,54 @@ int irq_ooo_refs_array(struct __sk_buff *ctx) return 0; } +SEC("?tc") +__failure __msg("cannot restore irq state out of order") +int irq_ooo_lock_cond_inv(struct __sk_buff *ctx) +{ + unsigned long flags1, flags2; + + if (bpf_res_spin_lock_irqsave(&lockA, &flags1)) + return 0; + if (bpf_res_spin_lock_irqsave(&lockB, &flags2)) { + bpf_res_spin_unlock_irqrestore(&lockA, &flags1); + return 0; + } + + bpf_res_spin_unlock_irqrestore(&lockB, &flags1); + bpf_res_spin_unlock_irqrestore(&lockA, &flags2); + return 0; +} + +SEC("?tc") +__failure __msg("function calls are not allowed") +int irq_wrong_kfunc_class_1(struct __sk_buff *ctx) +{ + unsigned long flags1; + + if (bpf_res_spin_lock_irqsave(&lockA, &flags1)) + return 0; + /* For now, bpf_local_irq_restore is not allowed in critical section, + * but this test ensures error will be caught with kfunc_class when it's + * opened up. Tested by temporarily permitting this kfunc in critical + * section. 
+ */ + bpf_local_irq_restore(&flags1); + bpf_res_spin_unlock_irqrestore(&lockA, &flags1); + return 0; +} + +SEC("?tc") +__failure __msg("function calls are not allowed") +int irq_wrong_kfunc_class_2(struct __sk_buff *ctx) +{ + unsigned long flags1, flags2; + + bpf_local_irq_save(&flags1); + if (bpf_res_spin_lock_irqsave(&lockA, &flags2)) + return 0; + bpf_local_irq_restore(&flags2); + bpf_res_spin_unlock_irqrestore(&lockA, &flags1); + return 0; +} + char _license[] SEC("license") = "GPL"; diff --git a/tools/testing/selftests/bpf/progs/res_spin_lock.c b/tools/testing/selftests/bpf/progs/res_spin_lock.c new file mode 100644 index 000000000000..f68aa2ccccc2 --- /dev/null +++ b/tools/testing/selftests/bpf/progs/res_spin_lock.c @@ -0,0 +1,143 @@ +// SPDX-License-Identifier: GPL-2.0 +/* Copyright (c) 2024 Meta Platforms, Inc. and affiliates. */ +#include +#include +#include +#include "bpf_misc.h" + +#define EDEADLK 35 +#define ETIMEDOUT 110 + +struct arr_elem { + struct bpf_res_spin_lock lock; +}; + +struct { + __uint(type, BPF_MAP_TYPE_ARRAY); + __uint(max_entries, 64); + __type(key, int); + __type(value, struct arr_elem); +} arrmap SEC(".maps"); + +struct bpf_res_spin_lock lockA __hidden SEC(".data.A"); +struct bpf_res_spin_lock lockB __hidden SEC(".data.B"); + +SEC("tc") +int res_spin_lock_test(struct __sk_buff *ctx) +{ + struct arr_elem *elem1, *elem2; + int r; + + elem1 = bpf_map_lookup_elem(&arrmap, &(int){0}); + if (!elem1) + return -1; + elem2 = bpf_map_lookup_elem(&arrmap, &(int){0}); + if (!elem2) + return -1; + + r = bpf_res_spin_lock(&elem1->lock); + if (r) + return r; + if (!bpf_res_spin_lock(&elem2->lock)) { + bpf_res_spin_unlock(&elem2->lock); + bpf_res_spin_unlock(&elem1->lock); + return -1; + } + bpf_res_spin_unlock(&elem1->lock); + return 0; +} + +SEC("tc") +int res_spin_lock_test_AB(struct __sk_buff *ctx) +{ + int r; + + r = bpf_res_spin_lock(&lockA); + if (r) + return !r; + /* Only unlock if we took the lock. */ + if (!bpf_res_spin_lock(&lockB)) + bpf_res_spin_unlock(&lockB); + bpf_res_spin_unlock(&lockA); + return 0; +} + +int err; + +SEC("tc") +int res_spin_lock_test_BA(struct __sk_buff *ctx) +{ + int r; + + r = bpf_res_spin_lock(&lockB); + if (r) + return !r; + if (!bpf_res_spin_lock(&lockA)) + bpf_res_spin_unlock(&lockA); + else + err = -EDEADLK; + bpf_res_spin_unlock(&lockB); + return err ?: 0; +} + +SEC("tc") +int res_spin_lock_test_held_lock_max(struct __sk_buff *ctx) +{ + struct bpf_res_spin_lock *locks[48] = {}; + struct arr_elem *e; + u64 time_beg, time; + int ret = 0, i; + + _Static_assert(ARRAY_SIZE(((struct rqspinlock_held){}).locks) == 32, + "RES_NR_HELD assumed to be 32"); + + for (i = 0; i < 34; i++) { + int key = i; + + /* We cannot pass in i as it will get spilled/filled by the compiler and + * loses bounds in verifier state. + */ + e = bpf_map_lookup_elem(&arrmap, &key); + if (!e) + return 1; + locks[i] = &e->lock; + } + + for (; i < 48; i++) { + int key = i - 2; + + /* We cannot pass in i as it will get spilled/filled by the compiler and + * loses bounds in verifier state. + */ + e = bpf_map_lookup_elem(&arrmap, &key); + if (!e) + return 1; + locks[i] = &e->lock; + } + + time_beg = bpf_ktime_get_ns(); + for (i = 0; i < 34; i++) { + if (bpf_res_spin_lock(locks[i])) + goto end; + } + + /* Trigger AA, after exhausting entries in the held lock table. This + * time, only the timeout can save us, as AA detection won't succeed. 
+ */ + if (!bpf_res_spin_lock(locks[34])) { + bpf_res_spin_unlock(locks[34]); + ret = 1; + goto end; + } + +end: + for (i = i - 1; i >= 0; i--) + bpf_res_spin_unlock(locks[i]); + time = bpf_ktime_get_ns() - time_beg; + /* Time spent should be easily above our limit (1/2 s), since AA + * detection won't be expedited due to lack of held lock entry. + */ + return ret ?: (time > 1000000000 / 2 ? 0 : 1); +} + +char _license[] SEC("license") = "GPL"; diff --git a/tools/testing/selftests/bpf/progs/res_spin_lock_fail.c b/tools/testing/selftests/bpf/progs/res_spin_lock_fail.c new file mode 100644 index 000000000000..3222e9283c78 --- /dev/null +++ b/tools/testing/selftests/bpf/progs/res_spin_lock_fail.c @@ -0,0 +1,244 @@ +// SPDX-License-Identifier: GPL-2.0 +/* Copyright (c) 2024 Meta Platforms, Inc. and affiliates. */ +#include +#include +#include +#include +#include "bpf_misc.h" +#include "bpf_experimental.h" + +struct arr_elem { + struct bpf_res_spin_lock lock; +}; + +struct { + __uint(type, BPF_MAP_TYPE_ARRAY); + __uint(max_entries, 1); + __type(key, int); + __type(value, struct arr_elem); +} arrmap SEC(".maps"); + +long value; + +struct bpf_spin_lock lock __hidden SEC(".data.A"); +struct bpf_res_spin_lock res_lock __hidden SEC(".data.B"); + +SEC("?tc") +__failure __msg("point to map value or allocated object") +int res_spin_lock_arg(struct __sk_buff *ctx) +{ + struct arr_elem *elem; + + elem = bpf_map_lookup_elem(&arrmap, &(int){0}); + if (!elem) + return 0; + bpf_res_spin_lock((struct bpf_res_spin_lock *)bpf_core_cast(&elem->lock, struct __sk_buff)); + bpf_res_spin_lock(&elem->lock); + return 0; +} + +SEC("?tc") +__failure __msg("AA deadlock detected") +int res_spin_lock_AA(struct __sk_buff *ctx) +{ + struct arr_elem *elem; + + elem = bpf_map_lookup_elem(&arrmap, &(int){0}); + if (!elem) + return 0; + bpf_res_spin_lock(&elem->lock); + bpf_res_spin_lock(&elem->lock); + return 0; +} + +SEC("?tc") +__failure __msg("AA deadlock detected") +int res_spin_lock_cond_AA(struct __sk_buff *ctx) +{ + struct arr_elem *elem; + + elem = bpf_map_lookup_elem(&arrmap, &(int){0}); + if (!elem) + return 0; + if (bpf_res_spin_lock(&elem->lock)) + return 0; + bpf_res_spin_lock(&elem->lock); + return 0; +} + +SEC("?tc") +__failure __msg("unlock of different lock") +int res_spin_lock_mismatch_1(struct __sk_buff *ctx) +{ + struct arr_elem *elem; + + elem = bpf_map_lookup_elem(&arrmap, &(int){0}); + if (!elem) + return 0; + if (bpf_res_spin_lock(&elem->lock)) + return 0; + bpf_res_spin_unlock(&res_lock); + return 0; +} + +SEC("?tc") +__failure __msg("unlock of different lock") +int res_spin_lock_mismatch_2(struct __sk_buff *ctx) +{ + struct arr_elem *elem; + + elem = bpf_map_lookup_elem(&arrmap, &(int){0}); + if (!elem) + return 0; + if (bpf_res_spin_lock(&res_lock)) + return 0; + bpf_res_spin_unlock(&elem->lock); + return 0; +} + +SEC("?tc") +__failure __msg("unlock of different lock") +int res_spin_lock_irq_mismatch_1(struct __sk_buff *ctx) +{ + struct arr_elem *elem; + unsigned long f1; + + elem = bpf_map_lookup_elem(&arrmap, &(int){0}); + if (!elem) + return 0; + bpf_local_irq_save(&f1); + if (bpf_res_spin_lock(&res_lock)) + return 0; + bpf_res_spin_unlock_irqrestore(&res_lock, &f1); + return 0; +} + +SEC("?tc") +__failure __msg("unlock of different lock") +int res_spin_lock_irq_mismatch_2(struct __sk_buff *ctx) +{ + struct arr_elem *elem; + unsigned long f1; + + elem = bpf_map_lookup_elem(&arrmap, &(int){0}); + if (!elem) + return 0; + if (bpf_res_spin_lock_irqsave(&res_lock, &f1)) + return 0; + 
bpf_res_spin_unlock(&res_lock); + return 0; +} + +SEC("?tc") +__success +int res_spin_lock_ooo(struct __sk_buff *ctx) +{ + struct arr_elem *elem; + + elem = bpf_map_lookup_elem(&arrmap, &(int){0}); + if (!elem) + return 0; + if (bpf_res_spin_lock(&res_lock)) + return 0; + if (bpf_res_spin_lock(&elem->lock)) { + bpf_res_spin_unlock(&res_lock); + return 0; + } + bpf_res_spin_unlock(&elem->lock); + bpf_res_spin_unlock(&res_lock); + return 0; +} + +SEC("?tc") +__success +int res_spin_lock_ooo_irq(struct __sk_buff *ctx) +{ + struct arr_elem *elem; + unsigned long f1, f2; + + elem = bpf_map_lookup_elem(&arrmap, &(int){0}); + if (!elem) + return 0; + if (bpf_res_spin_lock_irqsave(&res_lock, &f1)) + return 0; + if (bpf_res_spin_lock_irqsave(&elem->lock, &f2)) { + bpf_res_spin_unlock_irqrestore(&res_lock, &f1); + /* We won't have a unreleased IRQ flag error here. */ + return 0; + } + bpf_res_spin_unlock_irqrestore(&elem->lock, &f2); + bpf_res_spin_unlock_irqrestore(&res_lock, &f1); + return 0; +} + +struct bpf_res_spin_lock lock1 __hidden SEC(".data.OO1"); +struct bpf_res_spin_lock lock2 __hidden SEC(".data.OO2"); + +SEC("?tc") +__failure __msg("bpf_res_spin_unlock cannot be out of order") +int res_spin_lock_ooo_unlock(struct __sk_buff *ctx) +{ + if (bpf_res_spin_lock(&lock1)) + return 0; + if (bpf_res_spin_lock(&lock2)) { + bpf_res_spin_unlock(&lock1); + return 0; + } + bpf_res_spin_unlock(&lock1); + bpf_res_spin_unlock(&lock2); + return 0; +} + +SEC("?tc") +__failure __msg("off 1 doesn't point to 'struct bpf_res_spin_lock' that is at 0") +int res_spin_lock_bad_off(struct __sk_buff *ctx) +{ + struct arr_elem *elem; + + elem = bpf_map_lookup_elem(&arrmap, &(int){0}); + if (!elem) + return 0; + bpf_res_spin_lock((void *)&elem->lock + 1); + return 0; +} + +SEC("?tc") +__failure __msg("R1 doesn't have constant offset. bpf_res_spin_lock has to be at the constant offset") +int res_spin_lock_var_off(struct __sk_buff *ctx) +{ + struct arr_elem *elem; + u64 val = value; + + elem = bpf_map_lookup_elem(&arrmap, &(int){0}); + if (!elem) { + // FIXME: Only inline assembly use in assert macro doesn't emit + // BTF definition. + bpf_throw(0); + return 0; + } + bpf_assert_range(val, 0, 40); + bpf_res_spin_lock((void *)&value + val); + return 0; +} + +SEC("?tc") +__failure __msg("map 'res_spin.bss' has no valid bpf_res_spin_lock") +int res_spin_lock_no_lock_map(struct __sk_buff *ctx) +{ + bpf_res_spin_lock((void *)&value + 1); + return 0; +} + +SEC("?tc") +__failure __msg("local 'kptr' has no valid bpf_res_spin_lock") +int res_spin_lock_no_lock_kptr(struct __sk_buff *ctx) +{ + struct { int i; } *p = bpf_obj_new(typeof(*p)); + + if (!p) + return 0; + bpf_res_spin_lock((void *)p); + return 0; +} + +char _license[] SEC("license") = "GPL";