From patchwork Thu Feb 6 10:54:09 2025
X-Patchwork-Submitter: Kumar Kartikeya Dwivedi
X-Patchwork-Id: 13962806
X-Patchwork-Delegate: bpf@iogearbox.net
From: Kumar Kartikeya Dwivedi
To: bpf@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: Barret Rhoden, Linus Torvalds, Peter Zijlstra, Will Deacon, Waiman Long,
    Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann, Martin KaFai Lau,
    Eduard Zingerman, "Paul E. McKenney", Tejun Heo, Josh Don, Dohyun Kim,
    linux-arm-kernel@lists.infradead.org, kernel-team@meta.com
Subject: [PATCH bpf-next v2 01/26] locking: Move MCS struct definition to public header
Date: Thu, 6 Feb 2025 02:54:09 -0800
Message-ID: <20250206105435.2159977-2-memxor@gmail.com>
In-Reply-To: <20250206105435.2159977-1-memxor@gmail.com>
References: <20250206105435.2159977-1-memxor@gmail.com>

Move the definition of struct mcs_spinlock from the private mcs_spinlock.h
header in kernel/locking to the asm-generic mcs_spinlock.h header, since we
will need to reference it from the qspinlock.h header in subsequent commits.
Reviewed-by: Barret Rhoden
Signed-off-by: Kumar Kartikeya Dwivedi
---
 include/asm-generic/mcs_spinlock.h | 6 ++++++
 kernel/locking/mcs_spinlock.h      | 6 ------
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/include/asm-generic/mcs_spinlock.h b/include/asm-generic/mcs_spinlock.h
index 10cd4ffc6ba2..39c94012b88a 100644
--- a/include/asm-generic/mcs_spinlock.h
+++ b/include/asm-generic/mcs_spinlock.h
@@ -1,6 +1,12 @@
 #ifndef __ASM_MCS_SPINLOCK_H
 #define __ASM_MCS_SPINLOCK_H
 
+struct mcs_spinlock {
+	struct mcs_spinlock *next;
+	int locked; /* 1 if lock acquired */
+	int count;  /* nesting count, see qspinlock.c */
+};
+
 /*
  * Architectures can define their own:
  *
diff --git a/kernel/locking/mcs_spinlock.h b/kernel/locking/mcs_spinlock.h
index 85251d8771d9..16160ca8907f 100644
--- a/kernel/locking/mcs_spinlock.h
+++ b/kernel/locking/mcs_spinlock.h
@@ -15,12 +15,6 @@
 
 #include 
 
-struct mcs_spinlock {
-	struct mcs_spinlock *next;
-	int locked; /* 1 if lock acquired */
-	int count;  /* nesting count, see qspinlock.c */
-};
-
 #ifndef arch_mcs_spin_lock_contended
 /*
  * Using smp_cond_load_acquire() provides the acquire semantics
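A note on the structure being moved: the two fields next and locked are all a
textbook MCS lock needs. The sketch below is illustration only and not part of
the series; mcs_lock()/mcs_unlock() and the bare tail pointer are assumptions,
whereas the real qspinlock code compresses the tail into its 32-bit lock word.

static void mcs_lock(struct mcs_spinlock **lock, struct mcs_spinlock *node)
{
	struct mcs_spinlock *prev;

	node->next = NULL;
	node->locked = 0;

	/* Atomically make ourselves the new tail of the queue. */
	prev = xchg(lock, node);
	if (!prev)
		return;		/* queue was empty: lock is ours */

	/* Link behind the old tail and wait for it to hand the lock over. */
	WRITE_ONCE(prev->next, node);
	smp_cond_load_acquire(&node->locked, VAL);
}

static void mcs_unlock(struct mcs_spinlock **lock, struct mcs_spinlock *node)
{
	struct mcs_spinlock *next = READ_ONCE(node->next);

	if (!next) {
		/* No successor visible: try to reset the tail to empty. */
		if (cmpxchg_release(lock, node, NULL) == node)
			return;
		/* A successor is concurrently linking in; wait for it. */
		next = smp_cond_load_relaxed(&node->next, (VAL));
	}
	/* Hand the lock to the successor. */
	smp_store_release(&next->locked, 1);
}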
From patchwork Thu Feb 6 10:54:10 2025
X-Patchwork-Submitter: Kumar Kartikeya Dwivedi
X-Patchwork-Id: 13962808
From: Kumar Kartikeya Dwivedi
Subject: [PATCH bpf-next v2 02/26] locking: Move common qspinlock helpers to a private header
Date: Thu, 6 Feb 2025 02:54:10 -0800
Message-ID: <20250206105435.2159977-3-memxor@gmail.com>
In-Reply-To: <20250206105435.2159977-1-memxor@gmail.com>
McKenney" , Tejun Heo , Josh Don , Dohyun Kim , linux-arm-kernel@lists.infradead.org, kernel-team@meta.com Subject: [PATCH bpf-next v2 02/26] locking: Move common qspinlock helpers to a private header Date: Thu, 6 Feb 2025 02:54:10 -0800 Message-ID: <20250206105435.2159977-3-memxor@gmail.com> X-Mailer: git-send-email 2.43.5 In-Reply-To: <20250206105435.2159977-1-memxor@gmail.com> References: <20250206105435.2159977-1-memxor@gmail.com> Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Developer-Signature: v=1; a=openpgp-sha256; l=13562; h=from:subject; bh=eKJ3qxGBtRJg8l1rvSHjaQtIHqaeLQcTbgGFSWnocak=; b=owEBbQKS/ZANAwAIAUzgyIZIvxHKAcsmYgBnpJRkkVfZqT1tCitfFNFTby5Hz/Q0Ls5KtoFEDTCL cZHH7UOJAjMEAAEIAB0WIQRLvip+Buz51YI8YRFM4MiGSL8RygUCZ6SUZAAKCRBM4MiGSL8RylRQD/ 9ZckUJptWve6Ivsaj0tCRlXmeXvXakYFfReqoU4TTSiK5e60c9zMXr1PSHiloHlftlQqfuKi7ug6yx A0bUp3hubkSGlTfDSdMPwLh1AN2D0QpPIKuW9AIo5mw5fag+zRMjgwdjsh3o2biZZ4JsC36jA+tvRV OhywUPmmoZL/U7GKGzWmptqXq4iD7oAmPhyHSIyZ80efwrSDnwF8UIzoX8wR8vqgwUPSp3g3TAgdqS Wu+LKgl9hNtbykaP4Jj433O5chD603DDAL+C0COzSBRTNaFTRqqXx3o/3rvGoEDspvDUhm+uXtQJyl AHG62uLsNo3EyObbRiK6pLo/hjsdzLmLnWdfJb2NV9sJPp2VA5FxIiDCkL+P/08d6XoTHUePanJNRI p2OhTqrJFD4fN/JDiNEwhFAcdlvwcrSZV/qDLoUrrD0UYVuywjxrranVRZz6bQyIV9JZv/P8i0TkzE w1DQVyyoWScGq6wi5NLOun+C1IdPnB/k7AISjvxU+vTO7bhPWcKARMgbyFTCw+uBqNq614OwvBHiR1 SktvlKEmTFENUpURQ5Kkx8YOM8Bu1bLOunw6V1hyLl9WU5VH5WgZ2AhT3MLH8H1SixfIzevgMLkfIN BLB2b1rHlEnEgsNDOx3qTtgKlgKaHmaveMw+6M64H8QDFPh3iV10yEj7kN0A== X-Developer-Key: i=memxor@gmail.com; a=openpgp; fpr=4BBE2A7E06ECF9D5823C61114CE0C88648BF11CA X-Patchwork-Delegate: bpf@iogearbox.net Move qspinlock helper functions that encode, decode tail word, set and clear the pending and locked bits, and other miscellaneous definitions and macros to a private header. To this end, create a qspinlock.h header file in kernel/locking. Subsequent commits will introduce a modified qspinlock slow path function, thus moving shared code to a private header will help minimize unnecessary code duplication. Reviewed-by: Barret Rhoden Signed-off-by: Kumar Kartikeya Dwivedi --- kernel/locking/qspinlock.c | 193 +---------------------------------- kernel/locking/qspinlock.h | 200 +++++++++++++++++++++++++++++++++++++ 2 files changed, 205 insertions(+), 188 deletions(-) create mode 100644 kernel/locking/qspinlock.h diff --git a/kernel/locking/qspinlock.c b/kernel/locking/qspinlock.c index 7d96bed718e4..af8d122bb649 100644 --- a/kernel/locking/qspinlock.c +++ b/kernel/locking/qspinlock.c @@ -25,8 +25,9 @@ #include /* - * Include queued spinlock statistics code + * Include queued spinlock definitions and statistics code */ +#include "qspinlock.h" #include "qspinlock_stat.h" /* @@ -67,36 +68,6 @@ */ #include "mcs_spinlock.h" -#define MAX_NODES 4 - -/* - * On 64-bit architectures, the mcs_spinlock structure will be 16 bytes in - * size and four of them will fit nicely in one 64-byte cacheline. For - * pvqspinlock, however, we need more space for extra data. To accommodate - * that, we insert two more long words to pad it up to 32 bytes. IOW, only - * two of them can fit in a cacheline in this case. That is OK as it is rare - * to have more than 2 levels of slowpath nesting in actual use. We don't - * want to penalize pvqspinlocks to optimize for a rare case in native - * qspinlocks. - */ -struct qnode { - struct mcs_spinlock mcs; -#ifdef CONFIG_PARAVIRT_SPINLOCKS - long reserved[2]; -#endif -}; - -/* - * The pending bit spinning loop count. 
- * This heuristic is used to limit the number of lockword accesses - * made by atomic_cond_read_relaxed when waiting for the lock to - * transition out of the "== _Q_PENDING_VAL" state. We don't spin - * indefinitely because there's no guarantee that we'll make forward - * progress. - */ -#ifndef _Q_PENDING_LOOPS -#define _Q_PENDING_LOOPS 1 -#endif /* * Per-CPU queue node structures; we can never have more than 4 nested @@ -106,161 +77,7 @@ struct qnode { * * PV doubles the storage and uses the second cacheline for PV state. */ -static DEFINE_PER_CPU_ALIGNED(struct qnode, qnodes[MAX_NODES]); - -/* - * We must be able to distinguish between no-tail and the tail at 0:0, - * therefore increment the cpu number by one. - */ - -static inline __pure u32 encode_tail(int cpu, int idx) -{ - u32 tail; - - tail = (cpu + 1) << _Q_TAIL_CPU_OFFSET; - tail |= idx << _Q_TAIL_IDX_OFFSET; /* assume < 4 */ - - return tail; -} - -static inline __pure struct mcs_spinlock *decode_tail(u32 tail) -{ - int cpu = (tail >> _Q_TAIL_CPU_OFFSET) - 1; - int idx = (tail & _Q_TAIL_IDX_MASK) >> _Q_TAIL_IDX_OFFSET; - - return per_cpu_ptr(&qnodes[idx].mcs, cpu); -} - -static inline __pure -struct mcs_spinlock *grab_mcs_node(struct mcs_spinlock *base, int idx) -{ - return &((struct qnode *)base + idx)->mcs; -} - -#define _Q_LOCKED_PENDING_MASK (_Q_LOCKED_MASK | _Q_PENDING_MASK) - -#if _Q_PENDING_BITS == 8 -/** - * clear_pending - clear the pending bit. - * @lock: Pointer to queued spinlock structure - * - * *,1,* -> *,0,* - */ -static __always_inline void clear_pending(struct qspinlock *lock) -{ - WRITE_ONCE(lock->pending, 0); -} - -/** - * clear_pending_set_locked - take ownership and clear the pending bit. - * @lock: Pointer to queued spinlock structure - * - * *,1,0 -> *,0,1 - * - * Lock stealing is not allowed if this function is used. - */ -static __always_inline void clear_pending_set_locked(struct qspinlock *lock) -{ - WRITE_ONCE(lock->locked_pending, _Q_LOCKED_VAL); -} - -/* - * xchg_tail - Put in the new queue tail code word & retrieve previous one - * @lock : Pointer to queued spinlock structure - * @tail : The new queue tail code word - * Return: The previous queue tail code word - * - * xchg(lock, tail), which heads an address dependency - * - * p,*,* -> n,*,* ; prev = xchg(lock, node) - */ -static __always_inline u32 xchg_tail(struct qspinlock *lock, u32 tail) -{ - /* - * We can use relaxed semantics since the caller ensures that the - * MCS node is properly initialized before updating the tail. - */ - return (u32)xchg_relaxed(&lock->tail, - tail >> _Q_TAIL_OFFSET) << _Q_TAIL_OFFSET; -} - -#else /* _Q_PENDING_BITS == 8 */ - -/** - * clear_pending - clear the pending bit. - * @lock: Pointer to queued spinlock structure - * - * *,1,* -> *,0,* - */ -static __always_inline void clear_pending(struct qspinlock *lock) -{ - atomic_andnot(_Q_PENDING_VAL, &lock->val); -} - -/** - * clear_pending_set_locked - take ownership and clear the pending bit. 
- * @lock: Pointer to queued spinlock structure - * - * *,1,0 -> *,0,1 - */ -static __always_inline void clear_pending_set_locked(struct qspinlock *lock) -{ - atomic_add(-_Q_PENDING_VAL + _Q_LOCKED_VAL, &lock->val); -} - -/** - * xchg_tail - Put in the new queue tail code word & retrieve previous one - * @lock : Pointer to queued spinlock structure - * @tail : The new queue tail code word - * Return: The previous queue tail code word - * - * xchg(lock, tail) - * - * p,*,* -> n,*,* ; prev = xchg(lock, node) - */ -static __always_inline u32 xchg_tail(struct qspinlock *lock, u32 tail) -{ - u32 old, new; - - old = atomic_read(&lock->val); - do { - new = (old & _Q_LOCKED_PENDING_MASK) | tail; - /* - * We can use relaxed semantics since the caller ensures that - * the MCS node is properly initialized before updating the - * tail. - */ - } while (!atomic_try_cmpxchg_relaxed(&lock->val, &old, new)); - - return old; -} -#endif /* _Q_PENDING_BITS == 8 */ - -/** - * queued_fetch_set_pending_acquire - fetch the whole lock value and set pending - * @lock : Pointer to queued spinlock structure - * Return: The previous lock value - * - * *,*,* -> *,1,* - */ -#ifndef queued_fetch_set_pending_acquire -static __always_inline u32 queued_fetch_set_pending_acquire(struct qspinlock *lock) -{ - return atomic_fetch_or_acquire(_Q_PENDING_VAL, &lock->val); -} -#endif - -/** - * set_locked - Set the lock bit and own the lock - * @lock: Pointer to queued spinlock structure - * - * *,*,0 -> *,0,1 - */ -static __always_inline void set_locked(struct qspinlock *lock) -{ - WRITE_ONCE(lock->locked, _Q_LOCKED_VAL); -} - +static DEFINE_PER_CPU_ALIGNED(struct qnode, qnodes[_Q_MAX_NODES]); /* * Generate the native code for queued_spin_unlock_slowpath(); provide NOPs for @@ -410,7 +227,7 @@ void __lockfunc queued_spin_lock_slowpath(struct qspinlock *lock, u32 val) * any MCS node. This is not the most elegant solution, but is * simple enough. */ - if (unlikely(idx >= MAX_NODES)) { + if (unlikely(idx >= _Q_MAX_NODES)) { lockevent_inc(lock_no_node); while (!queued_spin_trylock(lock)) cpu_relax(); @@ -465,7 +282,7 @@ void __lockfunc queued_spin_lock_slowpath(struct qspinlock *lock, u32 val) * head of the waitqueue. */ if (old & _Q_TAIL_MASK) { - prev = decode_tail(old); + prev = decode_tail(old, qnodes); /* Link @node into the waitqueue. */ WRITE_ONCE(prev->next, node); diff --git a/kernel/locking/qspinlock.h b/kernel/locking/qspinlock.h new file mode 100644 index 000000000000..d4ceb9490365 --- /dev/null +++ b/kernel/locking/qspinlock.h @@ -0,0 +1,200 @@ +/* SPDX-License-Identifier: GPL-2.0-or-later */ +/* + * Queued spinlock defines + * + * This file contains macro definitions and functions shared between different + * qspinlock slow path implementations. + */ +#ifndef __LINUX_QSPINLOCK_H +#define __LINUX_QSPINLOCK_H + +#include +#include +#include +#include + +#define _Q_MAX_NODES 4 + +/* + * The pending bit spinning loop count. + * This heuristic is used to limit the number of lockword accesses + * made by atomic_cond_read_relaxed when waiting for the lock to + * transition out of the "== _Q_PENDING_VAL" state. We don't spin + * indefinitely because there's no guarantee that we'll make forward + * progress. + */ +#ifndef _Q_PENDING_LOOPS +#define _Q_PENDING_LOOPS 1 +#endif + +/* + * On 64-bit architectures, the mcs_spinlock structure will be 16 bytes in + * size and four of them will fit nicely in one 64-byte cacheline. For + * pvqspinlock, however, we need more space for extra data. 
To accommodate + * that, we insert two more long words to pad it up to 32 bytes. IOW, only + * two of them can fit in a cacheline in this case. That is OK as it is rare + * to have more than 2 levels of slowpath nesting in actual use. We don't + * want to penalize pvqspinlocks to optimize for a rare case in native + * qspinlocks. + */ +struct qnode { + struct mcs_spinlock mcs; +#ifdef CONFIG_PARAVIRT_SPINLOCKS + long reserved[2]; +#endif +}; + +/* + * We must be able to distinguish between no-tail and the tail at 0:0, + * therefore increment the cpu number by one. + */ + +static inline __pure u32 encode_tail(int cpu, int idx) +{ + u32 tail; + + tail = (cpu + 1) << _Q_TAIL_CPU_OFFSET; + tail |= idx << _Q_TAIL_IDX_OFFSET; /* assume < 4 */ + + return tail; +} + +static inline __pure struct mcs_spinlock *decode_tail(u32 tail, struct qnode *qnodes) +{ + int cpu = (tail >> _Q_TAIL_CPU_OFFSET) - 1; + int idx = (tail & _Q_TAIL_IDX_MASK) >> _Q_TAIL_IDX_OFFSET; + + return per_cpu_ptr(&qnodes[idx].mcs, cpu); +} + +static inline __pure +struct mcs_spinlock *grab_mcs_node(struct mcs_spinlock *base, int idx) +{ + return &((struct qnode *)base + idx)->mcs; +} + +#define _Q_LOCKED_PENDING_MASK (_Q_LOCKED_MASK | _Q_PENDING_MASK) + +#if _Q_PENDING_BITS == 8 +/** + * clear_pending - clear the pending bit. + * @lock: Pointer to queued spinlock structure + * + * *,1,* -> *,0,* + */ +static __always_inline void clear_pending(struct qspinlock *lock) +{ + WRITE_ONCE(lock->pending, 0); +} + +/** + * clear_pending_set_locked - take ownership and clear the pending bit. + * @lock: Pointer to queued spinlock structure + * + * *,1,0 -> *,0,1 + * + * Lock stealing is not allowed if this function is used. + */ +static __always_inline void clear_pending_set_locked(struct qspinlock *lock) +{ + WRITE_ONCE(lock->locked_pending, _Q_LOCKED_VAL); +} + +/* + * xchg_tail - Put in the new queue tail code word & retrieve previous one + * @lock : Pointer to queued spinlock structure + * @tail : The new queue tail code word + * Return: The previous queue tail code word + * + * xchg(lock, tail), which heads an address dependency + * + * p,*,* -> n,*,* ; prev = xchg(lock, node) + */ +static __always_inline u32 xchg_tail(struct qspinlock *lock, u32 tail) +{ + /* + * We can use relaxed semantics since the caller ensures that the + * MCS node is properly initialized before updating the tail. + */ + return (u32)xchg_relaxed(&lock->tail, + tail >> _Q_TAIL_OFFSET) << _Q_TAIL_OFFSET; +} + +#else /* _Q_PENDING_BITS == 8 */ + +/** + * clear_pending - clear the pending bit. + * @lock: Pointer to queued spinlock structure + * + * *,1,* -> *,0,* + */ +static __always_inline void clear_pending(struct qspinlock *lock) +{ + atomic_andnot(_Q_PENDING_VAL, &lock->val); +} + +/** + * clear_pending_set_locked - take ownership and clear the pending bit. 
+ * @lock: Pointer to queued spinlock structure
+ *
+ * *,1,0 -> *,0,1
+ */
+static __always_inline void clear_pending_set_locked(struct qspinlock *lock)
+{
+	atomic_add(-_Q_PENDING_VAL + _Q_LOCKED_VAL, &lock->val);
+}
+
+/**
+ * xchg_tail - Put in the new queue tail code word & retrieve previous one
+ * @lock : Pointer to queued spinlock structure
+ * @tail : The new queue tail code word
+ * Return: The previous queue tail code word
+ *
+ * xchg(lock, tail)
+ *
+ * p,*,* -> n,*,* ; prev = xchg(lock, node)
+ */
+static __always_inline u32 xchg_tail(struct qspinlock *lock, u32 tail)
+{
+	u32 old, new;
+
+	old = atomic_read(&lock->val);
+	do {
+		new = (old & _Q_LOCKED_PENDING_MASK) | tail;
+		/*
+		 * We can use relaxed semantics since the caller ensures that
+		 * the MCS node is properly initialized before updating the
+		 * tail.
+		 */
+	} while (!atomic_try_cmpxchg_relaxed(&lock->val, &old, new));
+
+	return old;
+}
+#endif /* _Q_PENDING_BITS == 8 */
+
+/**
+ * queued_fetch_set_pending_acquire - fetch the whole lock value and set pending
+ * @lock : Pointer to queued spinlock structure
+ * Return: The previous lock value
+ *
+ * *,*,* -> *,1,*
+ */
+#ifndef queued_fetch_set_pending_acquire
+static __always_inline u32 queued_fetch_set_pending_acquire(struct qspinlock *lock)
+{
+	return atomic_fetch_or_acquire(_Q_PENDING_VAL, &lock->val);
+}
+#endif
+
+/**
+ * set_locked - Set the lock bit and own the lock
+ * @lock: Pointer to queued spinlock structure
+ *
+ * *,*,0 -> *,0,1
+ */
+static __always_inline void set_locked(struct qspinlock *lock)
+{
+	WRITE_ONCE(lock->locked, _Q_LOCKED_VAL);
+}
+
+#endif /* __LINUX_QSPINLOCK_H */
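A worked example of the tail word helpers this patch moves (illustration only,
not part of the patch): the standalone program below mirrors encode_tail() and
decode_tail(), assuming the common _Q_PENDING_BITS == 8 layout in which the
nesting index sits at bit 16 and the cpu field at bit 18; those offsets are an
assumption of this sketch, taken to match qspinlock_types.h.

#include <stdio.h>

#define _Q_TAIL_IDX_OFFSET	16
#define _Q_TAIL_IDX_BITS	2
#define _Q_TAIL_IDX_MASK	(((1U << _Q_TAIL_IDX_BITS) - 1) << _Q_TAIL_IDX_OFFSET)
#define _Q_TAIL_CPU_OFFSET	(_Q_TAIL_IDX_OFFSET + _Q_TAIL_IDX_BITS)

static unsigned int encode_tail(int cpu, int idx)
{
	/* cpu is stored off by one so that tail == 0 means "no tail". */
	return ((cpu + 1) << _Q_TAIL_CPU_OFFSET) | (idx << _Q_TAIL_IDX_OFFSET);
}

int main(void)
{
	unsigned int tail = encode_tail(5, 2);
	int cpu = (tail >> _Q_TAIL_CPU_OFFSET) - 1;
	int idx = (tail & _Q_TAIL_IDX_MASK) >> _Q_TAIL_IDX_OFFSET;

	/* Prints: tail=0x1a0000 cpu=5 idx=2 */
	printf("tail=%#x cpu=%d idx=%d\n", tail, cpu, idx);
	return 0;
}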
From patchwork Thu Feb 6 10:54:11 2025
X-Patchwork-Submitter: Kumar Kartikeya Dwivedi
X-Patchwork-Id: 13962809
From: Kumar Kartikeya Dwivedi
Subject: [PATCH bpf-next v2 03/26] locking: Allow obtaining result of arch_mcs_spin_lock_contended
Date: Thu, 6 Feb 2025 02:54:11 -0800
Message-ID: <20250206105435.2159977-4-memxor@gmail.com>
In-Reply-To: <20250206105435.2159977-1-memxor@gmail.com>
McKenney" , Tejun Heo , Josh Don , Dohyun Kim , linux-arm-kernel@lists.infradead.org, kernel-team@meta.com Subject: [PATCH bpf-next v2 03/26] locking: Allow obtaining result of arch_mcs_spin_lock_contended Date: Thu, 6 Feb 2025 02:54:11 -0800 Message-ID: <20250206105435.2159977-4-memxor@gmail.com> X-Mailer: git-send-email 2.43.5 In-Reply-To: <20250206105435.2159977-1-memxor@gmail.com> References: <20250206105435.2159977-1-memxor@gmail.com> Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Developer-Signature: v=1; a=openpgp-sha256; l=1052; h=from:subject; bh=WHhXMqIdalfkSexlY5e1BRqspbIYdrDmDQT3rX3AKP8=; b=owEBbQKS/ZANAwAIAUzgyIZIvxHKAcsmYgBnpJRkvzlsIk8Mh+hFnelZKKgCgtqU9iOBLKbPXk+b Hr7ixKqJAjMEAAEIAB0WIQRLvip+Buz51YI8YRFM4MiGSL8RygUCZ6SUZAAKCRBM4MiGSL8RypWCD/ 9a6wXeJpwHCx6Ry/+60+90gq49Vx2iM0Ni2K5/JZ/w1SQ1K45M6aFIB7872eGf0XNmEL8eeYRcg6ky oIbxfK6Dz0h4c4cjezC6IO7ADwiRf1nt9gfi3HEwqdWIzBMac9NGH1ITZzTSB9FZuAkqXAfpZZq0iW l9L7JT2bKDalL+4TjYIHnpQaWg+MzdyiUwK9daJqckornGM5bI4UFqkTtWU74gzcj9I8Ww9UAJmZll /0oSi0NI558EvQj/JozkqtizlLGWLPs8WdwyXca/O84yRKYAhzPQZ+Ebe9SP1IcoorQstTzGtBQLkN zeZqTP0CpGFfPZy9B3ND9iBY7eJQQxB2h2yT74z85HkKKEjc2v0asUqBT4vT+B5nCPzdr638w7FtGe emuAf5iymDVjVyVtnlOV9UV5465nfpUZv9DqAwksMUOVSc3qvTgbKcDN4XsiSIJPYVu+XDqiFZqx3Y WlVVJyTFe+L+QnTVi8xn2N/fICy+rFILllPSBpOso5O/9KXfpcu9vnaCtViVPakzRGJqLFeOjxAQNd hQxMdvz26zEoWX77OknnGmg7FcT6xSJ9eFK4uTMlDQbiikiPsGIMVPxVutFarPkQs5maZ7giuy1hnE iy/9siJLZmE3DRYLv0bOoj+wxlb6+yk1cnr9VBCzo9XnhvUEn57+U1QY9opw== X-Developer-Key: i=memxor@gmail.com; a=openpgp; fpr=4BBE2A7E06ECF9D5823C61114CE0C88648BF11CA X-Patchwork-Delegate: bpf@iogearbox.net To support upcoming changes that require inspecting the return value once the conditional waiting loop in arch_mcs_spin_lock_contended terminates, modify the macro to preserve the result of smp_cond_load_acquire. This enables checking the return value as needed, which will help disambiguate the MCS node’s locked state in future patches. Reviewed-by: Barret Rhoden Signed-off-by: Kumar Kartikeya Dwivedi --- kernel/locking/mcs_spinlock.h | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/kernel/locking/mcs_spinlock.h b/kernel/locking/mcs_spinlock.h index 16160ca8907f..5c92ba199b90 100644 --- a/kernel/locking/mcs_spinlock.h +++ b/kernel/locking/mcs_spinlock.h @@ -24,9 +24,7 @@ * spinning, and smp_cond_load_acquire() provides that behavior. 
  */
 #define arch_mcs_spin_lock_contended(l)				\
-do {								\
-	smp_cond_load_acquire(l, VAL);				\
-} while (0)
+	smp_cond_load_acquire(l, VAL)
 #endif
 
 #ifndef arch_mcs_spin_unlock_contended
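A sketch of the kind of caller this change enables, for illustration only: the
RES_TIMEOUT_VAL constant and the hand-over protocol shown here are assumptions
of the sketch rather than something defined by this patch. The point is simply
that the macro now evaluates to the value smp_cond_load_acquire() observed, so
a waiter can distinguish a normal hand-over from a marker written by a
predecessor that gave up.

#define RES_TIMEOUT_VAL	2	/* assumed "previous waiter gave up" marker */

static bool wait_for_mcs_handover(struct mcs_spinlock *node)
{
	int locked = arch_mcs_spin_lock_contended(&node->locked);

	if (locked == RES_TIMEOUT_VAL) {
		/* Our predecessor bailed out instead of handing the lock over. */
		return false;
	}

	/* locked == 1: normal hand-over, we own our position in the queue. */
	return true;
}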
From patchwork Thu Feb 6 10:54:12 2025
X-Patchwork-Submitter: Kumar Kartikeya Dwivedi
X-Patchwork-Id: 13962811
From: Kumar Kartikeya Dwivedi
Subject: [PATCH bpf-next v2 04/26] locking: Copy out qspinlock.c to rqspinlock.c
Date: Thu, 6 Feb 2025 02:54:12 -0800
Message-ID: <20250206105435.2159977-5-memxor@gmail.com>
In-Reply-To: <20250206105435.2159977-1-memxor@gmail.com>

In preparation for introducing a new lock implementation, Resilient Queued
Spin Lock, or rqspinlock, we begin by using the existing qspinlock.c code as
the base. Simply copy the code to a new file and rename functions and
variables from 'queued' to 'resilient_queued'. This helps each subsequent
commit clearly show how and where the code is being changed. The only change
after a literal copy in this commit is renaming the functions where necessary.
Reviewed-by: Barret Rhoden Signed-off-by: Kumar Kartikeya Dwivedi --- kernel/locking/rqspinlock.c | 410 ++++++++++++++++++++++++++++++++++++ 1 file changed, 410 insertions(+) create mode 100644 kernel/locking/rqspinlock.c diff --git a/kernel/locking/rqspinlock.c b/kernel/locking/rqspinlock.c new file mode 100644 index 000000000000..caaa7c9bbc79 --- /dev/null +++ b/kernel/locking/rqspinlock.c @@ -0,0 +1,410 @@ +// SPDX-License-Identifier: GPL-2.0-or-later +/* + * Resilient Queued Spin Lock + * + * (C) Copyright 2013-2015 Hewlett-Packard Development Company, L.P. + * (C) Copyright 2013-2014,2018 Red Hat, Inc. + * (C) Copyright 2015 Intel Corp. + * (C) Copyright 2015 Hewlett-Packard Enterprise Development LP + * + * Authors: Waiman Long + * Peter Zijlstra + */ + +#ifndef _GEN_PV_LOCK_SLOWPATH + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +/* + * Include queued spinlock definitions and statistics code + */ +#include "qspinlock.h" +#include "qspinlock_stat.h" + +/* + * The basic principle of a queue-based spinlock can best be understood + * by studying a classic queue-based spinlock implementation called the + * MCS lock. A copy of the original MCS lock paper ("Algorithms for Scalable + * Synchronization on Shared-Memory Multiprocessors by Mellor-Crummey and + * Scott") is available at + * + * https://bugzilla.kernel.org/show_bug.cgi?id=206115 + * + * This queued spinlock implementation is based on the MCS lock, however to + * make it fit the 4 bytes we assume spinlock_t to be, and preserve its + * existing API, we must modify it somehow. + * + * In particular; where the traditional MCS lock consists of a tail pointer + * (8 bytes) and needs the next pointer (another 8 bytes) of its own node to + * unlock the next pending (next->locked), we compress both these: {tail, + * next->locked} into a single u32 value. + * + * Since a spinlock disables recursion of its own context and there is a limit + * to the contexts that can nest; namely: task, softirq, hardirq, nmi. As there + * are at most 4 nesting levels, it can be encoded by a 2-bit number. Now + * we can encode the tail by combining the 2-bit nesting level with the cpu + * number. With one byte for the lock value and 3 bytes for the tail, only a + * 32-bit word is now needed. Even though we only need 1 bit for the lock, + * we extend it to a full byte to achieve better performance for architectures + * that support atomic byte write. + * + * We also change the first spinner to spin on the lock bit instead of its + * node; whereby avoiding the need to carry a node from lock to unlock, and + * preserving existing lock API. This also makes the unlock code simpler and + * faster. + * + * N.B. The current implementation only supports architectures that allow + * atomic operations on smaller 8-bit and 16-bit data types. + * + */ + +#include "mcs_spinlock.h" + +/* + * Per-CPU queue node structures; we can never have more than 4 nested + * contexts: task, softirq, hardirq, nmi. + * + * Exactly fits one 64-byte cacheline on a 64-bit architecture. + * + * PV doubles the storage and uses the second cacheline for PV state. + */ +static DEFINE_PER_CPU_ALIGNED(struct qnode, qnodes[_Q_MAX_NODES]); + +/* + * Generate the native code for resilient_queued_spin_unlock_slowpath(); provide NOPs + * for all the PV callbacks. 
+ */ + +static __always_inline void __pv_init_node(struct mcs_spinlock *node) { } +static __always_inline void __pv_wait_node(struct mcs_spinlock *node, + struct mcs_spinlock *prev) { } +static __always_inline void __pv_kick_node(struct qspinlock *lock, + struct mcs_spinlock *node) { } +static __always_inline u32 __pv_wait_head_or_lock(struct qspinlock *lock, + struct mcs_spinlock *node) + { return 0; } + +#define pv_enabled() false + +#define pv_init_node __pv_init_node +#define pv_wait_node __pv_wait_node +#define pv_kick_node __pv_kick_node +#define pv_wait_head_or_lock __pv_wait_head_or_lock + +#ifdef CONFIG_PARAVIRT_SPINLOCKS +#define resilient_queued_spin_lock_slowpath native_resilient_queued_spin_lock_slowpath +#endif + +#endif /* _GEN_PV_LOCK_SLOWPATH */ + +/** + * resilient_queued_spin_lock_slowpath - acquire the queued spinlock + * @lock: Pointer to queued spinlock structure + * @val: Current value of the queued spinlock 32-bit word + * + * (queue tail, pending bit, lock value) + * + * fast : slow : unlock + * : : + * uncontended (0,0,0) -:--> (0,0,1) ------------------------------:--> (*,*,0) + * : | ^--------.------. / : + * : v \ \ | : + * pending : (0,1,1) +--> (0,1,0) \ | : + * : | ^--' | | : + * : v | | : + * uncontended : (n,x,y) +--> (n,0,0) --' | : + * queue : | ^--' | : + * : v | : + * contended : (*,x,y) +--> (*,0,0) ---> (*,0,1) -' : + * queue : ^--' : + */ +void __lockfunc resilient_queued_spin_lock_slowpath(struct qspinlock *lock, u32 val) +{ + struct mcs_spinlock *prev, *next, *node; + u32 old, tail; + int idx; + + BUILD_BUG_ON(CONFIG_NR_CPUS >= (1U << _Q_TAIL_CPU_BITS)); + + if (pv_enabled()) + goto pv_queue; + + if (virt_spin_lock(lock)) + return; + + /* + * Wait for in-progress pending->locked hand-overs with a bounded + * number of spins so that we guarantee forward progress. + * + * 0,1,0 -> 0,0,1 + */ + if (val == _Q_PENDING_VAL) { + int cnt = _Q_PENDING_LOOPS; + val = atomic_cond_read_relaxed(&lock->val, + (VAL != _Q_PENDING_VAL) || !cnt--); + } + + /* + * If we observe any contention; queue. + */ + if (val & ~_Q_LOCKED_MASK) + goto queue; + + /* + * trylock || pending + * + * 0,0,* -> 0,1,* -> 0,0,1 pending, trylock + */ + val = queued_fetch_set_pending_acquire(lock); + + /* + * If we observe contention, there is a concurrent locker. + * + * Undo and queue; our setting of PENDING might have made the + * n,0,0 -> 0,0,0 transition fail and it will now be waiting + * on @next to become !NULL. + */ + if (unlikely(val & ~_Q_LOCKED_MASK)) { + + /* Undo PENDING if we set it. */ + if (!(val & _Q_PENDING_MASK)) + clear_pending(lock); + + goto queue; + } + + /* + * We're pending, wait for the owner to go away. + * + * 0,1,1 -> *,1,0 + * + * this wait loop must be a load-acquire such that we match the + * store-release that clears the locked bit and create lock + * sequentiality; this is because not all + * clear_pending_set_locked() implementations imply full + * barriers. + */ + if (val & _Q_LOCKED_MASK) + smp_cond_load_acquire(&lock->locked, !VAL); + + /* + * take ownership and clear the pending bit. + * + * 0,1,0 -> 0,0,1 + */ + clear_pending_set_locked(lock); + lockevent_inc(lock_pending); + return; + + /* + * End of pending bit optimistic spinning and beginning of MCS + * queuing. 
+ */ +queue: + lockevent_inc(lock_slowpath); +pv_queue: + node = this_cpu_ptr(&qnodes[0].mcs); + idx = node->count++; + tail = encode_tail(smp_processor_id(), idx); + + trace_contention_begin(lock, LCB_F_SPIN); + + /* + * 4 nodes are allocated based on the assumption that there will + * not be nested NMIs taking spinlocks. That may not be true in + * some architectures even though the chance of needing more than + * 4 nodes will still be extremely unlikely. When that happens, + * we fall back to spinning on the lock directly without using + * any MCS node. This is not the most elegant solution, but is + * simple enough. + */ + if (unlikely(idx >= _Q_MAX_NODES)) { + lockevent_inc(lock_no_node); + while (!queued_spin_trylock(lock)) + cpu_relax(); + goto release; + } + + node = grab_mcs_node(node, idx); + + /* + * Keep counts of non-zero index values: + */ + lockevent_cond_inc(lock_use_node2 + idx - 1, idx); + + /* + * Ensure that we increment the head node->count before initialising + * the actual node. If the compiler is kind enough to reorder these + * stores, then an IRQ could overwrite our assignments. + */ + barrier(); + + node->locked = 0; + node->next = NULL; + pv_init_node(node); + + /* + * We touched a (possibly) cold cacheline in the per-cpu queue node; + * attempt the trylock once more in the hope someone let go while we + * weren't watching. + */ + if (queued_spin_trylock(lock)) + goto release; + + /* + * Ensure that the initialisation of @node is complete before we + * publish the updated tail via xchg_tail() and potentially link + * @node into the waitqueue via WRITE_ONCE(prev->next, node) below. + */ + smp_wmb(); + + /* + * Publish the updated tail. + * We have already touched the queueing cacheline; don't bother with + * pending stuff. + * + * p,*,* -> n,*,* + */ + old = xchg_tail(lock, tail); + next = NULL; + + /* + * if there was a previous node; link it and wait until reaching the + * head of the waitqueue. + */ + if (old & _Q_TAIL_MASK) { + prev = decode_tail(old, qnodes); + + /* Link @node into the waitqueue. */ + WRITE_ONCE(prev->next, node); + + pv_wait_node(node, prev); + arch_mcs_spin_lock_contended(&node->locked); + + /* + * While waiting for the MCS lock, the next pointer may have + * been set by another lock waiter. We optimistically load + * the next pointer & prefetch the cacheline for writing + * to reduce latency in the upcoming MCS unlock operation. + */ + next = READ_ONCE(node->next); + if (next) + prefetchw(next); + } + + /* + * we're at the head of the waitqueue, wait for the owner & pending to + * go away. + * + * *,x,y -> *,0,0 + * + * this wait loop must use a load-acquire such that we match the + * store-release that clears the locked bit and create lock + * sequentiality; this is because the set_locked() function below + * does not imply a full barrier. + * + * The PV pv_wait_head_or_lock function, if active, will acquire + * the lock and return a non-zero value. So we have to skip the + * atomic_cond_read_acquire() call. As the next PV queue head hasn't + * been designated yet, there is no way for the locked value to become + * _Q_SLOW_VAL. So both the set_locked() and the + * atomic_cmpxchg_relaxed() calls will be safe. + * + * If PV isn't active, 0 will be returned instead. 
+ *
+ */
+	if ((val = pv_wait_head_or_lock(lock, node)))
+		goto locked;
+
+	val = atomic_cond_read_acquire(&lock->val, !(VAL & _Q_LOCKED_PENDING_MASK));
+
+locked:
+	/*
+	 * claim the lock:
+	 *
+	 * n,0,0 -> 0,0,1 : lock, uncontended
+	 * *,*,0 -> *,*,1 : lock, contended
+	 *
+	 * If the queue head is the only one in the queue (lock value == tail)
+	 * and nobody is pending, clear the tail code and grab the lock.
+	 * Otherwise, we only need to grab the lock.
+	 */
+
+	/*
+	 * In the PV case we might already have _Q_LOCKED_VAL set, because
+	 * of lock stealing; therefore we must also allow:
+	 *
+	 * n,0,1 -> 0,0,1
+	 *
+	 * Note: at this point: (val & _Q_PENDING_MASK) == 0, because of the
+	 * above wait condition, therefore any concurrent setting of
+	 * PENDING will make the uncontended transition fail.
+	 */
+	if ((val & _Q_TAIL_MASK) == tail) {
+		if (atomic_try_cmpxchg_relaxed(&lock->val, &val, _Q_LOCKED_VAL))
+			goto release; /* No contention */
+	}
+
+	/*
+	 * Either somebody is queued behind us or _Q_PENDING_VAL got set
+	 * which will then detect the remaining tail and queue behind us
+	 * ensuring we'll see a @next.
+	 */
+	set_locked(lock);
+
+	/*
+	 * contended path; wait for next if not observed yet, release.
+	 */
+	if (!next)
+		next = smp_cond_load_relaxed(&node->next, (VAL));
+
+	arch_mcs_spin_unlock_contended(&next->locked);
+	pv_kick_node(lock, next);
+
+release:
+	trace_contention_end(lock, 0);
+
+	/*
+	 * release the node
+	 */
+	__this_cpu_dec(qnodes[0].mcs.count);
+}
+EXPORT_SYMBOL(resilient_queued_spin_lock_slowpath);
+
+/*
+ * Generate the paravirt code for resilient_queued_spin_unlock_slowpath().
+ */
+#if !defined(_GEN_PV_LOCK_SLOWPATH) && defined(CONFIG_PARAVIRT_SPINLOCKS)
+#define _GEN_PV_LOCK_SLOWPATH
+
+#undef pv_enabled
+#define pv_enabled()	true
+
+#undef pv_init_node
+#undef pv_wait_node
+#undef pv_kick_node
+#undef pv_wait_head_or_lock
+
+#undef resilient_queued_spin_lock_slowpath
+#define resilient_queued_spin_lock_slowpath	__pv_resilient_queued_spin_lock_slowpath
+
+#include "qspinlock_paravirt.h"
+#include "rqspinlock.c"
+
+bool nopvspin;
+static __init int parse_nopvspin(char *arg)
+{
+	nopvspin = true;
+	return 0;
+}
+early_param("nopvspin", parse_nopvspin);
+#endif
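Background on the per-CPU qnode accounting the copied slowpath keeps (node =
this_cpu_ptr(&qnodes[0].mcs); idx = node->count++): a CPU re-enters the
slowpath only via nested contexts (task, softirq, hardirq, NMI), so four node
slots per CPU suffice, and anything deeper falls back to plain trylock
spinning. The sketch below restates that accounting in simplified form;
grab_node()/put_node() and the flat array are made-up names for illustration,
not kernel APIs.

#define MAX_NODES 4

struct qnode_slot { int locked; };		/* stand-in for struct qnode */

static struct qnode_slot nodes[MAX_NODES];	/* per-CPU array in the kernel */
static int depth;				/* per-CPU nesting depth */

static struct qnode_slot *grab_node(void)
{
	int idx = depth++;

	/* NULL means: spin on trylock instead of queueing on an MCS node. */
	return idx < MAX_NODES ? &nodes[idx] : NULL;
}

static void put_node(void)
{
	depth--;
}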
From patchwork Thu Feb 6 10:54:13 2025
X-Patchwork-Submitter: Kumar Kartikeya Dwivedi
X-Patchwork-Id: 13962810
From: Kumar Kartikeya Dwivedi
Subject: [PATCH bpf-next v2 05/26] rqspinlock: Add rqspinlock.h header
Date: Thu, 6 Feb 2025 02:54:13 -0800
Message-ID: <20250206105435.2159977-6-memxor@gmail.com>
In-Reply-To: <20250206105435.2159977-1-memxor@gmail.com>
McKenney" , Tejun Heo , Josh Don , Dohyun Kim , linux-arm-kernel@lists.infradead.org, kernel-team@meta.com Subject: [PATCH bpf-next v2 05/26] rqspinlock: Add rqspinlock.h header Date: Thu, 6 Feb 2025 02:54:13 -0800 Message-ID: <20250206105435.2159977-6-memxor@gmail.com> X-Mailer: git-send-email 2.43.5 In-Reply-To: <20250206105435.2159977-1-memxor@gmail.com> References: <20250206105435.2159977-1-memxor@gmail.com> Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Developer-Signature: v=1; a=openpgp-sha256; l=2297; h=from:subject; bh=AVuKKIzjLnum/Q8Jp3BllOKRS5yuVDUgY1U4pKev08I=; b=owEBbQKS/ZANAwAIAUzgyIZIvxHKAcsmYgBnpJRkAUSLUoqJ61KHJqVww5VkHo5XuLe/f7xDpkMh vShNwMCJAjMEAAEIAB0WIQRLvip+Buz51YI8YRFM4MiGSL8RygUCZ6SUZAAKCRBM4MiGSL8RyoE9EA ChWskn+IPkSTn/DMBwsfvx6EWqIgzdLCOdPnNCPTWMG8F+qUgNrWq+hnY2AEcF0ceG8KE69wHJ0cLI VBN9DG6qMLro8x1voPjPdK+aL1HAKDexxzJsUGqXzwcJ7tvBu+6aJto1dqMnI/iAjErkVXhTfeeDnQ HE8Vc8ft+nREngcb+k2A3S+vnI7cNgZyEtUXn3mX+4C1e0siL9PCKWRbCj7Dd8CIyhr+I6iVZdwGNy fYfSZBupBuq5C4yvcYeJ8shIaV3bkO04VogVyYLEi/QVP8pKyI8R+EBtKWqigsXI6383C8YLtrbQGd 6GU/QIqvFWawk/4RAnRUbHrfbE0pEX9aYKhgNEqv6/kqUCtYlJg+UsJtP63tthsVPkwsBYSzNyoHbB 0DutHelI8e77l5BQjjBxlxrXXTwKwflGFaPet1F3lFUu5Cq2PuJNvOGpz9xnFAi/8jxZY9QaxMKAUg Xr5iqvWwjk83bjYdoRMl/jswMGFWaFmpSrDUvrXM43JJVq/Jip610cxzOu92F+nVn1lWFbMEC4Bpva mhfxqax6cnPG6vhnjpN4uowxGnWFTJUwdYsICJWIXeHdbiXs3WJOtEvcwisb0n3r1F6T8NtLM+7Mdu KrBmyKQ6ZIowge2wnwHfNovFg8fEt4LhwMndrwjiFM78K91D7w+iuxRUPVLw== X-Developer-Key: i=memxor@gmail.com; a=openpgp; fpr=4BBE2A7E06ECF9D5823C61114CE0C88648BF11CA X-Patchwork-Delegate: bpf@iogearbox.net This header contains the public declarations usable in the rest of the kernel for rqspinlock. Let's also type alias qspinlock to rqspinlock_t to ensure consistent use of the new lock type. We want to remove dependence on the qspinlock type in later patches as we need to provide a test-and-set fallback, hence begin abstracting away from now onwards. Reviewed-by: Barret Rhoden Signed-off-by: Kumar Kartikeya Dwivedi --- include/asm-generic/rqspinlock.h | 19 +++++++++++++++++++ kernel/locking/rqspinlock.c | 3 ++- 2 files changed, 21 insertions(+), 1 deletion(-) create mode 100644 include/asm-generic/rqspinlock.h diff --git a/include/asm-generic/rqspinlock.h b/include/asm-generic/rqspinlock.h new file mode 100644 index 000000000000..54860b519571 --- /dev/null +++ b/include/asm-generic/rqspinlock.h @@ -0,0 +1,19 @@ +/* SPDX-License-Identifier: GPL-2.0-or-later */ +/* + * Resilient Queued Spin Lock + * + * (C) Copyright 2024 Meta Platforms, Inc. and affiliates. 
+ * + * Authors: Kumar Kartikeya Dwivedi + */ +#ifndef __ASM_GENERIC_RQSPINLOCK_H +#define __ASM_GENERIC_RQSPINLOCK_H + +#include + +struct qspinlock; +typedef struct qspinlock rqspinlock_t; + +extern void resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val); + +#endif /* __ASM_GENERIC_RQSPINLOCK_H */ diff --git a/kernel/locking/rqspinlock.c b/kernel/locking/rqspinlock.c index caaa7c9bbc79..18eb9ef3e908 100644 --- a/kernel/locking/rqspinlock.c +++ b/kernel/locking/rqspinlock.c @@ -23,6 +23,7 @@ #include #include #include +#include /* * Include queued spinlock definitions and statistics code @@ -127,7 +128,7 @@ static __always_inline u32 __pv_wait_head_or_lock(struct qspinlock *lock, * contended : (*,x,y) +--> (*,0,0) ---> (*,0,1) -' : * queue : ^--' : */ -void __lockfunc resilient_queued_spin_lock_slowpath(struct qspinlock *lock, u32 val) +void __lockfunc resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val) { struct mcs_spinlock *prev, *next, *node; u32 old, tail; From patchwork Thu Feb 6 10:54:14 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Kumar Kartikeya Dwivedi X-Patchwork-Id: 13962812 X-Patchwork-Delegate: bpf@iogearbox.net Received: from mail-wm1-f68.google.com (mail-wm1-f68.google.com [209.85.128.68]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 64CFA229B0F; Thu, 6 Feb 2025 10:54:46 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.128.68 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1738839289; cv=none; b=W6DDvZ5umehLa+U8DX2FwS/4HGQZhNfFTFkSAeGC8vs+kYGjZMsgsQ4hmk5GqR5kwDhcHxeT4IW4mUALRkpkWLCcURc85/R+W13advrbuRbALLEItZ4fBa0rPmlOcWT9+k5SC/YPFOhxQ4qIfXv4fpxjKrQgHFhbjviFuvefQD0= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1738839289; c=relaxed/simple; bh=r7k8qGdKN/3/qwxHoOfh+ZQmucFzyerAvWUxzRBa3Nw=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=lGcDIVTCZaXmMWzH+ID5pC0KbLrZKX3jdC4tRheFiPakFGyCa9z76JM3+HY6BXqZLoGpVpxuEm1pNUPKYHazrQbyUosubtGVMzGtqndbyFkQIh4yLziLd06DZ35SVQ+tKL4/3rg/FamRow2m9ekgO0SZjhoEKwSo/QGUS4w7DC0= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=gEwP2l9P; arc=none smtp.client-ip=209.85.128.68 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="gEwP2l9P" Received: by mail-wm1-f68.google.com with SMTP id 5b1f17b1804b1-4361815b96cso4768575e9.1; Thu, 06 Feb 2025 02:54:46 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1738839285; x=1739444085; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=FML/jIvPcAoxlA62EsuMUSBptbx5cKUmte0tA2musTk=; b=gEwP2l9P7GKiGA5//AffMt8z4ionJ+pnOF6hktdb2ho5yxXB9qGNAJmDQjvczn8lhT Foqogue+a5VcVSaOaUICnlpQ4kMguNR4akiW6KPf2vvsss6q6fLgVFBBMF3ViFdqp4VW r5qZ89k5TcQA36NmYpJgpbAw6w7TH5yi91e3N6quI+VpOao6jYOXYHyJBui17ageYGkE 
From: Kumar Kartikeya Dwivedi
To: bpf@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: Barret Rhoden, Linus Torvalds, Peter Zijlstra, Will Deacon, Waiman Long, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann, Martin KaFai Lau, Eduard Zingerman, "Paul E.
McKenney" , Tejun Heo , Josh Don , Dohyun Kim , linux-arm-kernel@lists.infradead.org, kernel-team@meta.com Subject: [PATCH bpf-next v2 06/26] rqspinlock: Drop PV and virtualization support Date: Thu, 6 Feb 2025 02:54:14 -0800 Message-ID: <20250206105435.2159977-7-memxor@gmail.com> X-Mailer: git-send-email 2.43.5 In-Reply-To: <20250206105435.2159977-1-memxor@gmail.com> References: <20250206105435.2159977-1-memxor@gmail.com> Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Developer-Signature: v=1; a=openpgp-sha256; l=6325; h=from:subject; bh=r7k8qGdKN/3/qwxHoOfh+ZQmucFzyerAvWUxzRBa3Nw=; b=owEBbQKS/ZANAwAIAUzgyIZIvxHKAcsmYgBnpJRkILe/Rfrs9nCkS/jAFdrxKRcYv7G3/iAaCR9O 5XUAiaGJAjMEAAEIAB0WIQRLvip+Buz51YI8YRFM4MiGSL8RygUCZ6SUZAAKCRBM4MiGSL8Ryt6cD/ 9T4mqw3kZqs2TP+tHKQuzgqUq8fvSH8yl7t36bQGme497vOAOYDUHMeNE4NNyj7xPtcdwP+75JT6jQ wb7DTCpWkXZuXyGsVYOqkRCTrIpl86UhY4KJl7kpw0Fu/l4dVh2eGgHzXhYTOAo3UfEW3sblt0q+J+ HtBSelQEJJ2OiEIXdozIXjKe1DouxA6jfr9ixQF5KRP3O4K0H2jeTFQw6ruH2RXH0V+42ZwyL/q4Sb j0fby9n8kP9ZDxMNUNWPVhnFWuMwb3b9rwsQZME31GLbIEi/IEz54iXXAjlwVeS8CtY3ZuiDqfn4LC 8vkJ/5biap6lg9ReRc9H8WCmVZuC1O18jeYfRVCk9BRgjFDmynobDC5PWqUEaxs/4weWUTipElhtHA 7rTqLuMsOilmIdBqGSY9cpHXiMj/9tfMraqZKztusV0dkACiFyAXpSgpqErFXM4z8J7b6tx0wu+gza wtTNXet8pwRgRUHVnbyGX81YgMCL5AQXwiOot10sXBM00IBvoKOrdjO2422YWhUOHy6sF6R5id4V9D gM4HflpQhz5PAIg3z56BGX8bWfOlfbWkA6/HEA5p+yyTD21iW/w4wyZuPUEWaHIhxlVrp81m7Ajv+d 8KZJHw5wvIc5BbaAV1NGrm4shlGp+OwQG7OPQO17A7Vq6h8oz8Q8NJUJBu4w== X-Developer-Key: i=memxor@gmail.com; a=openpgp; fpr=4BBE2A7E06ECF9D5823C61114CE0C88648BF11CA X-Patchwork-Delegate: bpf@iogearbox.net Changes to rqspinlock in subsequent commits will be algorithmic modifications, which won't remain in agreement with the implementations of paravirt spin lock and virt_spin_lock support. These future changes include measures for terminating waiting loops in slow path after a certain point. While using a fair lock like qspinlock directly inside virtual machines leads to suboptimal performance under certain conditions, we cannot use the existing virtualization support before we make it resilient as well. Therefore, drop it for now. Reviewed-by: Barret Rhoden Signed-off-by: Kumar Kartikeya Dwivedi --- kernel/locking/rqspinlock.c | 89 ------------------------------------- 1 file changed, 89 deletions(-) diff --git a/kernel/locking/rqspinlock.c b/kernel/locking/rqspinlock.c index 18eb9ef3e908..52db60cd9691 100644 --- a/kernel/locking/rqspinlock.c +++ b/kernel/locking/rqspinlock.c @@ -11,8 +11,6 @@ * Peter Zijlstra */ -#ifndef _GEN_PV_LOCK_SLOWPATH - #include #include #include @@ -75,38 +73,9 @@ * contexts: task, softirq, hardirq, nmi. * * Exactly fits one 64-byte cacheline on a 64-bit architecture. - * - * PV doubles the storage and uses the second cacheline for PV state. */ static DEFINE_PER_CPU_ALIGNED(struct qnode, qnodes[_Q_MAX_NODES]); -/* - * Generate the native code for resilient_queued_spin_unlock_slowpath(); provide NOPs - * for all the PV callbacks. 
- */ - -static __always_inline void __pv_init_node(struct mcs_spinlock *node) { } -static __always_inline void __pv_wait_node(struct mcs_spinlock *node, - struct mcs_spinlock *prev) { } -static __always_inline void __pv_kick_node(struct qspinlock *lock, - struct mcs_spinlock *node) { } -static __always_inline u32 __pv_wait_head_or_lock(struct qspinlock *lock, - struct mcs_spinlock *node) - { return 0; } - -#define pv_enabled() false - -#define pv_init_node __pv_init_node -#define pv_wait_node __pv_wait_node -#define pv_kick_node __pv_kick_node -#define pv_wait_head_or_lock __pv_wait_head_or_lock - -#ifdef CONFIG_PARAVIRT_SPINLOCKS -#define resilient_queued_spin_lock_slowpath native_resilient_queued_spin_lock_slowpath -#endif - -#endif /* _GEN_PV_LOCK_SLOWPATH */ - /** * resilient_queued_spin_lock_slowpath - acquire the queued spinlock * @lock: Pointer to queued spinlock structure @@ -136,12 +105,6 @@ void __lockfunc resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val) BUILD_BUG_ON(CONFIG_NR_CPUS >= (1U << _Q_TAIL_CPU_BITS)); - if (pv_enabled()) - goto pv_queue; - - if (virt_spin_lock(lock)) - return; - /* * Wait for in-progress pending->locked hand-overs with a bounded * number of spins so that we guarantee forward progress. @@ -212,7 +175,6 @@ void __lockfunc resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val) */ queue: lockevent_inc(lock_slowpath); -pv_queue: node = this_cpu_ptr(&qnodes[0].mcs); idx = node->count++; tail = encode_tail(smp_processor_id(), idx); @@ -251,7 +213,6 @@ void __lockfunc resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val) node->locked = 0; node->next = NULL; - pv_init_node(node); /* * We touched a (possibly) cold cacheline in the per-cpu queue node; @@ -288,7 +249,6 @@ void __lockfunc resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val) /* Link @node into the waitqueue. */ WRITE_ONCE(prev->next, node); - pv_wait_node(node, prev); arch_mcs_spin_lock_contended(&node->locked); /* @@ -312,23 +272,9 @@ void __lockfunc resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val) * store-release that clears the locked bit and create lock * sequentiality; this is because the set_locked() function below * does not imply a full barrier. - * - * The PV pv_wait_head_or_lock function, if active, will acquire - * the lock and return a non-zero value. So we have to skip the - * atomic_cond_read_acquire() call. As the next PV queue head hasn't - * been designated yet, there is no way for the locked value to become - * _Q_SLOW_VAL. So both the set_locked() and the - * atomic_cmpxchg_relaxed() calls will be safe. - * - * If PV isn't active, 0 will be returned instead. - * */ - if ((val = pv_wait_head_or_lock(lock, node))) - goto locked; - val = atomic_cond_read_acquire(&lock->val, !(VAL & _Q_LOCKED_PENDING_MASK)); -locked: /* * claim the lock: * @@ -341,11 +287,6 @@ void __lockfunc resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val) */ /* - * In the PV case we might already have _Q_LOCKED_VAL set, because - * of lock stealing; therefore we must also allow: - * - * n,0,1 -> 0,0,1 - * * Note: at this point: (val & _Q_PENDING_MASK) == 0, because of the * above wait condition, therefore any concurrent setting of * PENDING will make the uncontended transition fail. 
@@ -369,7 +310,6 @@ void __lockfunc resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val) next = smp_cond_load_relaxed(&node->next, (VAL)); arch_mcs_spin_unlock_contended(&next->locked); - pv_kick_node(lock, next); release: trace_contention_end(lock, 0); @@ -380,32 +320,3 @@ void __lockfunc resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val) __this_cpu_dec(qnodes[0].mcs.count); } EXPORT_SYMBOL(resilient_queued_spin_lock_slowpath); - -/* - * Generate the paravirt code for resilient_queued_spin_unlock_slowpath(). - */ -#if !defined(_GEN_PV_LOCK_SLOWPATH) && defined(CONFIG_PARAVIRT_SPINLOCKS) -#define _GEN_PV_LOCK_SLOWPATH - -#undef pv_enabled -#define pv_enabled() true - -#undef pv_init_node -#undef pv_wait_node -#undef pv_kick_node -#undef pv_wait_head_or_lock - -#undef resilient_queued_spin_lock_slowpath -#define resilient_queued_spin_lock_slowpath __pv_resilient_queued_spin_lock_slowpath - -#include "qspinlock_paravirt.h" -#include "rqspinlock.c" - -bool nopvspin; -static __init int parse_nopvspin(char *arg) -{ - nopvspin = true; - return 0; -} -early_param("nopvspin", parse_nopvspin); -#endif From patchwork Thu Feb 6 10:54:15 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Kumar Kartikeya Dwivedi X-Patchwork-Id: 13962813 X-Patchwork-Delegate: bpf@iogearbox.net Received: from mail-wm1-f68.google.com (mail-wm1-f68.google.com [209.85.128.68]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 7426322D4C9; Thu, 6 Feb 2025 10:54:48 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.128.68 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1738839290; cv=none; b=Q7OgrT988aWQIDg4nGl2NVRsv+2XuaWo7ui1G4UApV1SBtVcSNrmpsKK63klScy7jKzHabiXeMeHPA5tFpG/VgzWCPcVgxn+ZpkOqIBzlHyncoES8SHPcLaAYqrkZf7TD00n0vKGpyViNIZXXs+hMiLMNj4ilCDuS8qXKpMJq70= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1738839290; c=relaxed/simple; bh=+CI7RDk98Onq/s+jrg9zfcbapKjfoHyX6Gf/cOAFD9E=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=aRLZkVqPoRWZIB2DoS3K8nlpGHdxxwwxph3beZyIVOwQBknOU89+LcbGWBzY2QNn8VGjq6JEo/fyf64iwV/37bjR6ATeKAJ3iOSiFTZBjwDL+WyFXFazXEugIX7DHbTqPehym25wgN3g6beGhZofpmTiSJdjhAjHgCZK/cUfx+I= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=UJpMS/J7; arc=none smtp.client-ip=209.85.128.68 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="UJpMS/J7" Received: by mail-wm1-f68.google.com with SMTP id 5b1f17b1804b1-43621d27adeso4637535e9.2; Thu, 06 Feb 2025 02:54:48 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1738839286; x=1739444086; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=Vg5Aj/VI1QlQmysdMT47W3wgHFqoKVdLw8Vt35/615c=; b=UJpMS/J7uhBwYNK5Fk9Mm+HrFEyvW6k3QUe3YMoyJ4f2WgQi8Kdik6AIssuhT5St1l 
From: Kumar Kartikeya Dwivedi
To: bpf@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: Barret Rhoden, Linus Torvalds, Peter Zijlstra, Will Deacon, Waiman Long, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann, Martin KaFai Lau, Eduard Zingerman, "Paul E.
McKenney" , Tejun Heo , Josh Don , Dohyun Kim , linux-arm-kernel@lists.infradead.org, kernel-team@meta.com Subject: [PATCH bpf-next v2 07/26] rqspinlock: Add support for timeouts Date: Thu, 6 Feb 2025 02:54:15 -0800 Message-ID: <20250206105435.2159977-8-memxor@gmail.com> X-Mailer: git-send-email 2.43.5 In-Reply-To: <20250206105435.2159977-1-memxor@gmail.com> References: <20250206105435.2159977-1-memxor@gmail.com> Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Developer-Signature: v=1; a=openpgp-sha256; l=5055; h=from:subject; bh=+CI7RDk98Onq/s+jrg9zfcbapKjfoHyX6Gf/cOAFD9E=; b=owEBbQKS/ZANAwAIAUzgyIZIvxHKAcsmYgBnpJRko56eerusglxFABuTrjI0BeTNdWet/EFfKlxx n6KicXiJAjMEAAEIAB0WIQRLvip+Buz51YI8YRFM4MiGSL8RygUCZ6SUZAAKCRBM4MiGSL8RypHfD/ 9HHaEjSUQc5CtopQiXE6c0Vwac7KpTMW4wwJycT3bCu1KquGtKAhLC2KFqVX2wHK4bl/OLK9vxVSmT Y9HJRDo6MOTwwlMcflPNZ3sTdgVaOaqei2zDvi0fcuAdVHriJxcP1KZX+BOtNaX4oIBqMixlOnWT8n zk9/HR/PwfXvbdewufYQssneftjxoWOvuQQeFEhZlkhohK0wkeQGgYZ6x6dpnN/VU3PpQQVTbRQFDI q2nwG3DsniPReJ/kfk+KKkvpMIXwuCUQCswIjrMd6X+f8Gbx4HkDaoCK0jNJmLXz7x/qnXM7efk5aJ eNUrFK4ZNtavQLPKEs7/u1ksDubcrsL+PcEjSLPmM7S83+t6sEMMKte4jA9A3EhlIe3pOMdshKIwMZ ox0u8m96KQo3KwGtgfkm1xYVf6D/WMtXXhPzrlPQ9no39v7Bzx0BPhb1CwvoxQHGMvECHciFWXf1E1 lwy9QUGhK6aq/O7Wbf9dLtTyqUL22WhmoHXboRhy8E1Q5+h+jywgW08oMQ80fekxqIfrp7xeQuJxbs 6tUwh0BODBW6hRLhfkjt3Qu89gY3T/J/7YxiR5wbSV5A47TQLxdDH3gvALoDmdy9OiVYWlvfv6LgUD KhAG4jecaMw2SpR61+5VnBT3O5WvSYhyuAOcyVQLr68OWpdNT5LRRVSCKKDg== X-Developer-Key: i=memxor@gmail.com; a=openpgp; fpr=4BBE2A7E06ECF9D5823C61114CE0C88648BF11CA X-Patchwork-Delegate: bpf@iogearbox.net Introduce policy macro RES_CHECK_TIMEOUT which can be used to detect when the timeout has expired for the slow path to return an error. It depends on being passed two variables initialized to 0: ts, ret. The 'ts' parameter is of type rqspinlock_timeout. This macro resolves to the (ret) expression so that it can be used in statements like smp_cond_load_acquire to break the waiting loop condition. The 'spin' member is used to amortize the cost of checking time by dispatching to the implementation every 64k iterations. The 'timeout_end' member is used to keep track of the timestamp that denotes the end of the waiting period. The 'ret' parameter denotes the status of the timeout, and can be checked in the slow path to detect timeouts after waiting loops. The 'duration' member is used to store the timeout duration for each waiting loop, that is passed down from the caller of the slow path function. Use the RES_INIT_TIMEOUT macro to initialize it. The default timeout value defined in the header (RES_DEF_TIMEOUT) is 0.5 seconds. This macro will be used as a condition for waiting loops in the slow path. Since each waiting loop applies a fresh timeout using the same rqspinlock_timeout, we add a new RES_RESET_TIMEOUT as well to ensure the values can be easily reinitialized to the default state. 
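To make the intended composition of RES_INIT_TIMEOUT, RES_RESET_TIMEOUT and RES_CHECK_TIMEOUT concrete, here is a small stand-alone user-space model of the scheme described above. It is only an illustrative sketch, not the kernel implementation: clock_gettime() stands in for ktime_get_mono_fast_ns(), a plain volatile flag stands in for the lock word, and the macros rely on GCC/Clang statement expressions just like the kernel versions.

/* Illustrative user-space model of the rqspinlock timeout macros. */
#include <errno.h>
#include <stdint.h>
#include <stdio.h>
#include <time.h>

struct rqspinlock_timeout {
	uint64_t timeout_end;
	uint64_t duration;
	uint16_t spin;
};

static uint64_t mono_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000000000ull + ts.tv_nsec;
}

/* Mirrors check_timeout(): arm the deadline on first use, then compare. */
static int check_timeout(struct rqspinlock_timeout *ts)
{
	uint64_t time = mono_ns();

	if (!ts->timeout_end) {
		ts->timeout_end = time + ts->duration;
		return 0;
	}
	return time > ts->timeout_end ? -ETIMEDOUT : 0;
}

#define RES_CHECK_TIMEOUT(ts, ret)				\
	({							\
		if (!(ts).spin++)				\
			(ret) = check_timeout(&(ts));		\
		(ret);						\
	})

#define RES_INIT_TIMEOUT(ts, _timeout) ({ (ts).spin = 1; (ts).duration = (_timeout); })
#define RES_RESET_TIMEOUT(ts) ({ (ts).timeout_end = 0; })

int main(void)
{
	struct rqspinlock_timeout ts;
	volatile int locked = 1;	/* stand-in lock word; never released here */
	int ret = 0;

	RES_INIT_TIMEOUT(ts, 100 * 1000 * 1000ull);	/* 100ms instead of RES_DEF_TIMEOUT */

	/* Each waiting loop re-arms the deadline, then spins until the condition
	 * holds or RES_CHECK_TIMEOUT() latches -ETIMEDOUT into ret. */
	RES_RESET_TIMEOUT(ts);
	while (locked && !RES_CHECK_TIMEOUT(ts, ret))
		;

	printf("wait finished with ret=%d\n", ret);	/* prints -ETIMEDOUT (-110 on Linux) */
	return 0;
}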
Reviewed-by: Barret Rhoden Signed-off-by: Kumar Kartikeya Dwivedi --- include/asm-generic/rqspinlock.h | 8 +++++- kernel/locking/rqspinlock.c | 46 +++++++++++++++++++++++++++++++- 2 files changed, 52 insertions(+), 2 deletions(-) diff --git a/include/asm-generic/rqspinlock.h b/include/asm-generic/rqspinlock.h index 54860b519571..c89733cbe643 100644 --- a/include/asm-generic/rqspinlock.h +++ b/include/asm-generic/rqspinlock.h @@ -10,10 +10,16 @@ #define __ASM_GENERIC_RQSPINLOCK_H #include +#include struct qspinlock; typedef struct qspinlock rqspinlock_t; -extern void resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val); +/* + * Default timeout for waiting loops is 0.5 seconds + */ +#define RES_DEF_TIMEOUT (NSEC_PER_SEC / 2) + +extern void resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val, u64 timeout); #endif /* __ASM_GENERIC_RQSPINLOCK_H */ diff --git a/kernel/locking/rqspinlock.c b/kernel/locking/rqspinlock.c index 52db60cd9691..200454e9c636 100644 --- a/kernel/locking/rqspinlock.c +++ b/kernel/locking/rqspinlock.c @@ -6,9 +6,11 @@ * (C) Copyright 2013-2014,2018 Red Hat, Inc. * (C) Copyright 2015 Intel Corp. * (C) Copyright 2015 Hewlett-Packard Enterprise Development LP + * (C) Copyright 2024 Meta Platforms, Inc. and affiliates. * * Authors: Waiman Long * Peter Zijlstra + * Kumar Kartikeya Dwivedi */ #include @@ -22,6 +24,7 @@ #include #include #include +#include /* * Include queued spinlock definitions and statistics code @@ -68,6 +71,44 @@ #include "mcs_spinlock.h" +struct rqspinlock_timeout { + u64 timeout_end; + u64 duration; + u16 spin; +}; + +static noinline int check_timeout(struct rqspinlock_timeout *ts) +{ + u64 time = ktime_get_mono_fast_ns(); + + if (!ts->timeout_end) { + ts->timeout_end = time + ts->duration; + return 0; + } + + if (time > ts->timeout_end) + return -ETIMEDOUT; + + return 0; +} + +#define RES_CHECK_TIMEOUT(ts, ret) \ + ({ \ + if (!(ts).spin++) \ + (ret) = check_timeout(&(ts)); \ + (ret); \ + }) + +/* + * Initialize the 'duration' member with the chosen timeout. + */ +#define RES_INIT_TIMEOUT(ts, _timeout) ({ (ts).spin = 1; (ts).duration = _timeout; }) + +/* + * We only need to reset 'timeout_end', 'spin' will just wrap around as necessary. + */ +#define RES_RESET_TIMEOUT(ts) ({ (ts).timeout_end = 0; }) + /* * Per-CPU queue node structures; we can never have more than 4 nested * contexts: task, softirq, hardirq, nmi. @@ -97,14 +138,17 @@ static DEFINE_PER_CPU_ALIGNED(struct qnode, qnodes[_Q_MAX_NODES]); * contended : (*,x,y) +--> (*,0,0) ---> (*,0,1) -' : * queue : ^--' : */ -void __lockfunc resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val) +void __lockfunc resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val, u64 timeout) { struct mcs_spinlock *prev, *next, *node; + struct rqspinlock_timeout ts; u32 old, tail; int idx; BUILD_BUG_ON(CONFIG_NR_CPUS >= (1U << _Q_TAIL_CPU_BITS)); + RES_INIT_TIMEOUT(ts, timeout); + /* * Wait for in-progress pending->locked hand-overs with a bounded * number of spins so that we guarantee forward progress. 
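As a rough worked estimate of the amortization this buys: 'spin' is a u16, so check_timeout() and its ktime_get_mono_fast_ns() call only run once every 2^16 = 65536 iterations of a waiting loop. Assuming, purely for illustration, a handful of nanoseconds per spin iteration, the added detection latency is on the order of a few hundred microseconds, which is negligible against the default RES_DEF_TIMEOUT of 0.5 seconds; the amortization can only make a timeout be noticed slightly late, never early.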
From patchwork Thu Feb 6 10:54:16 2025
From: Kumar Kartikeya Dwivedi
To: bpf@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: Barret Rhoden, Linus Torvalds, Peter Zijlstra, Will Deacon, Waiman Long, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann, Martin KaFai Lau, Eduard Zingerman, "Paul E. McKenney", Tejun Heo, Josh Don, Dohyun Kim, linux-arm-kernel@lists.infradead.org, kernel-team@meta.com
Subject: [PATCH bpf-next v2 08/26] rqspinlock: Protect pending bit owners from stalls
Date: Thu, 6 Feb 2025 02:54:16 -0800
Message-ID: <20250206105435.2159977-9-memxor@gmail.com>
In-Reply-To: <20250206105435.2159977-1-memxor@gmail.com>
References: <20250206105435.2159977-1-memxor@gmail.com>

The pending bit is used to avoid queueing in case the lock is uncontended, and has demonstrated benefits for the 2 contender scenario, esp. on x86. In case the pending bit is acquired and we wait for the locked bit to disappear, we may get stuck due to the lock owner not making progress. Hence, this waiting loop must be protected with a timeout check. To perform a graceful recovery once we decide to abort our lock acquisition attempt in this case, we must unset the pending bit since we own it.
All waiters undoing their changes and exiting gracefully allows the lock word to be restored to the unlocked state once all participants (owner, waiters) have been recovered, and the lock remains usable. Hence, set the pending bit back to zero before returning to the caller. Introduce a lockevent (rqspinlock_lock_timeout) to capture timeout event statistics. Reviewed-by: Barret Rhoden Signed-off-by: Kumar Kartikeya Dwivedi --- include/asm-generic/rqspinlock.h | 2 +- kernel/locking/lock_events_list.h | 5 +++++ kernel/locking/rqspinlock.c | 28 +++++++++++++++++++++++----- 3 files changed, 29 insertions(+), 6 deletions(-) diff --git a/include/asm-generic/rqspinlock.h b/include/asm-generic/rqspinlock.h index c89733cbe643..0981162c8ac7 100644 --- a/include/asm-generic/rqspinlock.h +++ b/include/asm-generic/rqspinlock.h @@ -20,6 +20,6 @@ typedef struct qspinlock rqspinlock_t; */ #define RES_DEF_TIMEOUT (NSEC_PER_SEC / 2) -extern void resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val, u64 timeout); +extern int resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val, u64 timeout); #endif /* __ASM_GENERIC_RQSPINLOCK_H */ diff --git a/kernel/locking/lock_events_list.h b/kernel/locking/lock_events_list.h index 97fb6f3f840a..c5286249994d 100644 --- a/kernel/locking/lock_events_list.h +++ b/kernel/locking/lock_events_list.h @@ -49,6 +49,11 @@ LOCK_EVENT(lock_use_node4) /* # of locking ops that use 4th percpu node */ LOCK_EVENT(lock_no_node) /* # of locking ops w/o using percpu node */ #endif /* CONFIG_QUEUED_SPINLOCKS */ +/* + * Locking events for Resilient Queued Spin Lock + */ +LOCK_EVENT(rqspinlock_lock_timeout) /* # of locking ops that timeout */ + /* * Locking events for rwsem */ diff --git a/kernel/locking/rqspinlock.c b/kernel/locking/rqspinlock.c index 200454e9c636..8e512feb37ce 100644 --- a/kernel/locking/rqspinlock.c +++ b/kernel/locking/rqspinlock.c @@ -138,12 +138,12 @@ static DEFINE_PER_CPU_ALIGNED(struct qnode, qnodes[_Q_MAX_NODES]); * contended : (*,x,y) +--> (*,0,0) ---> (*,0,1) -' : * queue : ^--' : */ -void __lockfunc resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val, u64 timeout) +int __lockfunc resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val, u64 timeout) { struct mcs_spinlock *prev, *next, *node; struct rqspinlock_timeout ts; + int idx, ret = 0; u32 old, tail; - int idx; BUILD_BUG_ON(CONFIG_NR_CPUS >= (1U << _Q_TAIL_CPU_BITS)); @@ -201,8 +201,25 @@ void __lockfunc resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val, * clear_pending_set_locked() implementations imply full * barriers. */ - if (val & _Q_LOCKED_MASK) - smp_cond_load_acquire(&lock->locked, !VAL); + if (val & _Q_LOCKED_MASK) { + RES_RESET_TIMEOUT(ts); + smp_cond_load_acquire(&lock->locked, !VAL || RES_CHECK_TIMEOUT(ts, ret)); + } + + if (ret) { + /* + * We waited for the locked bit to go back to 0, as the pending + * waiter, but timed out. We need to clear the pending bit since + * we own it. Once a stuck owner has been recovered, the lock + * must be restored to a valid state, hence removing the pending + * bit is necessary. + * + * *,1,* -> *,0,* + */ + clear_pending(lock); + lockevent_inc(rqspinlock_lock_timeout); + return ret; + } /* * take ownership and clear the pending bit. 
@@ -211,7 +228,7 @@ void __lockfunc resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val, */ clear_pending_set_locked(lock); lockevent_inc(lock_pending); - return; + return 0; /* * End of pending bit optimistic spinning and beginning of MCS @@ -362,5 +379,6 @@ void __lockfunc resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val, * release the node */ __this_cpu_dec(qnodes[0].mcs.count); + return 0; } EXPORT_SYMBOL(resilient_queued_spin_lock_slowpath);
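With the slow path now returning an error code instead of spinning indefinitely, a caller can react to a stuck lock. The wrapper below is a hypothetical caller-side sketch of how a fast path could be glued to the resilient slow path, assuming the qspinlock definitions (_Q_LOCKED_VAL, the atomic 'val' field) and <asm-generic/rqspinlock.h> are in scope; the real lock/unlock entry points appear later in the series and may differ.

/*
 * Hypothetical caller-side sketch (not part of this patch).  Fast path:
 * 0 -> locked via an acquire cmpxchg; otherwise enter the resilient slow
 * path and propagate its error (-ETIMEDOUT when the owner appears stuck).
 */
static __always_inline int res_spin_lock_sketch(rqspinlock_t *lock)
{
	int val = 0;

	if (likely(atomic_try_cmpxchg_acquire(&lock->val, &val, _Q_LOCKED_VAL)))
		return 0;
	return resilient_queued_spin_lock_slowpath(lock, val, RES_DEF_TIMEOUT);
}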
From patchwork Thu Feb 6 10:54:17 2025
From: Kumar Kartikeya Dwivedi
To: bpf@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: Barret Rhoden, Linus Torvalds, Peter Zijlstra, Will Deacon, Waiman Long, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann, Martin KaFai Lau, Eduard Zingerman, "Paul E. McKenney", Tejun Heo, Josh Don, Dohyun Kim, linux-arm-kernel@lists.infradead.org, kernel-team@meta.com
Subject: [PATCH bpf-next v2 09/26] rqspinlock: Protect waiters in queue from stalls
Date: Thu, 6 Feb 2025 02:54:17 -0800
Message-ID: <20250206105435.2159977-10-memxor@gmail.com>
In-Reply-To: <20250206105435.2159977-1-memxor@gmail.com>
References: <20250206105435.2159977-1-memxor@gmail.com>

Implement the wait queue cleanup algorithm for rqspinlock. There are three forms of waiters in the original queued spin lock algorithm.
The first is the waiter which acquires the pending bit and spins on the lock word without forming a wait queue. The second is the head waiter that is the first waiter heading the wait queue. The third form is of all the non-head waiters queued behind the head, waiting to be signalled through their MCS node to overtake the responsibility of the head. In this commit, we are concerned with the second and third kind. First, we augment the waiting loop of the head of the wait queue with a timeout. When this timeout happens, all waiters part of the wait queue will abort their lock acquisition attempts. This happens in three steps. First, the head breaks out of its loop waiting for pending and locked bits to turn to 0, and non-head waiters break out of their MCS node spin (more on that later). Next, every waiter (head or non-head) attempts to check whether they are also the tail waiter, in such a case they attempt to zero out the tail word and allow a new queue to be built up for this lock. If they succeed, they have no one to signal next in the queue to stop spinning. Otherwise, they signal the MCS node of the next waiter to break out of its spin and try resetting the tail word back to 0. This goes on until the tail waiter is found. In case of races, the new tail will be responsible for performing the same task, as the old tail will then fail to reset the tail word and wait for its next pointer to be updated before it signals the new tail to do the same. Lastly, all of these waiters release the rqnode and return to the caller. This patch underscores the point that rqspinlock's timeout does not apply to each waiter individually, and cannot be relied upon as an upper bound. It is possible for the rqspinlock waiters to return early from a failed lock acquisition attempt as soon as stalls are detected. The head waiter cannot directly WRITE_ONCE the tail to zero, as it may race with a concurrent xchg and a non-head waiter linking its MCS node to the head's MCS node through 'prev->next' assignment. Reviewed-by: Barret Rhoden Signed-off-by: Kumar Kartikeya Dwivedi --- kernel/locking/rqspinlock.c | 42 +++++++++++++++++++++++++++++--- kernel/locking/rqspinlock.h | 48 +++++++++++++++++++++++++++++++++++++ 2 files changed, 87 insertions(+), 3 deletions(-) create mode 100644 kernel/locking/rqspinlock.h diff --git a/kernel/locking/rqspinlock.c b/kernel/locking/rqspinlock.c index 8e512feb37ce..fdc20157d0c9 100644 --- a/kernel/locking/rqspinlock.c +++ b/kernel/locking/rqspinlock.c @@ -77,6 +77,8 @@ struct rqspinlock_timeout { u16 spin; }; +#define RES_TIMEOUT_VAL 2 + static noinline int check_timeout(struct rqspinlock_timeout *ts) { u64 time = ktime_get_mono_fast_ns(); @@ -305,12 +307,18 @@ int __lockfunc resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val, * head of the waitqueue. */ if (old & _Q_TAIL_MASK) { + int val; + prev = decode_tail(old, qnodes); /* Link @node into the waitqueue. */ WRITE_ONCE(prev->next, node); - arch_mcs_spin_lock_contended(&node->locked); + val = arch_mcs_spin_lock_contended(&node->locked); + if (val == RES_TIMEOUT_VAL) { + ret = -EDEADLK; + goto waitq_timeout; + } /* * While waiting for the MCS lock, the next pointer may have @@ -334,7 +342,35 @@ int __lockfunc resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val, * sequentiality; this is because the set_locked() function below * does not imply a full barrier. 
*/ - val = atomic_cond_read_acquire(&lock->val, !(VAL & _Q_LOCKED_PENDING_MASK)); + RES_RESET_TIMEOUT(ts); + val = atomic_cond_read_acquire(&lock->val, !(VAL & _Q_LOCKED_PENDING_MASK) || + RES_CHECK_TIMEOUT(ts, ret)); + +waitq_timeout: + if (ret) { + /* + * If the tail is still pointing to us, then we are the final waiter, + * and are responsible for resetting the tail back to 0. Otherwise, if + * the cmpxchg operation fails, we signal the next waiter to take exit + * and try the same. For a waiter with tail node 'n': + * + * n,*,* -> 0,*,* + * + * When performing cmpxchg for the whole word (NR_CPUS > 16k), it is + * possible locked/pending bits keep changing and we see failures even + * when we remain the head of wait queue. However, eventually, + * pending bit owner will unset the pending bit, and new waiters + * will queue behind us. This will leave the lock owner in + * charge, and it will eventually either set locked bit to 0, or + * leave it as 1, allowing us to make progress. + */ + if (!try_cmpxchg_tail(lock, tail, 0)) { + next = smp_cond_load_relaxed(&node->next, VAL); + WRITE_ONCE(next->locked, RES_TIMEOUT_VAL); + } + lockevent_inc(rqspinlock_lock_timeout); + goto release; + } /* * claim the lock: @@ -379,6 +415,6 @@ int __lockfunc resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val, * release the node */ __this_cpu_dec(qnodes[0].mcs.count); - return 0; + return ret; } EXPORT_SYMBOL(resilient_queued_spin_lock_slowpath); diff --git a/kernel/locking/rqspinlock.h b/kernel/locking/rqspinlock.h new file mode 100644 index 000000000000..3cec3a0f2d7e --- /dev/null +++ b/kernel/locking/rqspinlock.h @@ -0,0 +1,48 @@ +/* SPDX-License-Identifier: GPL-2.0-or-later */ +/* + * Resilient Queued Spin Lock defines + * + * (C) Copyright 2024 Meta Platforms, Inc. and affiliates. + * + * Authors: Kumar Kartikeya Dwivedi + */ +#ifndef __LINUX_RQSPINLOCK_H +#define __LINUX_RQSPINLOCK_H + +#include "qspinlock.h" + +/* + * try_cmpxchg_tail - Return result of cmpxchg of tail word with a new value + * @lock: Pointer to queued spinlock structure + * @tail: The tail to compare against + * @new_tail: The new queue tail code word + * Return: Bool to indicate whether the cmpxchg operation succeeded + * + * This is used by the head of the wait queue to clean up the queue. + * Provides relaxed ordering, since observers only rely on initialized + * state of the node which was made visible through the xchg_tail operation, + * i.e. through the smp_wmb preceding xchg_tail. + * + * We avoid using 16-bit cmpxchg, which is not available on all architectures. + */ +static __always_inline bool try_cmpxchg_tail(struct qspinlock *lock, u32 tail, u32 new_tail) +{ + u32 old, new; + + old = atomic_read(&lock->val); + do { + /* + * Is the tail part we compare to already stale? Fail. + */ + if ((old & _Q_TAIL_MASK) != tail) + return false; + /* + * Encode latest locked/pending state for new tail. 
+ */ + new = (old & _Q_LOCKED_PENDING_MASK) | new_tail; + } while (!atomic_try_cmpxchg_relaxed(&lock->val, &old, new)); + + return true; +} + +#endif /* __LINUX_RQSPINLOCK_H */
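The tail reset is the heart of this cleanup protocol. The stand-alone C11 model below mirrors try_cmpxchg_tail() to show the n,*,* -> 0,*,* transition: only the tail bits participate in the comparison, while whatever locked/pending state is current gets carried into the new value. The mask values and the example lock word are assumptions of this sketch, mirroring but not copied from the qspinlock layout.

#include <inttypes.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define Q_LOCKED_PENDING_MASK	0x0000ffffu	/* locked byte + pending byte (assumed layout) */
#define Q_TAIL_MASK		0xffff0000u	/* tail CPU/idx encoding (assumed layout) */

static bool try_cmpxchg_tail(_Atomic uint32_t *lockword, uint32_t tail, uint32_t new_tail)
{
	uint32_t old = atomic_load_explicit(lockword, memory_order_relaxed);
	uint32_t new;

	do {
		/* Tail already moved on? Then a newer waiter owns the cleanup. */
		if ((old & Q_TAIL_MASK) != tail)
			return false;
		/* Carry over whatever locked/pending state is current. */
		new = (old & Q_LOCKED_PENDING_MASK) | new_tail;
	} while (!atomic_compare_exchange_weak_explicit(lockword, &old, new,
							memory_order_relaxed,
							memory_order_relaxed));
	return true;
}

int main(void)
{
	/* Lock word: our tail code in the top half, locked bit set by the owner. */
	_Atomic uint32_t lockword = 0x00050001u;

	bool ok = try_cmpxchg_tail(&lockword, 0x00050000u, 0);
	printf("reset %s, lock word now 0x%08" PRIx32 "\n",
	       ok ? "succeeded" : "failed", atomic_load(&lockword));	/* 0x00000001 */
	return 0;
}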
From patchwork Thu Feb 6 10:54:18 2025
From: Kumar Kartikeya Dwivedi
To: bpf@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: Barret Rhoden, Linus Torvalds, Peter Zijlstra, Will Deacon, Waiman Long, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann, Martin KaFai Lau, Eduard Zingerman, "Paul E. McKenney", Tejun Heo, Josh Don, Dohyun Kim, linux-arm-kernel@lists.infradead.org, kernel-team@meta.com
Subject: [PATCH bpf-next v2 10/26] rqspinlock: Protect waiters in trylock fallback from stalls
Date: Thu, 6 Feb 2025 02:54:18 -0800
Message-ID: <20250206105435.2159977-11-memxor@gmail.com>
In-Reply-To: <20250206105435.2159977-1-memxor@gmail.com>
References: <20250206105435.2159977-1-memxor@gmail.com>

When we run out of maximum rqnodes, the original queued spin lock slow path falls back to a try lock. In such a case, we are again susceptible to stalls in case the lock owner fails to make progress. We use the timeout as a fallback to break out of this loop and return to the caller. This is a fallback for an extreme edge case, when on the same CPU we run out of all 4 qnodes. When could this happen?
We are in the slow path in task context, we get interrupted by an IRQ,
which while in the slow path gets interrupted by an NMI, which in the
slow path gets another nested NMI, which enters the slow path. All of
the interruptions happen after node->count++.

Reviewed-by: Barret Rhoden
Signed-off-by: Kumar Kartikeya Dwivedi
---
 kernel/locking/rqspinlock.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/kernel/locking/rqspinlock.c b/kernel/locking/rqspinlock.c index fdc20157d0c9..df7adec59cec 100644 --- a/kernel/locking/rqspinlock.c +++ b/kernel/locking/rqspinlock.c @@ -255,8 +255,14 @@ int __lockfunc resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val, */ if (unlikely(idx >= _Q_MAX_NODES)) { lockevent_inc(lock_no_node); - while (!queued_spin_trylock(lock)) + RES_RESET_TIMEOUT(ts); + while (!queued_spin_trylock(lock)) { + if (RES_CHECK_TIMEOUT(ts, ret)) { + lockevent_inc(rqspinlock_lock_timeout); + break; + } cpu_relax(); + } goto release; }
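[Editorial note] For readers unfamiliar with the RES_CHECK_TIMEOUT() pattern used in the hunk above, the following is a minimal userspace sketch of the same bounded-spin idiom; it is not part of the patch, and the function and type names are illustrative. The point is that the clock is only sampled when the 16-bit spin counter wraps, so the common case of the retry loop stays cheap:

#include <stdbool.h>
#include <stdint.h>
#include <time.h>

static uint64_t now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec;
}

/* Spin until cond(arg) holds or timeout_ns elapses; false means timeout. */
static bool spin_until(bool (*cond)(void *), void *arg, uint64_t timeout_ns)
{
	uint64_t end = now_ns() + timeout_ns;
	uint16_t spin = 0;

	while (!cond(arg)) {
		/* Sample the clock only once every 2^16 iterations. */
		if (!spin++ && now_ns() > end)
			return false;	/* analogous to -ETIMEDOUT above */
	}
	return true;
}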
From patchwork Thu Feb 6 10:54:19 2025
X-Patchwork-Submitter: Kumar Kartikeya Dwivedi
X-Patchwork-Id: 13962817
X-Patchwork-Delegate: bpf@iogearbox.net
From: Kumar Kartikeya Dwivedi
To: bpf@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: Linus Torvalds, Peter Zijlstra, Will Deacon, Waiman Long,
    Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann, Martin KaFai Lau,
    Eduard Zingerman, "Paul E. McKenney", Tejun Heo, Barret Rhoden,
    Josh Don, Dohyun Kim, linux-arm-kernel@lists.infradead.org,
    kernel-team@meta.com
Subject: [PATCH bpf-next v2 11/26] rqspinlock: Add deadlock detection and recovery
Date: Thu, 6 Feb 2025 02:54:19 -0800
Message-ID: <20250206105435.2159977-12-memxor@gmail.com>
In-Reply-To: <20250206105435.2159977-1-memxor@gmail.com>
References: <20250206105435.2159977-1-memxor@gmail.com>

While the timeout logic provides guarantees for the waiter's forward
progress, the time until a stalling waiter unblocks can still be long.
The default timeout of 1/2 sec can be excessively long for some use
cases. Additionally, custom timeouts may exacerbate recovery time.

Introduce logic to detect common cases of deadlocks and perform quicker
recovery. This is done by dividing the time from entry into the locking
slow path until the timeout into intervals of 1 ms. Then, after each
interval elapses, deadlock detection is performed, while also polling
the lock word to ensure we can quickly break out of the detection logic
and proceed with lock acquisition.

A 'held_locks' table is maintained per-CPU where the entry at the
bottom denotes a lock being waited for or already taken. Entries coming
before it denote locks that are already held. The current CPU's table
can thus be looked at to detect AA deadlocks. The tables from other
CPUs can be looked at to discover ABBA situations. Finally, when a
matching entry for the lock being taken on the current CPU is found on
some other CPU, a deadlock situation is detected. This function can
take a long time, therefore the lock word is constantly polled in each
loop iteration to ensure we can preempt detection and proceed with lock
acquisition, using the is_lock_released check.

We set the 'spin' member of the rqspinlock_timeout struct to 0 to
trigger deadlock checks immediately and perform faster recovery.

Note: Extending the lock word size by 4 bytes to record the owner CPU
would allow faster detection for ABBA, since it is typically the owner
which participates in an ABBA situation. However, to keep compatibility
with existing lock words in the kernel (struct qspinlock), and given
that deadlocks are a rare event triggered by bugs, we choose to favor
compatibility over faster detection.
The release_held_lock_entry function requires an smp_wmb, while the release store on unlock will provide the necessary ordering for us. Add comments to document the subtleties of why this is correct. It is possible for stores to be reordered still, but in the context of the deadlock detection algorithm, a release barrier is sufficient and needn't be stronger for unlock's case. Signed-off-by: Kumar Kartikeya Dwivedi --- include/asm-generic/rqspinlock.h | 83 +++++++++++++- kernel/locking/rqspinlock.c | 183 ++++++++++++++++++++++++++++--- 2 files changed, 252 insertions(+), 14 deletions(-) diff --git a/include/asm-generic/rqspinlock.h b/include/asm-generic/rqspinlock.h index 0981162c8ac7..c1dbd25287a1 100644 --- a/include/asm-generic/rqspinlock.h +++ b/include/asm-generic/rqspinlock.h @@ -11,15 +11,96 @@ #include #include +#include struct qspinlock; typedef struct qspinlock rqspinlock_t; +extern int resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val, u64 timeout); + /* * Default timeout for waiting loops is 0.5 seconds */ #define RES_DEF_TIMEOUT (NSEC_PER_SEC / 2) -extern int resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val, u64 timeout); +#define RES_NR_HELD 32 + +struct rqspinlock_held { + int cnt; + void *locks[RES_NR_HELD]; +}; + +DECLARE_PER_CPU_ALIGNED(struct rqspinlock_held, rqspinlock_held_locks); + +static __always_inline void grab_held_lock_entry(void *lock) +{ + int cnt = this_cpu_inc_return(rqspinlock_held_locks.cnt); + + if (unlikely(cnt > RES_NR_HELD)) { + /* Still keep the inc so we decrement later. */ + return; + } + + /* + * Implied compiler barrier in per-CPU operations; otherwise we can have + * the compiler reorder inc with write to table, allowing interrupts to + * overwrite and erase our write to the table (as on interrupt exit it + * will be reset to NULL). + */ + this_cpu_write(rqspinlock_held_locks.locks[cnt - 1], lock); +} + +/* + * It is possible to run into misdetection scenarios of AA deadlocks on the same + * CPU, and missed ABBA deadlocks on remote CPUs when this function pops entries + * out of order (due to lock A, lock B, unlock A, unlock B) pattern. The correct + * logic to preserve right entries in the table would be to walk the array of + * held locks and swap and clear out-of-order entries, but that's too + * complicated and we don't have a compelling use case for out of order unlocking. + * + * Therefore, we simply don't support such cases and keep the logic simple here. + */ +static __always_inline void release_held_lock_entry(void) +{ + struct rqspinlock_held *rqh = this_cpu_ptr(&rqspinlock_held_locks); + + if (unlikely(rqh->cnt > RES_NR_HELD)) + goto dec; + WRITE_ONCE(rqh->locks[rqh->cnt - 1], NULL); +dec: + this_cpu_dec(rqspinlock_held_locks.cnt); + /* + * This helper is invoked when we unwind upon failing to acquire the + * lock. Unlike the unlock path which constitutes a release store after + * we clear the entry, we need to emit a write barrier here. Otherwise, + * we may have a situation as follows: + * + * for lock B + * release_held_lock_entry + * + * try_cmpxchg_acquire for lock A + * grab_held_lock_entry + * + * Since these are attempts for different locks, no sequentiality is + * guaranteed and reordering may occur such that dec, inc are done + * before entry is overwritten. This permits a remote lock holder of + * lock B to now observe it as being attempted on this CPU, and may lead + * to misdetection. 
+ * + * In case of unlock, we will always do a release on the lock word after + * releasing the entry, ensuring that other CPUs cannot hold the lock + * (and make conclusions about deadlocks) until the entry has been + * cleared on the local CPU, preventing any anomalies. Reordering is + * still possible there, but a remote CPU cannot observe a lock in our + * table which it is already holding, since visibility entails our + * release store for the said lock has not retired. + * + * We don't have a problem if the dec and WRITE_ONCE above get reordered + * with each other, we either notice an empty NULL entry on top (if dec + * succeeds WRITE_ONCE), or a potentially stale entry which cannot be + * observed (if dec precedes WRITE_ONCE). + */ + smp_wmb(); +} #endif /* __ASM_GENERIC_RQSPINLOCK_H */ diff --git a/kernel/locking/rqspinlock.c b/kernel/locking/rqspinlock.c index df7adec59cec..42e8a56534b6 100644 --- a/kernel/locking/rqspinlock.c +++ b/kernel/locking/rqspinlock.c @@ -30,6 +30,7 @@ * Include queued spinlock definitions and statistics code */ #include "qspinlock.h" +#include "rqspinlock.h" #include "qspinlock_stat.h" /* @@ -74,16 +75,146 @@ struct rqspinlock_timeout { u64 timeout_end; u64 duration; + u64 cur; u16 spin; }; #define RES_TIMEOUT_VAL 2 -static noinline int check_timeout(struct rqspinlock_timeout *ts) +DEFINE_PER_CPU_ALIGNED(struct rqspinlock_held, rqspinlock_held_locks); + +static bool is_lock_released(rqspinlock_t *lock, u32 mask, struct rqspinlock_timeout *ts) +{ + if (!(atomic_read_acquire(&lock->val) & (mask))) + return true; + return false; +} + +static noinline int check_deadlock_AA(rqspinlock_t *lock, u32 mask, + struct rqspinlock_timeout *ts) +{ + struct rqspinlock_held *rqh = this_cpu_ptr(&rqspinlock_held_locks); + int cnt = min(RES_NR_HELD, rqh->cnt); + + /* + * Return an error if we hold the lock we are attempting to acquire. + * We'll iterate over max 32 locks; no need to do is_lock_released. + */ + for (int i = 0; i < cnt - 1; i++) { + if (rqh->locks[i] == lock) + return -EDEADLK; + } + return 0; +} + +/* + * This focuses on the most common case of ABBA deadlocks (or ABBA involving + * more locks, which reduce to ABBA). This is not exhaustive, and we rely on + * timeouts as the final line of defense. + */ +static noinline int check_deadlock_ABBA(rqspinlock_t *lock, u32 mask, + struct rqspinlock_timeout *ts) +{ + struct rqspinlock_held *rqh = this_cpu_ptr(&rqspinlock_held_locks); + int rqh_cnt = min(RES_NR_HELD, rqh->cnt); + void *remote_lock; + int cpu; + + /* + * Find the CPU holding the lock that we want to acquire. If there is a + * deadlock scenario, we will read a stable set on the remote CPU and + * find the target. This would be a constant time operation instead of + * O(NR_CPUS) if we could determine the owning CPU from a lock value, but + * that requires increasing the size of the lock word. + */ + for_each_possible_cpu(cpu) { + struct rqspinlock_held *rqh_cpu = per_cpu_ptr(&rqspinlock_held_locks, cpu); + int real_cnt = READ_ONCE(rqh_cpu->cnt); + int cnt = min(RES_NR_HELD, real_cnt); + + /* + * Let's ensure to break out of this loop if the lock is available for + * us to potentially acquire. + */ + if (is_lock_released(lock, mask, ts)) + return 0; + + /* + * Skip ourselves, and CPUs whose count is less than 2, as they need at + * least one held lock and one acquisition attempt (reflected as top + * most entry) to participate in an ABBA deadlock. 
+ * + * If cnt is more than RES_NR_HELD, it means the current lock being + * acquired won't appear in the table, and other locks in the table are + * already held, so we can't determine ABBA. + */ + if (cpu == smp_processor_id() || real_cnt < 2 || real_cnt > RES_NR_HELD) + continue; + + /* + * Obtain the entry at the top, this corresponds to the lock the + * remote CPU is attempting to acquire in a deadlock situation, + * and would be one of the locks we hold on the current CPU. + */ + remote_lock = READ_ONCE(rqh_cpu->locks[cnt - 1]); + /* + * If it is NULL, we've raced and cannot determine a deadlock + * conclusively, skip this CPU. + */ + if (!remote_lock) + continue; + /* + * Find if the lock we're attempting to acquire is held by this CPU. + * Don't consider the topmost entry, as that must be the latest lock + * being held or acquired. For a deadlock, the target CPU must also + * attempt to acquire a lock we hold, so for this search only 'cnt - 1' + * entries are important. + */ + for (int i = 0; i < cnt - 1; i++) { + if (READ_ONCE(rqh_cpu->locks[i]) != lock) + continue; + /* + * We found our lock as held on the remote CPU. Is the + * acquisition attempt on the remote CPU for a lock held + * by us? If so, we have a deadlock situation, and need + * to recover. + */ + for (int i = 0; i < rqh_cnt - 1; i++) { + if (rqh->locks[i] == remote_lock) + return -EDEADLK; + } + /* + * Inconclusive; retry again later. + */ + return 0; + } + } + return 0; +} + +static noinline int check_deadlock(rqspinlock_t *lock, u32 mask, + struct rqspinlock_timeout *ts) +{ + int ret; + + ret = check_deadlock_AA(lock, mask, ts); + if (ret) + return ret; + ret = check_deadlock_ABBA(lock, mask, ts); + if (ret) + return ret; + + return 0; +} + +static noinline int check_timeout(rqspinlock_t *lock, u32 mask, + struct rqspinlock_timeout *ts) { u64 time = ktime_get_mono_fast_ns(); + u64 prev = ts->cur; if (!ts->timeout_end) { + ts->cur = time; ts->timeout_end = time + ts->duration; return 0; } @@ -91,20 +222,30 @@ static noinline int check_timeout(struct rqspinlock_timeout *ts) if (time > ts->timeout_end) return -ETIMEDOUT; + /* + * A millisecond interval passed from last time? Trigger deadlock + * checks. + */ + if (prev + NSEC_PER_MSEC < time) { + ts->cur = time; + return check_deadlock(lock, mask, ts); + } + return 0; } -#define RES_CHECK_TIMEOUT(ts, ret) \ - ({ \ - if (!(ts).spin++) \ - (ret) = check_timeout(&(ts)); \ - (ret); \ +#define RES_CHECK_TIMEOUT(ts, ret, mask) \ + ({ \ + if (!(ts).spin++) \ + (ret) = check_timeout((lock), (mask), &(ts)); \ + (ret); \ }) /* * Initialize the 'duration' member with the chosen timeout. + * Set spin member to 0 to trigger AA/ABBA checks immediately. */ -#define RES_INIT_TIMEOUT(ts, _timeout) ({ (ts).spin = 1; (ts).duration = _timeout; }) +#define RES_INIT_TIMEOUT(ts, _timeout) ({ (ts).spin = 0; (ts).duration = _timeout; }) /* * We only need to reset 'timeout_end', 'spin' will just wrap around as necessary. @@ -192,6 +333,11 @@ int __lockfunc resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val, goto queue; } + /* + * Grab an entry in the held locks array, to enable deadlock detection. + */ + grab_held_lock_entry(lock); + /* * We're pending, wait for the owner to go away. 
* @@ -205,7 +351,7 @@ int __lockfunc resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val, */ if (val & _Q_LOCKED_MASK) { RES_RESET_TIMEOUT(ts); - smp_cond_load_acquire(&lock->locked, !VAL || RES_CHECK_TIMEOUT(ts, ret)); + smp_cond_load_acquire(&lock->locked, !VAL || RES_CHECK_TIMEOUT(ts, ret, _Q_LOCKED_MASK)); } if (ret) { @@ -220,7 +366,7 @@ int __lockfunc resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val, */ clear_pending(lock); lockevent_inc(rqspinlock_lock_timeout); - return ret; + goto err_release_entry; } /* @@ -238,6 +384,11 @@ int __lockfunc resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val, */ queue: lockevent_inc(lock_slowpath); + /* + * Grab deadlock detection entry for the queue path. + */ + grab_held_lock_entry(lock); + node = this_cpu_ptr(&qnodes[0].mcs); idx = node->count++; tail = encode_tail(smp_processor_id(), idx); @@ -257,9 +408,9 @@ int __lockfunc resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val, lockevent_inc(lock_no_node); RES_RESET_TIMEOUT(ts); while (!queued_spin_trylock(lock)) { - if (RES_CHECK_TIMEOUT(ts, ret)) { + if (RES_CHECK_TIMEOUT(ts, ret, ~0u)) { lockevent_inc(rqspinlock_lock_timeout); - break; + goto err_release_node; } cpu_relax(); } @@ -350,7 +501,7 @@ int __lockfunc resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val, */ RES_RESET_TIMEOUT(ts); val = atomic_cond_read_acquire(&lock->val, !(VAL & _Q_LOCKED_PENDING_MASK) || - RES_CHECK_TIMEOUT(ts, ret)); + RES_CHECK_TIMEOUT(ts, ret, _Q_LOCKED_PENDING_MASK)); waitq_timeout: if (ret) { @@ -375,7 +526,7 @@ int __lockfunc resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val, WRITE_ONCE(next->locked, RES_TIMEOUT_VAL); } lockevent_inc(rqspinlock_lock_timeout); - goto release; + goto err_release_node; } /* @@ -422,5 +573,11 @@ int __lockfunc resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val, */ __this_cpu_dec(qnodes[0].mcs.count); return ret; +err_release_node: + trace_contention_end(lock, ret); + __this_cpu_dec(qnodes[0].mcs.count); +err_release_entry: + release_held_lock_entry(); + return ret; } EXPORT_SYMBOL(resilient_queued_spin_lock_slowpath);
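[Editorial note] To make the two bug patterns this detection targets concrete, here is a hedged sketch; it is not part of the patch. It is written against the raw_res_spin_lock()/raw_res_spin_unlock() wrappers that a later patch in this series introduces, and the lock names and callers are hypothetical (locks would also need raw_res_spin_lock_init() at setup time):

static rqspinlock_t lock_a;
static rqspinlock_t lock_b;

/*
 * AA: the same CPU tries to take a lock it already holds, e.g. because an
 * interrupt arrives inside the critical section and its handler takes the
 * same lock. The second attempt finds lock_a in this CPU's held_locks
 * table and fails with -EDEADLK instead of hanging the CPU.
 */
static void irq_path(void)
{
	if (raw_res_spin_lock(&lock_a))
		return;		/* -EDEADLK if we preempted the holder below */
	/* ... */
	raw_res_spin_unlock(&lock_a);
}

/*
 * ABBA: CPU 0 holds A and waits for B while CPU 1 holds B and waits for A.
 * Each waiter publishes the lock it wants at the top of its held_locks
 * table, so one of the two CPUs observes the cycle and bails out with
 * -EDEADLK, letting the other side make progress.
 */
static int cpu0_path(void)
{
	int ret;

	ret = raw_res_spin_lock(&lock_a);
	if (ret)
		return ret;
	ret = raw_res_spin_lock(&lock_b);	/* CPU 1 takes B then A */
	if (!ret)
		raw_res_spin_unlock(&lock_b);
	raw_res_spin_unlock(&lock_a);
	return ret;
}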
From patchwork Thu Feb 6 10:54:20 2025
X-Patchwork-Submitter: Kumar Kartikeya Dwivedi
X-Patchwork-Id: 13962818
X-Patchwork-Delegate: bpf@iogearbox.net
From: Kumar Kartikeya Dwivedi
To: bpf@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: Linus Torvalds, Peter Zijlstra, Will Deacon, Waiman Long,
    Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann, Martin KaFai Lau,
    Eduard Zingerman, "Paul E. McKenney", Tejun Heo, Barret Rhoden,
    Josh Don, Dohyun Kim, linux-arm-kernel@lists.infradead.org,
    kernel-team@meta.com
Subject: [PATCH bpf-next v2 12/26] rqspinlock: Add a test-and-set fallback
Date: Thu, 6 Feb 2025 02:54:20 -0800
Message-ID: <20250206105435.2159977-13-memxor@gmail.com>
In-Reply-To: <20250206105435.2159977-1-memxor@gmail.com>
References: <20250206105435.2159977-1-memxor@gmail.com>

Include a test-and-set fallback when queued spinlock support is not
available. Introduce a rqspinlock type to act as a fallback when
qspinlock support is absent.

Include ifdef guards to ensure the slow path in this file is only
compiled when CONFIG_QUEUED_SPINLOCKS=y. Subsequent patches will add
further logic to ensure fallback to the test-and-set implementation
when queued spinlock support is unavailable on an architecture.
Signed-off-by: Kumar Kartikeya Dwivedi --- include/asm-generic/rqspinlock.h | 17 +++++++++++++++ kernel/locking/rqspinlock.c | 37 ++++++++++++++++++++++++++++++++ 2 files changed, 54 insertions(+) diff --git a/include/asm-generic/rqspinlock.h b/include/asm-generic/rqspinlock.h index c1dbd25287a1..92e53b2aafb9 100644 --- a/include/asm-generic/rqspinlock.h +++ b/include/asm-generic/rqspinlock.h @@ -12,11 +12,28 @@ #include #include #include +#ifdef CONFIG_QUEUED_SPINLOCKS +#include +#endif + +struct rqspinlock { + union { + atomic_t val; + u32 locked; + }; +}; struct qspinlock; +#ifdef CONFIG_QUEUED_SPINLOCKS typedef struct qspinlock rqspinlock_t; +#else +typedef struct rqspinlock rqspinlock_t; +#endif +extern int resilient_tas_spin_lock(rqspinlock_t *lock, u64 timeout); +#ifdef CONFIG_QUEUED_SPINLOCKS extern int resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val, u64 timeout); +#endif /* * Default timeout for waiting loops is 0.5 seconds diff --git a/kernel/locking/rqspinlock.c b/kernel/locking/rqspinlock.c index 42e8a56534b6..ea034e80f855 100644 --- a/kernel/locking/rqspinlock.c +++ b/kernel/locking/rqspinlock.c @@ -21,7 +21,9 @@ #include #include #include +#ifdef CONFIG_QUEUED_SPINLOCKS #include +#endif #include #include #include @@ -29,8 +31,10 @@ /* * Include queued spinlock definitions and statistics code */ +#ifdef CONFIG_QUEUED_SPINLOCKS #include "qspinlock.h" #include "rqspinlock.h" +#endif #include "qspinlock_stat.h" /* @@ -252,6 +256,37 @@ static noinline int check_timeout(rqspinlock_t *lock, u32 mask, */ #define RES_RESET_TIMEOUT(ts) ({ (ts).timeout_end = 0; }) +/* + * Provide a test-and-set fallback for cases when queued spin lock support is + * absent from the architecture. + */ +int __lockfunc resilient_tas_spin_lock(rqspinlock_t *lock, u64 timeout) +{ + struct rqspinlock_timeout ts; + int val, ret = 0; + + RES_INIT_TIMEOUT(ts, timeout); + grab_held_lock_entry(lock); +retry: + val = atomic_read(&lock->val); + + if (val || !atomic_try_cmpxchg(&lock->val, &val, 1)) { + if (RES_CHECK_TIMEOUT(ts, ret, ~0u)) { + lockevent_inc(rqspinlock_lock_timeout); + goto out; + } + cpu_relax(); + goto retry; + } + + return 0; +out: + release_held_lock_entry(); + return ret; +} + +#ifdef CONFIG_QUEUED_SPINLOCKS + /* * Per-CPU queue node structures; we can never have more than 4 nested * contexts: task, softirq, hardirq, nmi. 
@@ -581,3 +616,5 @@ int __lockfunc resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val, return ret; } EXPORT_SYMBOL(resilient_queued_spin_lock_slowpath); + +#endif /* CONFIG_QUEUED_SPINLOCKS */
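[Editorial note] As a side illustration of what the fallback degenerates to, here is a hedged userspace sketch of a plain test-and-set acquire/release protocol using C11 atomics; the names are made up, and the kernel code above additionally layers the timeout and held-locks bookkeeping on top of this basic shape:

#include <stdatomic.h>
#include <stdbool.h>

struct tas_lock {
	atomic_uint val;	/* 0 = unlocked, 1 = locked */
};

static bool tas_trylock(struct tas_lock *l)
{
	unsigned int old = 0;

	/* Acquire on success (the kernel helper uses a full-barrier cmpxchg). */
	return atomic_compare_exchange_strong_explicit(&l->val, &old, 1,
						       memory_order_acquire,
						       memory_order_relaxed);
}

static void tas_unlock(struct tas_lock *l)
{
	/* Release store, mirroring smp_store_release(&lock->locked, 0). */
	atomic_store_explicit(&l->val, 0, memory_order_release);
}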
From patchwork Thu Feb 6 10:54:21 2025
X-Patchwork-Submitter: Kumar Kartikeya Dwivedi
X-Patchwork-Id: 13962819
X-Patchwork-Delegate: bpf@iogearbox.net
From: Kumar Kartikeya Dwivedi
To: bpf@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: Linus Torvalds, Peter Zijlstra, Will Deacon, Waiman Long,
    Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann, Martin KaFai Lau,
    Eduard Zingerman, "Paul E. McKenney", Tejun Heo, Barret Rhoden,
    Josh Don, Dohyun Kim, linux-arm-kernel@lists.infradead.org,
    kernel-team@meta.com
Subject: [PATCH bpf-next v2 13/26] rqspinlock: Add basic support for CONFIG_PARAVIRT
Date: Thu, 6 Feb 2025 02:54:21 -0800
Message-ID: <20250206105435.2159977-14-memxor@gmail.com>
In-Reply-To: <20250206105435.2159977-1-memxor@gmail.com>
References: <20250206105435.2159977-1-memxor@gmail.com>

We ripped out PV and virtualization related bits from rqspinlock in an
earlier commit; however, a fair lock performs poorly within a virtual
machine when the lock holder is preempted. As such, retain the
virt_spin_lock fallback to a test-and-set lock, but with timeout and
deadlock detection. We can do this by simply depending on the
resilient_tas_spin_lock implementation from the previous patch.
We don't integrate support for CONFIG_PARAVIRT_SPINLOCKS yet, as that
requires more involved algorithmic changes and introduces more
complexity. It can be done when the need arises in the future.

Signed-off-by: Kumar Kartikeya Dwivedi
---
 arch/x86/include/asm/rqspinlock.h | 29 +++++++++++++++++++++++++++++
 include/asm-generic/rqspinlock.h  | 14 ++++++++++++++
 kernel/locking/rqspinlock.c       |  3 +++
 3 files changed, 46 insertions(+)
 create mode 100644 arch/x86/include/asm/rqspinlock.h

diff --git a/arch/x86/include/asm/rqspinlock.h b/arch/x86/include/asm/rqspinlock.h new file mode 100644 index 000000000000..cbd65212c177 --- /dev/null +++ b/arch/x86/include/asm/rqspinlock.h @@ -0,0 +1,29 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef _ASM_X86_RQSPINLOCK_H +#define _ASM_X86_RQSPINLOCK_H + +#include + +#ifdef CONFIG_PARAVIRT +DECLARE_STATIC_KEY_FALSE(virt_spin_lock_key); + +#define resilient_virt_spin_lock_enabled resilient_virt_spin_lock_enabled +static __always_inline bool resilient_virt_spin_lock_enabled(void) +{ + return static_branch_likely(&virt_spin_lock_key); +} + +struct qspinlock; +extern int resilient_tas_spin_lock(struct qspinlock *lock, u64 timeout); + +#define resilient_virt_spin_lock resilient_virt_spin_lock +static inline int resilient_virt_spin_lock(struct qspinlock *lock, u64 timeout) +{ + return resilient_tas_spin_lock(lock, timeout); +} + +#endif /* CONFIG_PARAVIRT */ + +#include + +#endif /* _ASM_X86_RQSPINLOCK_H */ diff --git a/include/asm-generic/rqspinlock.h b/include/asm-generic/rqspinlock.h index 92e53b2aafb9..bbe049dcf70d 100644 --- a/include/asm-generic/rqspinlock.h +++ b/include/asm-generic/rqspinlock.h @@ -35,6 +35,20 @@ extern int resilient_tas_spin_lock(rqspinlock_t *lock, u64 timeout); extern int resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val, u64 timeout); #endif +#ifndef resilient_virt_spin_lock_enabled +static __always_inline bool resilient_virt_spin_lock_enabled(void) +{ + return false; +} +#endif + +#ifndef resilient_virt_spin_lock +static __always_inline int resilient_virt_spin_lock(struct qspinlock *lock, u64 timeout) +{ + return 0; +} +#endif + /* * Default timeout for waiting loops is 0.5 seconds diff --git a/kernel/locking/rqspinlock.c b/kernel/locking/rqspinlock.c index ea034e80f855..13d1759c9353 100644 --- a/kernel/locking/rqspinlock.c +++ b/kernel/locking/rqspinlock.c @@ -325,6 +325,9 @@ int __lockfunc resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val, BUILD_BUG_ON(CONFIG_NR_CPUS >= (1U << _Q_TAIL_CPU_BITS)); + if (resilient_virt_spin_lock_enabled()) + return resilient_virt_spin_lock(lock, timeout); + RES_INIT_TIMEOUT(ts, timeout); /*
From patchwork Thu Feb 6 10:54:22 2025
X-Patchwork-Submitter: Kumar Kartikeya Dwivedi
X-Patchwork-Id: 13962820
X-Patchwork-Delegate: bpf@iogearbox.net
From: Kumar Kartikeya Dwivedi
To: bpf@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: Linus Torvalds, Peter Zijlstra, Will Deacon, Waiman Long,
    Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann, Martin KaFai Lau,
    Eduard Zingerman, "Paul E. McKenney", Tejun Heo, Barret Rhoden,
    Josh Don, Dohyun Kim, linux-arm-kernel@lists.infradead.org,
    kernel-team@meta.com
Subject: [PATCH bpf-next v2 14/26] rqspinlock: Add helper to print a splat on timeout or deadlock
Date: Thu, 6 Feb 2025 02:54:22 -0800
Message-ID: <20250206105435.2159977-15-memxor@gmail.com>
In-Reply-To: <20250206105435.2159977-1-memxor@gmail.com>
References: <20250206105435.2159977-1-memxor@gmail.com>

Whenever a timeout or a deadlock occurs, we would want to print a
message to the dmesg console, including the CPU where the event
occurred, the list of locks in the held locks table, and the stack
trace of the caller, which allows determining where exactly in the slow
path the waiter timed out or detected a deadlock.

Splats are limited to at most one per CPU during machine uptime, and a
lock is acquired to ensure that no interleaving occurs when a
concurrent set of CPUs conflict and enter a deadlock situation and
start printing data.

Later patches will use this to inspect the return value of the
rqspinlock API and then report a violation if necessary.
Signed-off-by: Kumar Kartikeya Dwivedi
---
 kernel/locking/rqspinlock.c | 29 +++++++++++++++++++++++++++++
 1 file changed, 29 insertions(+)

diff --git a/kernel/locking/rqspinlock.c b/kernel/locking/rqspinlock.c index 13d1759c9353..93f928bc4e9c 100644 --- a/kernel/locking/rqspinlock.c +++ b/kernel/locking/rqspinlock.c @@ -196,6 +196,35 @@ static noinline int check_deadlock_ABBA(rqspinlock_t *lock, u32 mask, return 0; } +static DEFINE_PER_CPU(int, report_nest_cnt); +static DEFINE_PER_CPU(bool, report_flag); +static arch_spinlock_t report_lock; + +static void rqspinlock_report_violation(const char *s, void *lock) +{ + struct rqspinlock_held *rqh = this_cpu_ptr(&rqspinlock_held_locks); + + if (this_cpu_inc_return(report_nest_cnt) != 1) { + this_cpu_dec(report_nest_cnt); + return; + } + if (this_cpu_read(report_flag)) + goto end; + this_cpu_write(report_flag, true); + arch_spin_lock(&report_lock); + + pr_err("CPU %d: %s", smp_processor_id(), s); + pr_info("Held locks: %d\n", rqh->cnt + 1); + pr_info("Held lock[%2d] = 0x%px\n", 0, lock); + for (int i = 0; i < min(RES_NR_HELD, rqh->cnt); i++) + pr_info("Held lock[%2d] = 0x%px\n", i + 1, rqh->locks[i]); + dump_stack(); + + arch_spin_unlock(&report_lock); +end: + this_cpu_dec(report_nest_cnt); +} + static noinline int check_deadlock(rqspinlock_t *lock, u32 mask, struct rqspinlock_timeout *ts) {
From patchwork Thu Feb 6 10:54:23 2025
X-Patchwork-Submitter: Kumar Kartikeya Dwivedi
X-Patchwork-Id: 13962821
X-Patchwork-Delegate: bpf@iogearbox.net
From: Kumar Kartikeya Dwivedi
To: bpf@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: Linus Torvalds, Peter Zijlstra, Will Deacon, Waiman Long,
    Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann, Martin KaFai Lau,
    Eduard Zingerman, "Paul E. McKenney", Tejun Heo, Barret Rhoden,
    Josh Don, Dohyun Kim, linux-arm-kernel@lists.infradead.org,
    kernel-team@meta.com
Subject: [PATCH bpf-next v2 15/26] rqspinlock: Add macros for rqspinlock usage
Date: Thu, 6 Feb 2025 02:54:23 -0800
Message-ID: <20250206105435.2159977-16-memxor@gmail.com>
In-Reply-To: <20250206105435.2159977-1-memxor@gmail.com>
References: <20250206105435.2159977-1-memxor@gmail.com>

Introduce helper macros that wrap around the rqspinlock slow path and
provide an interface analogous to the raw_spin_lock API. Note that in
case of error conditions, preemption and IRQ disabling are
automatically unrolled before returning the error back to the caller.

Ensure that in the absence of CONFIG_QUEUED_SPINLOCKS support, we fall
back to the test-and-set implementation.

Signed-off-by: Kumar Kartikeya Dwivedi
---
 include/asm-generic/rqspinlock.h | 71 ++++++++++++++++++++++++++++++++
 1 file changed, 71 insertions(+)

diff --git a/include/asm-generic/rqspinlock.h b/include/asm-generic/rqspinlock.h index bbe049dcf70d..46119fc768b8 100644 --- a/include/asm-generic/rqspinlock.h +++ b/include/asm-generic/rqspinlock.h @@ -134,4 +134,75 @@ static __always_inline void release_held_lock_entry(void) smp_wmb(); } +#ifdef CONFIG_QUEUED_SPINLOCKS + +/** + * res_spin_lock - acquire a queued spinlock + * @lock: Pointer to queued spinlock structure + */ +static __always_inline int res_spin_lock(rqspinlock_t *lock) +{ + int val = 0; + + if (likely(atomic_try_cmpxchg_acquire(&lock->val, &val, _Q_LOCKED_VAL))) { + grab_held_lock_entry(lock); + return 0; + } + return resilient_queued_spin_lock_slowpath(lock, val, RES_DEF_TIMEOUT); +} + +#else + +#define res_spin_lock(lock) resilient_tas_spin_lock(lock, RES_DEF_TIMEOUT) + +#endif /* CONFIG_QUEUED_SPINLOCKS */ + +static __always_inline void res_spin_unlock(rqspinlock_t *lock) +{ + struct rqspinlock_held *rqh = this_cpu_ptr(&rqspinlock_held_locks); + + if (unlikely(rqh->cnt > RES_NR_HELD)) + goto unlock; + WRITE_ONCE(rqh->locks[rqh->cnt - 1], NULL); +unlock: + this_cpu_dec(rqspinlock_held_locks.cnt); + /* + * Release barrier, ensures correct ordering. See release_held_lock_entry + * for details. Perform release store instead of queued_spin_unlock, + * since we use this function for test-and-set fallback as well.
When we + * have CONFIG_QUEUED_SPINLOCKS=n, we clear the full 4-byte lockword. + */ + smp_store_release(&lock->locked, 0); +} + +#ifdef CONFIG_QUEUED_SPINLOCKS +#define raw_res_spin_lock_init(lock) ({ *(lock) = (rqspinlock_t)__ARCH_SPIN_LOCK_UNLOCKED; }) +#else +#define raw_res_spin_lock_init(lock) ({ *(lock) = (rqspinlock_t){0}; }) +#endif + +#define raw_res_spin_lock(lock) \ + ({ \ + int __ret; \ + preempt_disable(); \ + __ret = res_spin_lock(lock); \ + if (__ret) \ + preempt_enable(); \ + __ret; \ + }) + +#define raw_res_spin_unlock(lock) ({ res_spin_unlock(lock); preempt_enable(); }) + +#define raw_res_spin_lock_irqsave(lock, flags) \ + ({ \ + int __ret; \ + local_irq_save(flags); \ + __ret = raw_res_spin_lock(lock); \ + if (__ret) \ + local_irq_restore(flags); \ + __ret; \ + }) + +#define raw_res_spin_unlock_irqrestore(lock, flags) ({ raw_res_spin_unlock(lock); local_irq_restore(flags); }) + #endif /* __ASM_GENERIC_RQSPINLOCK_H */ From patchwork Thu Feb 6 10:54:24 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Kumar Kartikeya Dwivedi X-Patchwork-Id: 13962822 X-Patchwork-Delegate: bpf@iogearbox.net Received: from mail-wr1-f65.google.com (mail-wr1-f65.google.com [209.85.221.65]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 66691230270; Thu, 6 Feb 2025 10:55:01 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.221.65 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1738839303; cv=none; b=kBY7wG1wrMnM9heoBEBrW8EwAUd8xShafnkGTCFSFv3c0zdqWs4ysUFx6IYa0HqwDu/bTVUtQ8L6HzsmtYEWiOrF0d35kwLOuRtVTB5dI00bPqEUZ91HnL+FDpXJyBx+SAQ+EJ1mOB1sx7Ngr0K/HMmgdhttd4gk0tlimmNXR2U= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1738839303; c=relaxed/simple; bh=lkeMxyDGnEWVT4+0Paf/jS31tLo0YsbXyTnTYAfMTz8=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=kAZ+NFgiQ4HPUFkUqVCcj49el2+uOli9pKgZmRa1w+0WG6nXVDazyKzkhMIesII5HFU48yRDghU6HQgg3yHRHaTgMbpjtl+BNWZhwGpAGV1gIMyEOTmZuQpLDZPwvX4Noy+3th8FkdZ6Ma1hHhAibD+8kKtSX+vBIZs2uJwj75g= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=AUwiQWGK; arc=none smtp.client-ip=209.85.221.65 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="AUwiQWGK" Received: by mail-wr1-f65.google.com with SMTP id ffacd0b85a97d-38dbaae68a2so598307f8f.3; Thu, 06 Feb 2025 02:55:01 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1738839299; x=1739444099; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=v/27xKgFdSZVO2eLH/O6emznfiz3pieZcJIn5kKU89M=; b=AUwiQWGKHYnXNO5BoV419wqq5G+Viauj5tWbYjWWnXKiDsndyh7M/CrPbWQf0a8abP x++4EnNktkQ3l/zvc4oqVOLpdSZsDZ2ByjlV+8G9yJ7Y1D0CQVuBXgjCTWxBVKntO+j4 mSpWEoZ9tmCUgw7NFVBZI29dP9xchkrhV7R/IX3EdXe2HOvHfBS8eHOKCOR3ZqqbQq9P 
HREWXRPOAGmAmNHl1cY95rYuFNljKFd8YZLBblVHtKhnQkgSsncoEH8mWZtGV0qtOJSr X2SWAEefwZCNya70k+Zihbxdvzoe/AhbOIOZ+RAkMz08WQ3QVBfFoqSDgx2AiwtDGI/D 3LwQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1738839299; x=1739444099; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=v/27xKgFdSZVO2eLH/O6emznfiz3pieZcJIn5kKU89M=; b=Hjw8G6LQ6Ut/ONWr37xZQejIrHcU2PCc9XQvZSYknmxpEFrRN8o4ybJqiHnZkmk9Y6 XgfjZQShwoaQoS/3C8XmJSncRXp3itGN8zZm0QcEIAVOtH0pULexky+TVZ728ZY1Uo5T VdUhbbz/BSHu/JHskmFzW2Gwe/Wd9kBgeX813Pkx0/ZfmxH4yoFafiXcsuvzViSMRWpq 6FBUxyOeR9L5uyN1R5/6lEm3iAL11EjGRk+6fSRbIYxE0FnBOrKWW98baNFskZgVmBQn GEHKFgwKgvkuylg/970iZE/Jx7emImEWpXJUBstGghvf0s0ysxGRz6UzDSnLGFgcrOpo Aeug== X-Forwarded-Encrypted: i=1; AJvYcCUIT0G6mH/dmX2TMq2lOpkO/UI3xe8ysBX3ar05hu6yBxCm5/dR4ZyoigR5/H+H7CPIRXNLBEEI7akVnhA=@vger.kernel.org X-Gm-Message-State: AOJu0Yw0hRxakGQL3BG3donNbrbzrCJz+v2NRNAbSK8WNe/6ky/ZEZIw VYly6vJkExmMTuKUtrS2uiMrf0jrKr1o7YkxeuTzY8qnWFYkUienvZD6GoTGChQ= X-Gm-Gg: ASbGncsFHa7USPN7082XEKa1V5Xkn0/ut8LTd1Yp5A2aLxh6YtZDspEortmeS006KMf djykXZV6ka9xvfJM0h7mqz1db2sDX4e1HlXAlaEd8JuuAxhUV9NhFSNDJbHohero3lJ6EaWxv7Q KKdCq9VxavFMpanw4aVBH4bZRR78H7weJtL97IqHFTq1iTqfsUMrJMZ0TT041JEmMsx1Ya0Nmsl euNVmtx7A6X0XuYfkwbjJs+mV7lPuyaE3G+wbpevrwGgPS8rN0gT7hKKQg15chdC74ZXDb13hcl e0Oc X-Google-Smtp-Source: AGHT+IEaxBrKjdya80s5GTFvJ53ssNGQMjtcN2JFoVjdVKyNSB8TPhmiwUVzhaT6kIG4/mzrT4zfQg== X-Received: by 2002:adf:f9ce:0:b0:386:3835:9fec with SMTP id ffacd0b85a97d-38db492a155mr4449175f8f.44.1738839299377; Thu, 06 Feb 2025 02:54:59 -0800 (PST) Received: from localhost ([2a03:2880:31ff:4::]) by smtp.gmail.com with ESMTPSA id ffacd0b85a97d-38dbdc30fbbsm1419486f8f.0.2025.02.06.02.54.58 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 06 Feb 2025 02:54:58 -0800 (PST) From: Kumar Kartikeya Dwivedi To: bpf@vger.kernel.org, linux-kernel@vger.kernel.org Cc: Linus Torvalds , Peter Zijlstra , Will Deacon , Waiman Long , Alexei Starovoitov , Andrii Nakryiko , Daniel Borkmann , Martin KaFai Lau , Eduard Zingerman , "Paul E. 
McKenney" , Tejun Heo , Barret Rhoden , Josh Don , Dohyun Kim , linux-arm-kernel@lists.infradead.org, kernel-team@meta.com Subject: [PATCH bpf-next v2 16/26] rqspinlock: Add locktorture support Date: Thu, 6 Feb 2025 02:54:24 -0800 Message-ID: <20250206105435.2159977-17-memxor@gmail.com> X-Mailer: git-send-email 2.43.5 In-Reply-To: <20250206105435.2159977-1-memxor@gmail.com> References: <20250206105435.2159977-1-memxor@gmail.com> Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Developer-Signature: v=1; a=openpgp-sha256; l=2978; h=from:subject; bh=lkeMxyDGnEWVT4+0Paf/jS31tLo0YsbXyTnTYAfMTz8=; b=owEBbQKS/ZANAwAIAUzgyIZIvxHKAcsmYgBnpJRmHQ2YyiZ2nANvO6JQtZoHC62f1PKXDNjawAxK mKA8GLiJAjMEAAEIAB0WIQRLvip+Buz51YI8YRFM4MiGSL8RygUCZ6SUZgAKCRBM4MiGSL8RyjR0D/ 9UxpQJ303fRMoYR6fqKrESf9af9KoWduUW16ytQ8cl8LUmJvwPsQA2RBgUKdWqIjIitbNFRK0ahuL2 G3ZxiZ7lVlv6kVhyOprX9Rp9/1KXJb9AssTbXNvRGsblLw0VJWdbOxKA4CzRsqXHpyhAZw+crjWaAi e76bz8XD/UpVFVgPBoDM6XHvEU2CQz9W0Mujyrr8fIKId/Ly9v+BzlYwvJ5Dwiq0pKOjO1wJSs2GSe xJNqUgPJbtFFLsbyNGsWDMGgSm+PQL9V4lY/frjPS48Pxdu+urR1lOEEV933OLvUjCojgSyeFrQFi4 C7ca0HI9k7+q8frWCz+Je2IqiCXf7yXlCIM2WkxiNzINim/wLLzaPdOeueZFq4MdMjMZBhCcUB1Rbe mRdSVBLs7X7Zpm/3jue63s7r1/T5Zdd+FnKnX9Mx1pbvmw8d+u28kL4LxsLFp3fBS3zeo5fKA1b74I aPnMh8oLweieJCNzHBtJmoA+8bROXjIbWCktLpZ6wtoxPmkgKNC65kw8ylP9OD9mtdz/bYC6djzAq0 WjHL34MT7pIHjgbJe86Kbe4rqIKg2p+NIDSL06kM7XS11ju9ee0SrcJYHqw/rvOlMJqzWxf4Nv/Lcw tVGq7vh2z1O7XmMCOh+d4cORBxDlJ4pqO/LdJnm3CeCaaMqsWEXlmgSV6ZRA== X-Developer-Key: i=memxor@gmail.com; a=openpgp; fpr=4BBE2A7E06ECF9D5823C61114CE0C88648BF11CA X-Patchwork-Delegate: bpf@iogearbox.net Introduce locktorture support for rqspinlock using the newly added macros as the first in-kernel user and consumer. Signed-off-by: Kumar Kartikeya Dwivedi --- kernel/locking/locktorture.c | 51 ++++++++++++++++++++++++++++++++++++ kernel/locking/rqspinlock.c | 1 + 2 files changed, 52 insertions(+) diff --git a/kernel/locking/locktorture.c b/kernel/locking/locktorture.c index cc33470f4de9..a055ff38d1f5 100644 --- a/kernel/locking/locktorture.c +++ b/kernel/locking/locktorture.c @@ -362,6 +362,56 @@ static struct lock_torture_ops raw_spin_lock_irq_ops = { .name = "raw_spin_lock_irq" }; +#include +static rqspinlock_t rqspinlock; + +static int torture_raw_res_spin_write_lock(int tid __maybe_unused) +{ + raw_res_spin_lock(&rqspinlock); + return 0; +} + +static void torture_raw_res_spin_write_unlock(int tid __maybe_unused) +{ + raw_res_spin_unlock(&rqspinlock); +} + +static struct lock_torture_ops raw_res_spin_lock_ops = { + .writelock = torture_raw_res_spin_write_lock, + .write_delay = torture_spin_lock_write_delay, + .task_boost = torture_rt_boost, + .writeunlock = torture_raw_res_spin_write_unlock, + .readlock = NULL, + .read_delay = NULL, + .readunlock = NULL, + .name = "raw_res_spin_lock" +}; + +static int torture_raw_res_spin_write_lock_irq(int tid __maybe_unused) +{ + unsigned long flags; + + raw_res_spin_lock_irqsave(&rqspinlock, flags); + cxt.cur_ops->flags = flags; + return 0; +} + +static void torture_raw_res_spin_write_unlock_irq(int tid __maybe_unused) +{ + raw_res_spin_unlock_irqrestore(&rqspinlock, cxt.cur_ops->flags); +} + +static struct lock_torture_ops raw_res_spin_lock_irq_ops = { + .writelock = torture_raw_res_spin_write_lock_irq, + .write_delay = torture_spin_lock_write_delay, + .task_boost = torture_rt_boost, + .writeunlock = torture_raw_res_spin_write_unlock_irq, + .readlock = NULL, + .read_delay = NULL, + .readunlock = NULL, + .name = 
"raw_res_spin_lock_irq" +}; + static DEFINE_RWLOCK(torture_rwlock); static int torture_rwlock_write_lock(int tid __maybe_unused) @@ -1168,6 +1218,7 @@ static int __init lock_torture_init(void) &lock_busted_ops, &spin_lock_ops, &spin_lock_irq_ops, &raw_spin_lock_ops, &raw_spin_lock_irq_ops, + &raw_res_spin_lock_ops, &raw_res_spin_lock_irq_ops, &rw_lock_ops, &rw_lock_irq_ops, &mutex_lock_ops, &ww_mutex_lock_ops, diff --git a/kernel/locking/rqspinlock.c b/kernel/locking/rqspinlock.c index 93f928bc4e9c..49b4f3c75a3e 100644 --- a/kernel/locking/rqspinlock.c +++ b/kernel/locking/rqspinlock.c @@ -86,6 +86,7 @@ struct rqspinlock_timeout { #define RES_TIMEOUT_VAL 2 DEFINE_PER_CPU_ALIGNED(struct rqspinlock_held, rqspinlock_held_locks); +EXPORT_SYMBOL_GPL(rqspinlock_held_locks); static bool is_lock_released(rqspinlock_t *lock, u32 mask, struct rqspinlock_timeout *ts) { From patchwork Thu Feb 6 10:54:25 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Kumar Kartikeya Dwivedi X-Patchwork-Id: 13962823 X-Patchwork-Delegate: bpf@iogearbox.net Received: from mail-wm1-f65.google.com (mail-wm1-f65.google.com [209.85.128.65]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id C386223099D; Thu, 6 Feb 2025 10:55:02 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.128.65 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1738839304; cv=none; b=fYRcsZFPioB9fzlT31WzHx+A78oNeg15E5JOOCG1PW5Nzy1D8zKyyKGce9Nh15n37O+8Zfp+vW0Zj+ZOoolwUcdBWMSUjzJDUM7ianiO1+pdCNlz6RNF+kBiSgESbPINzVafNOLYQwPSOOdZUc/PYu/Qd4IQ0zO7eTOnnQ4CNy4= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1738839304; c=relaxed/simple; bh=qYphScagVacMZI8xAzQRuNOSyn9/aS/2qYdw+lGxq8I=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=HNwOCAOzRGT1NqcFPZvC7drXZOXozIuzaS0kN+Cu8CuGP4ktinorw7CtmC4PA4VtbjYmwwdZmpPFQfnmI0WVXsEdnqPaiVd0RNIlZM0k/p+Sl1Dvk5cC8dVffb+7hDIHjVnL/ZTim/MGDqwFDisE1vXY3eK2oB3l0lhvVKI1G3Q= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=IJHyOU5e; arc=none smtp.client-ip=209.85.128.65 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="IJHyOU5e" Received: by mail-wm1-f65.google.com with SMTP id 5b1f17b1804b1-4361e89b6daso4508905e9.3; Thu, 06 Feb 2025 02:55:02 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1738839301; x=1739444101; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=qz+NS5T4GU+VbAH0BNt16FcBYOnl5PXsvLXnbwAi+Qc=; b=IJHyOU5exJy6tBg+2Iq1wln2wZiqd1j3p1J/XnqH8azqafHIpjOi8mGKmq6kZgQb17 swxHz+UsNucM3EZvyqR5b1O6t/Ew63d6N3Ai9+v+neUXt2MBDsdap3GAnWbOdQdMNcOC vSLDFiyzRt+jKKs2LXyrFvh5STUe39jOYLqoLyzjwCQ0y9w7cSsSdGp7pgUl18eRT3hS M+08d2pPGuZ2APCyE8C0HHDfii0QTEc+FkJ5vNhZScNy/r9/mQimaeyX1TQCZ5bKPVts QzCWB3Nt4QCkn+c1/zFtA3FcFgsnhOukr0okLYnpXJEMCb2jfQANfanQzplpdE5aObJD 
Gc/Q== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1738839301; x=1739444101; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=qz+NS5T4GU+VbAH0BNt16FcBYOnl5PXsvLXnbwAi+Qc=; b=uyG6HCG/tgZLrgVGAOBDKfvLBOB/WdVcCPwYhsytb0+q+J2y0hXPb6dXNZi9A+Lied Jt/1I54fB0jTgieL5PZW7ErUp7QaVLctqzIW1ap4qiLud3dXqJlDswwNt8k+jEWGn8jV XiDcPUEoZz/025n7qOtnkyo0yWQZRQK9NkuEsoQothahQ3WPFZccXedgEtW+jJvgJs0d 1oUVk1yGYfVn6z70RGscQQJG3s3SPH5LoZnuU+Or8vzMEXVqa2i9uAHlpcVWdFxbVSca 0/mqYAzFHihChPTRGxpV0fYXZYLo3Bm0F1zy5IsCqY4VOw7Q2ZwEyjZgAhSZdNv8+uJp Ly+g== X-Forwarded-Encrypted: i=1; AJvYcCVhtiI9jap7WMnlMVLzOtDWGgTr3aRDBnILcyw/GP4XygVtT7r/Yx9Yaxr9m8hLfpfJfyM7rH76h2I2GX8=@vger.kernel.org X-Gm-Message-State: AOJu0YxIBjLSpI3snGz0a9WDsEXjCRQ8ukn2tQkBjbjVc3vPgP7ku9rX LIhC57RD+5asvwj4YOdpmaGqPfkI5vissPl6h4OSxc7FBJn9x208/37uCtnbLJM= X-Gm-Gg: ASbGncshDurRLAC5rcB8uzK5CZfepiE5HpCTJ98PBSnMk6yl3bM0ZNeiLhNETmUBYTT fpPGuWzJiGG6OQNv6uTyLNXYRDOvMckVEa0Ge4V4ocVBrPdDhcWZaj6T9LGWoDXjh2/G3NWZVr9 DaaFE50HTSgQQwLFXa6HKvZHIUjObuO/Ljc/623y09MUBxPH6qeYwFLRrUc1iQviVTNPV77/BLE hXbV3O0MuEQrGmTmWV4YHBndPaFgTljovJTav2YDoMEZm6ayo5rBH0eHF/g9w1S52ApFSiQUyMH fbifww== X-Google-Smtp-Source: AGHT+IG3YUH0oYkY2WcU9nkb8nClyB5AjNsFI6WIkNWS0fGK3jGoBgR9RRDJA0dBgbN9twa+WCc61A== X-Received: by 2002:a05:600c:1c87:b0:434:f7e3:bfbd with SMTP id 5b1f17b1804b1-4390d5611fcmr49163655e9.23.1738839300786; Thu, 06 Feb 2025 02:55:00 -0800 (PST) Received: from localhost ([2a03:2880:31ff:25::]) by smtp.gmail.com with ESMTPSA id 5b1f17b1804b1-4390d94d7c7sm50627245e9.14.2025.02.06.02.55.00 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 06 Feb 2025 02:55:00 -0800 (PST) From: Kumar Kartikeya Dwivedi To: bpf@vger.kernel.org, linux-kernel@vger.kernel.org Cc: Ankur Arora , Linus Torvalds , Peter Zijlstra , Will Deacon , Waiman Long , Alexei Starovoitov , Andrii Nakryiko , Daniel Borkmann , Martin KaFai Lau , Eduard Zingerman , "Paul E. 
McKenney" , Tejun Heo , Barret Rhoden , Josh Don , Dohyun Kim , linux-arm-kernel@lists.infradead.org, kernel-team@meta.com Subject: [PATCH bpf-next v2 17/26] rqspinlock: Hardcode cond_acquire loops to asm-generic implementation Date: Thu, 6 Feb 2025 02:54:25 -0800 Message-ID: <20250206105435.2159977-18-memxor@gmail.com> X-Mailer: git-send-email 2.43.5 In-Reply-To: <20250206105435.2159977-1-memxor@gmail.com> References: <20250206105435.2159977-1-memxor@gmail.com> Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Developer-Signature: v=1; a=openpgp-sha256; l=3789; h=from:subject; bh=qYphScagVacMZI8xAzQRuNOSyn9/aS/2qYdw+lGxq8I=; b=owEBbQKS/ZANAwAIAUzgyIZIvxHKAcsmYgBnpJRmEQyiG868iJLdTJeV5rNF45XJkM3pzaj93Qpc DMdltmaJAjMEAAEIAB0WIQRLvip+Buz51YI8YRFM4MiGSL8RygUCZ6SUZgAKCRBM4MiGSL8RyjhAD/ 94L7gwFOANgho2OHK/jFtxh/dQWudROmLlLNyaJh+qGxZ9/Y6D95iWxtze9d8A8JWUeVYuNcNzGuEN q2gb1bS2T8Pk+VCqcFRqsioDWHNnqM+BwLGq26+gTj0p3zop+cgAPklesV00m68IlSy1LhMcIzIF2E slsAxGCoT6w+U1KGge2lPNChfW8Kf4kMiHr5s4V3lqgjh5jDkHofe2E0f88YBI6NEK9X3fm/gk1V+J VIaptHRDqsUJmlPLmvdU51DFEwmGDVtxxc8mvgA6F08d9d87r+yOm0Oos298QtBiQcb6IqHR1aka9N ITML8aqrOeF2zgN99ZZVlzpqrJWuvSp/jTdyiXKWVRaHjVEtHs0pTh7ybdJy+0D0bEzNBE9I5MQHe+ WHend498nQc8gE/neaaMboKfxVnAocxfFtQoQknslS76Z48WT/MN4FdUGVjS9jgF8kAQB/iTqc+2A2 jb0b9ivFQZN1YJgTlFTR4zPHbYRw/NOF7HTisjbVAjun3ow+FP3VgcwZPkruEl+31v44pUzsmk7WZm PBqeRGaWYJACjXqHGBEueZu13cMlJCXDoarHrZqbMc5frds4B2dEW+0qLeFUe/LZJqdKgR+36oA8nE hBOileJ273cIjtBiGXGnEU1vWDHPgN8zSEOKxP4Z77q0bv5ucidRdFRkNFZA== X-Developer-Key: i=memxor@gmail.com; a=openpgp; fpr=4BBE2A7E06ECF9D5823C61114CE0C88648BF11CA X-Patchwork-Delegate: bpf@iogearbox.net Currently, for rqspinlock usage, the implementation of smp_cond_load_acquire (and thus, atomic_cond_read_acquire) are susceptible to stalls on arm64, because they do not guarantee that the conditional expression will be repeatedly invoked if the address being loaded from is not written to by other CPUs. When support for event-streams is absent (which unblocks stuck WFE-based loops every ~100us), we may end up being stuck forever. This causes a problem for us, as we need to repeatedly invoke the RES_CHECK_TIMEOUT in the spin loop to break out when the timeout expires. Hardcode the implementation to the asm-generic version in rqspinlock.c until support for smp_cond_load_acquire_timewait [0] lands upstream. [0]: https://lore.kernel.org/lkml/20250203214911.898276-1-ankur.a.arora@oracle.com Cc: Ankur Arora Signed-off-by: Kumar Kartikeya Dwivedi --- kernel/locking/rqspinlock.c | 41 ++++++++++++++++++++++++++++++++++--- 1 file changed, 38 insertions(+), 3 deletions(-) diff --git a/kernel/locking/rqspinlock.c b/kernel/locking/rqspinlock.c index 49b4f3c75a3e..b4cceeecf29c 100644 --- a/kernel/locking/rqspinlock.c +++ b/kernel/locking/rqspinlock.c @@ -325,6 +325,41 @@ int __lockfunc resilient_tas_spin_lock(rqspinlock_t *lock, u64 timeout) */ static DEFINE_PER_CPU_ALIGNED(struct qnode, qnodes[_Q_MAX_NODES]); +/* + * Hardcode smp_cond_load_acquire and atomic_cond_read_acquire implementations + * to the asm-generic implementation. In rqspinlock code, our conditional + * expression involves checking the value _and_ additionally a timeout. However, + * on arm64, the WFE-based implementation may never spin again if no stores + * occur to the locked byte in the lock word. As such, we may be stuck forever + * if event-stream based unblocking is not available on the platform for WFE + * spin loops (arch_timer_evtstrm_available). 
+ * + * Once support for smp_cond_load_acquire_timewait [0] lands, we can drop this + * workaround. + * + * [0]: https://lore.kernel.org/lkml/20250203214911.898276-1-ankur.a.arora@oracle.com + */ +#define res_smp_cond_load_relaxed(ptr, cond_expr) ({ \ + typeof(ptr) __PTR = (ptr); \ + __unqual_scalar_typeof(*ptr) VAL; \ + for (;;) { \ + VAL = READ_ONCE(*__PTR); \ + if (cond_expr) \ + break; \ + cpu_relax(); \ + } \ + (typeof(*ptr))VAL; \ +}) + +#define res_smp_cond_load_acquire(ptr, cond_expr) ({ \ + __unqual_scalar_typeof(*ptr) _val; \ + _val = res_smp_cond_load_relaxed(ptr, cond_expr); \ + smp_acquire__after_ctrl_dep(); \ + (typeof(*ptr))_val; \ +}) + +#define res_atomic_cond_read_acquire(v, c) res_smp_cond_load_acquire(&(v)->counter, (c)) + /** * resilient_queued_spin_lock_slowpath - acquire the queued spinlock * @lock: Pointer to queued spinlock structure @@ -419,7 +454,7 @@ int __lockfunc resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val, */ if (val & _Q_LOCKED_MASK) { RES_RESET_TIMEOUT(ts); - smp_cond_load_acquire(&lock->locked, !VAL || RES_CHECK_TIMEOUT(ts, ret, _Q_LOCKED_MASK)); + res_smp_cond_load_acquire(&lock->locked, !VAL || RES_CHECK_TIMEOUT(ts, ret, _Q_LOCKED_MASK)); } if (ret) { @@ -568,8 +603,8 @@ int __lockfunc resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val, * does not imply a full barrier. */ RES_RESET_TIMEOUT(ts); - val = atomic_cond_read_acquire(&lock->val, !(VAL & _Q_LOCKED_PENDING_MASK) || - RES_CHECK_TIMEOUT(ts, ret, _Q_LOCKED_PENDING_MASK)); + val = res_atomic_cond_read_acquire(&lock->val, !(VAL & _Q_LOCKED_PENDING_MASK) || + RES_CHECK_TIMEOUT(ts, ret, _Q_LOCKED_PENDING_MASK)); waitq_timeout: if (ret) { From patchwork Thu Feb 6 10:54:26 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Kumar Kartikeya Dwivedi X-Patchwork-Id: 13962824 X-Patchwork-Delegate: bpf@iogearbox.net Received: from mail-wr1-f65.google.com (mail-wr1-f65.google.com [209.85.221.65]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 3EB1A22F3BF; Thu, 6 Feb 2025 10:55:04 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.221.65 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1738839306; cv=none; b=U9JnDoQ7dL/F+og+a5z2kBpp3qwfyZvDHTvFKMpTC7Wkyin40LGAgGEh8aGdo6iUf+KnEdk/uRTV8/SZvaDgzMchZuyIDTDQ54yrCEPQCUvN+dtcI75J1lP8z79L62Ms3Hz7XNo6acdqL/G6fH1FYIZ+X5xqULexx6myl7GARrY= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1738839306; c=relaxed/simple; bh=E+oadE0wJWqBDHpLclZmcj/fTawgs/RCGY18+MKVeJ0=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=A0fWWk7692ZXe/hbv+6IH0qY0JNtODIdpiDe5INlTOA+FpI11aRnEd2Fra+5VJLub1Ww6ZDj8z4fL7UKNdOMl/5/5jJ7+LWLNNN6+5pRcpql3ayXdOgJUX2bxeKP0bLQcvYeNHETeo+8lwtpefBFAg/cVs2f1n1tgT4eF9DYELw= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=U/NCpswX; arc=none smtp.client-ip=209.85.221.65 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com 
header.b="U/NCpswX" Received: by mail-wr1-f65.google.com with SMTP id ffacd0b85a97d-38dba1cc632so429644f8f.0; Thu, 06 Feb 2025 02:55:03 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1738839302; x=1739444102; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=JSwd0Ic/zADHNPGibPzB3vfQZwdDu0cIcYcU2UTlBE0=; b=U/NCpswXPsupXzvY8PX2azXWLYEfA5Hp4IznwBrq1WDvH+qXR/KSpL3rH7hnVqnnhv t2wEB1H8vwcw44LZgJ+/hNKh8frM591lyWGoZvi95PwW8Zs2BBZCpVW3tDKPWIWVK/xc JryOAG2M/yv6Y8M8o3XbOrka6mOfaURw1ebBmqNWM0lSbZUEzE+XVr1J60wwklbuH4Pn LpziQ+OsMB1sOC04uO7eNQ99qWop5csvVS9iFxmuqKj92Q6ZKfeshpqOAem8pDKcVVIp XqHVrtZXqDga+iWWvKYSpXT0Xwfia6HunFObSx1SgEt6XTdOOVlJjaHZVdhRXPnqd8YY banQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1738839302; x=1739444102; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=JSwd0Ic/zADHNPGibPzB3vfQZwdDu0cIcYcU2UTlBE0=; b=YVnWGDh5VEjC6ylHJLrAENv3390qp8F1iMTnQA18tEhGiQPTeKHEaXh8x81Jub9GrE 8JNd2fxxgT+XRyoybIjijXQrbx5pBJUDe5FRRYXXicOwJ/9WMZZz39hf7Xs2nkwd1GuR AAPhEwrFeIAu/lM+NcRkUA8QFM7MRJj9FXXLsrbu4hDyhJ2TTdbvivLop9CWzL6i3CgK mdrNF6x+9Iu2GvLfeE8S2vwrmOx4hZ/4yYyRobw8ScUKmirp7FdWuxojMsSoj8gbCNlP Mv+oMgL3buH1WmUH3klY7UdJCP8ZDZSU5kaJ7QE34hwt2BfFV1Y341UaL9P6vjI/1Nbg T5Yw== X-Forwarded-Encrypted: i=1; AJvYcCXlC8zuuURW5G5ETj5/3xcrpvZ85iU+i5/FnFm+v9jGPqfBgwsNQ6GDwbJX4zvYsJ7oq7Q9oZeWxB3e3lg=@vger.kernel.org X-Gm-Message-State: AOJu0Yym5VnowTW8+fmgaIrWMBgo+KSC6TliICgf4Siy4nP+mwRAplgu S1UZnYOfp46bOF6gPc7zhzcPnr7st4jEANr9t7ExpX1vr5exRoUHV91OWx47T8U= X-Gm-Gg: ASbGncvKIvD75OOXphGust36koGHdGRccGtDM/Wwcoeh4yRobUEfXUXeiHjqxv4Fth3 p3b2OZO/wuGRboL3fCgDkU80PJr84UVXzUm3FAnFZcOC5jK7ZRyq3BGxUghZcqcAWfIVXG6JipS NxXyMLgbR9xxHZddJB5UCErSnUfT6OZ1NpL2FfHCbcts0D53o70u7vT+x4LZni6gU4oVJZGIhFI KMolx1y+99oc6GC/ZIx9u8A+RmZeaK6Uo49bCEJoDCulgBJNVvctGSdlhYKhe9EgXktKos6B9up Hrel X-Google-Smtp-Source: AGHT+IHKdNt/2i2niit9x9Q/R1mWsETc05ExweAVXJWi6Fe7g9aArrjNh6K5B0b8g58nbAsrLJyEOg== X-Received: by 2002:a5d:59ac:0:b0:38d:bf6e:adca with SMTP id ffacd0b85a97d-38dbf6eae30mr869218f8f.48.1738839302123; Thu, 06 Feb 2025 02:55:02 -0800 (PST) Received: from localhost ([2a03:2880:31ff:2::]) by smtp.gmail.com with ESMTPSA id ffacd0b85a97d-38dc31b9394sm473848f8f.11.2025.02.06.02.55.01 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 06 Feb 2025 02:55:01 -0800 (PST) From: Kumar Kartikeya Dwivedi To: bpf@vger.kernel.org, linux-kernel@vger.kernel.org Cc: Linus Torvalds , Peter Zijlstra , Will Deacon , Waiman Long , Alexei Starovoitov , Andrii Nakryiko , Daniel Borkmann , Martin KaFai Lau , Eduard Zingerman , "Paul E. 
McKenney" , Tejun Heo , Barret Rhoden , Josh Don , Dohyun Kim , linux-arm-kernel@lists.infradead.org, kernel-team@meta.com Subject: [PATCH bpf-next v2 18/26] rqspinlock: Add entry to Makefile, MAINTAINERS Date: Thu, 6 Feb 2025 02:54:26 -0800 Message-ID: <20250206105435.2159977-19-memxor@gmail.com> X-Mailer: git-send-email 2.43.5 In-Reply-To: <20250206105435.2159977-1-memxor@gmail.com> References: <20250206105435.2159977-1-memxor@gmail.com> Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Developer-Signature: v=1; a=openpgp-sha256; l=2083; h=from:subject; bh=E+oadE0wJWqBDHpLclZmcj/fTawgs/RCGY18+MKVeJ0=; b=owEBbQKS/ZANAwAIAUzgyIZIvxHKAcsmYgBnpJRmodQ4/VhA2GYFRoJhLQZknuv2U5hRh8VIgaN5 gjWRV/OJAjMEAAEIAB0WIQRLvip+Buz51YI8YRFM4MiGSL8RygUCZ6SUZgAKCRBM4MiGSL8Ryq+SD/ 94tOt0FPzLcovxJ8PvSGMSImDYXVh0wToZsIW+lhBs3jfHW5GETrWPgvHrSUxGhML9no+RKLfirkp4 E1cPXorlRj2ki4xsWzLSlHk+EH2PWoH8jCjs6jMrtK+SMqCA7Oj4Ice6DQmaMIN6/hB/xu6chupbLc 1lXBQZPtFxk1eXA75g9LS8ZVQ3WXOnPj+t6OewmPFulHGjMCCRR9/1M8sw89gkNsM3aB5E5I7YGAoV DP5513oBCDAn8tJPZqTzddkWfZoi+q6onyMLx6WoRPWwd6fk/EKwhICv1Ikg+RI9MkK5qeIiUkhtsh FX/jfVeTi3znO3Ae+dZqpxdGdwDobONfyOditGEf38ImSjAUAeUfyJe4K8xYbvfWnk1WOMn8G3jHmE c9muj11DGHiIVfw/ynoQqr4v5vyuNYzu5FWY81NdA3Evh0nXgKo18riB8GKo1moM5Ga4Cc9rhmReue T30PWz9HdqaEAV4ligZ3OeXxNDjK8YNM2P3ccL8sziaBme2U8G5uVWyFh3wSuGZextoI1/7fUBWFY7 QuxUlV8b1+apMH6nrCBDVCO4LpBI+7ZQ9qDWn1MUzT7DTYAHwa19++ndXSIA4NCiycMOqEvocmJYpe V+70XrThPEC4JgdWBMXps/15vB9J+B+z1zka8ksy3fY9si4XlMK2GSNw1VjA== X-Developer-Key: i=memxor@gmail.com; a=openpgp; fpr=4BBE2A7E06ECF9D5823C61114CE0C88648BF11CA X-Patchwork-Delegate: bpf@iogearbox.net Ensure that rqspinlock is built when qspinlock support and BPF subsystem is enabled. Also, add the file under the BPF MAINTAINERS entry so that all patches changing code in the file end up Cc'ing bpf@vger and the maintainers/reviewers. Ensure that the rqspinlock code is only built when the BPF subsystem is compiled in. Depending on queued spinlock support, we may or may not end up building the queued spinlock slowpath, and instead fallback to the test-and-set implementation. 
Signed-off-by: Kumar Kartikeya Dwivedi --- MAINTAINERS | 3 +++ include/asm-generic/Kbuild | 1 + kernel/locking/Makefile | 1 + 3 files changed, 5 insertions(+) diff --git a/MAINTAINERS b/MAINTAINERS index 896a307fa065..4d81f3303c79 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -4305,6 +4305,9 @@ F: include/uapi/linux/filter.h F: kernel/bpf/ F: kernel/trace/bpf_trace.c F: lib/buildid.c +F: arch/*/include/asm/rqspinlock.h +F: include/asm-generic/rqspinlock.h +F: kernel/locking/rqspinlock.c F: lib/test_bpf.c F: net/bpf/ F: net/core/filter.c diff --git a/include/asm-generic/Kbuild b/include/asm-generic/Kbuild index 1b43c3a77012..8675b7b4ad23 100644 --- a/include/asm-generic/Kbuild +++ b/include/asm-generic/Kbuild @@ -45,6 +45,7 @@ mandatory-y += pci.h mandatory-y += percpu.h mandatory-y += pgalloc.h mandatory-y += preempt.h +mandatory-y += rqspinlock.h mandatory-y += runtime-const.h mandatory-y += rwonce.h mandatory-y += sections.h diff --git a/kernel/locking/Makefile b/kernel/locking/Makefile index 0db4093d17b8..5645e9029bc0 100644 --- a/kernel/locking/Makefile +++ b/kernel/locking/Makefile @@ -24,6 +24,7 @@ obj-$(CONFIG_SMP) += spinlock.o obj-$(CONFIG_LOCK_SPIN_ON_OWNER) += osq_lock.o obj-$(CONFIG_PROVE_LOCKING) += spinlock.o obj-$(CONFIG_QUEUED_SPINLOCKS) += qspinlock.o +obj-$(CONFIG_BPF_SYSCALL) += rqspinlock.o obj-$(CONFIG_RT_MUTEXES) += rtmutex_api.o obj-$(CONFIG_PREEMPT_RT) += spinlock_rt.o ww_rt_mutex.o obj-$(CONFIG_DEBUG_SPINLOCK) += spinlock.o From patchwork Thu Feb 6 10:54:27 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Kumar Kartikeya Dwivedi X-Patchwork-Id: 13962825 X-Patchwork-Delegate: bpf@iogearbox.net Received: from mail-wm1-f66.google.com (mail-wm1-f66.google.com [209.85.128.66]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 9D4D5231A21; Thu, 6 Feb 2025 10:55:05 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.128.66 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1738839307; cv=none; b=EZtXKKlI23AETpa5UmO96Z8BEDZlKhYA55oS+rZjEOVGH+AMelGXwaIavO/0Zu/3guF2W6vwJD8+7H7Um5+rbs4+FVx9q0MNtatXzDjsYPW5cm0vOLj3F0DajTCrHeBZGZmqkuy5ctl3PCj0i+7YUKqjLLUbFkXFJKPJNncyFnw= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1738839307; c=relaxed/simple; bh=PXV2g5jkXA6M5CNQXPC4dpfD1jgCcit6rNK1L4ARIdA=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=RT6PxBLsIdE6Yg/y8nVvb4NMes9NcEob0oplxI4qUQuaBRfw3VjzgrUp/WrT3SVclNlfOnv5tnXW0OAR6LxLsY8suHfZjqRj+LOjJJS28Vi8PVXVxGm5OqOdUVL58n1zrjofuPHEqvN18D3OcQZLSiiCl0WT3oMAkvPU0efq0Ck= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=Kt9zc2jb; arc=none smtp.client-ip=209.85.128.66 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="Kt9zc2jb" Received: by mail-wm1-f66.google.com with SMTP id 5b1f17b1804b1-436202dd730so4840235e9.2; Thu, 06 Feb 2025 02:55:05 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; 
s=20230601; t=1738839303; x=1739444103; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=pV8Sn3/Wzf2u35L7Uf0fdtRf3raefbZ3DCYUjKqcPwY=; b=Kt9zc2jb1gjx/QbQgZgfRSDsl9Vp/PWVXZ+Lw/Dqidrdf4iGnl3sYtnzlSA0TLiRuL t1/Xa2CNeUzNiMGhs+yphpu65S/0QYZKonJudr0pDN7yl0u4OSOe14tfJDsIXggrSRih PbzB2/B4oH/8LhbrIJe1lcyvI1OKKR9t8cZ5BM7OYwtRlQKt36RP/PnTmjvWrNqz4N7o Cme5RmSmG82mxKwmkMTQ6HQkmwcYMcRD65liMX1oG/f2UTr1YdkI45N30EcN2YlFvYuz Zs+B6jYDNoQQ+qqlWHgWiKCQk0NI/3ialqgBRobwnDhtjF6LzmSt27u3j2u/OwHxrSXv rZjQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1738839303; x=1739444103; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=pV8Sn3/Wzf2u35L7Uf0fdtRf3raefbZ3DCYUjKqcPwY=; b=a0qiSyZqApKloUDdS8DTNOWX7HfkQl3qAl6lexytF2Sq9SyXOR9Lb50VguyCYPCTS/ GDLoSqWkvnlqYwoFbuqYe7GQZWcmLvZJJMWRIVeAWu6vSveX64v2IIJRr0svwxC2kNRX gBRhMyUlKcCcuad96MO2BZw1gpI6oxgkSTvh4mxXxeXH28pgHoJZlHu7EQ6+ccIBy3v9 08iFYZ/ovHH/JghzZhaqIajVRgFGYebuH9KmYxvaZh8lpxpfJwwIf2XKHkpzZviwiqG6 6lOIv+VGoRBigh95h7rYkQAMWGC0KSMrJ2wajwa8txfGSt/T+gCehcqNexeSGiTI+mZQ jBXA== X-Forwarded-Encrypted: i=1; AJvYcCU9QNE8eHKP7SI2ccG+aLXtWXdIjBQfIwXeiNCunOSBbLo7Mpdpe1OuxhjPpfNCE3UsTPzKEBhV4c3VQ18=@vger.kernel.org X-Gm-Message-State: AOJu0YyfTm+HzkY3KzHwDhazBAw359wWdbZjlMmGYHHHf0QmPVl7jyPj 68AIXrX6/rxfw9hIgVB/MUKASwrP0xxCppU/5mXPoCNUPUYKq6yfqaV6hICuLaU= X-Gm-Gg: ASbGnctyCtNi2yzBhRnNstCWUmVM80MTrKQ9gmrRDE0iSkcQXJ+aTQxtCVdXESOob/5 +v8rrzZlwxxlQj5+5/p3JTd5iaMJHey64BwZru2mdQEnGvpJxk+NWCZm5eRbf7WYqOqUNtApVkv LpQ6uDK3GS3K97Fs0FpRKgqhvili4rMgs7OTTGXj08fTKAKVGshi/SAs1eAKfGwr0fNALai3701 3Inuc+2y3XUAepFTDOlOkhDZ1AJr+uJcLm9OhhLnWoooQG45JlltJ5SVYY2WUA2dQCskqaaOVZj Rb+Qmw== X-Google-Smtp-Source: AGHT+IF1B76+1ynE7+9dQEBX3L/iUxR6t+WoOTyxAKlHsKCxtnZ617dC5+DYUZ8gZKTRoQgB1NYv6Q== X-Received: by 2002:a05:600c:310b:b0:434:eb86:aeca with SMTP id 5b1f17b1804b1-4390d43401bmr53989535e9.10.1738839303346; Thu, 06 Feb 2025 02:55:03 -0800 (PST) Received: from localhost ([2a03:2880:31ff:73::]) by smtp.gmail.com with ESMTPSA id 5b1f17b1804b1-4391da965a6sm15547985e9.6.2025.02.06.02.55.02 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 06 Feb 2025 02:55:02 -0800 (PST) From: Kumar Kartikeya Dwivedi To: bpf@vger.kernel.org, linux-kernel@vger.kernel.org Cc: Linus Torvalds , Peter Zijlstra , Will Deacon , Waiman Long , Alexei Starovoitov , Andrii Nakryiko , Daniel Borkmann , Martin KaFai Lau , Eduard Zingerman , "Paul E. 
McKenney" , Tejun Heo , Barret Rhoden , Josh Don , Dohyun Kim , linux-arm-kernel@lists.infradead.org, kernel-team@meta.com Subject: [PATCH bpf-next v2 19/26] bpf: Convert hashtab.c to rqspinlock Date: Thu, 6 Feb 2025 02:54:27 -0800 Message-ID: <20250206105435.2159977-20-memxor@gmail.com> X-Mailer: git-send-email 2.43.5 In-Reply-To: <20250206105435.2159977-1-memxor@gmail.com> References: <20250206105435.2159977-1-memxor@gmail.com> Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Developer-Signature: v=1; a=openpgp-sha256; l=10973; h=from:subject; bh=PXV2g5jkXA6M5CNQXPC4dpfD1jgCcit6rNK1L4ARIdA=; b=owEBbQKS/ZANAwAIAUzgyIZIvxHKAcsmYgBnpJRmPzcSKNTT0EqqnnkfrvBHcqC+gjliccdRPRLE gq5UTVWJAjMEAAEIAB0WIQRLvip+Buz51YI8YRFM4MiGSL8RygUCZ6SUZgAKCRBM4MiGSL8Ryk5KD/ 9ctjNzhAOgMhJSDZVpv3j4of82O3Xfa/GeQWEsteAu7YzSYq9AIIE3HFCAgpILNT3t5B88CX2Ed1lP x7dqTeC9Uk0p5aenpsjKIaVczLP6xpR/qmVo4I6h57OW1szo7kfpJPwNod+b01Lsjd/4XhfiP0XMTP G76JN02zA8AGTVzvJ8vGj1QdFNU9+tF03KJXh+Ai+Hy7bM/zZZnTDzBA7JGwE5Z3ySRB16mRXmSVLz CLGdo2Vx6qHCLVgZYsnC2f8JFgFXNNkkEcIDSP7o4OaZzrAd8KLs4rj30sgypf6XlKOaZP0cMAh97R rv9OgrXJGSWzBydhJ2Yo/U9qlaChNMWunq075WSaiww/hRbinOWydGQIl0BjHkhOsKVhBVaUhJT7mH 11W6qKAuXa0BEPoRunMVTqW01R4v/bj+YfT4PH+kIRjtYpePc7WvEnxYYr3HAhgiEiOzRNYV/nz7i1 6Qxz8MOJnYfqv5lWb482r1sfm0F+njVMg5corGbZpFCvMJMAbVl7JlJWn3KBt/rsn852KjsX0vk+KT Y4I+fnWdHm6XJ+zLmMdJjxipaCf42YtN3jaFQn/AfCmYRCm/oAaqCIpx1s2cTQi5Kpy0J3M4iGCVSm IfffEEuYwarej+1RmRfEc7KsRdw9g1C3XN/mK0+uQ97QaolvQ1fpjm1VZIXg== X-Developer-Key: i=memxor@gmail.com; a=openpgp; fpr=4BBE2A7E06ECF9D5823C61114CE0C88648BF11CA X-Patchwork-Delegate: bpf@iogearbox.net Convert hashtab.c from raw_spinlock to rqspinlock, and drop the hashed per-cpu counter crud from the code base which is no longer necessary. 
Signed-off-by: Kumar Kartikeya Dwivedi --- kernel/bpf/hashtab.c | 102 ++++++++++++++----------------------------- 1 file changed, 32 insertions(+), 70 deletions(-) diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c index 4a9eeb7aef85..9b394e147967 100644 --- a/kernel/bpf/hashtab.c +++ b/kernel/bpf/hashtab.c @@ -16,6 +16,7 @@ #include "bpf_lru_list.h" #include "map_in_map.h" #include +#include #define HTAB_CREATE_FLAG_MASK \ (BPF_F_NO_PREALLOC | BPF_F_NO_COMMON_LRU | BPF_F_NUMA_NODE | \ @@ -78,7 +79,7 @@ */ struct bucket { struct hlist_nulls_head head; - raw_spinlock_t raw_lock; + rqspinlock_t raw_lock; }; #define HASHTAB_MAP_LOCK_COUNT 8 @@ -104,8 +105,6 @@ struct bpf_htab { u32 n_buckets; /* number of hash buckets */ u32 elem_size; /* size of each element in bytes */ u32 hashrnd; - struct lock_class_key lockdep_key; - int __percpu *map_locked[HASHTAB_MAP_LOCK_COUNT]; }; /* each htab element is struct htab_elem + key + value */ @@ -140,45 +139,26 @@ static void htab_init_buckets(struct bpf_htab *htab) for (i = 0; i < htab->n_buckets; i++) { INIT_HLIST_NULLS_HEAD(&htab->buckets[i].head, i); - raw_spin_lock_init(&htab->buckets[i].raw_lock); - lockdep_set_class(&htab->buckets[i].raw_lock, - &htab->lockdep_key); + raw_res_spin_lock_init(&htab->buckets[i].raw_lock); cond_resched(); } } -static inline int htab_lock_bucket(const struct bpf_htab *htab, - struct bucket *b, u32 hash, - unsigned long *pflags) +static inline int htab_lock_bucket(struct bucket *b, unsigned long *pflags) { unsigned long flags; + int ret; - hash = hash & min_t(u32, HASHTAB_MAP_LOCK_MASK, htab->n_buckets - 1); - - preempt_disable(); - local_irq_save(flags); - if (unlikely(__this_cpu_inc_return(*(htab->map_locked[hash])) != 1)) { - __this_cpu_dec(*(htab->map_locked[hash])); - local_irq_restore(flags); - preempt_enable(); - return -EBUSY; - } - - raw_spin_lock(&b->raw_lock); + ret = raw_res_spin_lock_irqsave(&b->raw_lock, flags); + if (ret) + return ret; *pflags = flags; - return 0; } -static inline void htab_unlock_bucket(const struct bpf_htab *htab, - struct bucket *b, u32 hash, - unsigned long flags) +static inline void htab_unlock_bucket(struct bucket *b, unsigned long flags) { - hash = hash & min_t(u32, HASHTAB_MAP_LOCK_MASK, htab->n_buckets - 1); - raw_spin_unlock(&b->raw_lock); - __this_cpu_dec(*(htab->map_locked[hash])); - local_irq_restore(flags); - preempt_enable(); + raw_res_spin_unlock_irqrestore(&b->raw_lock, flags); } static bool htab_lru_map_delete_node(void *arg, struct bpf_lru_node *node); @@ -483,14 +463,12 @@ static struct bpf_map *htab_map_alloc(union bpf_attr *attr) bool percpu_lru = (attr->map_flags & BPF_F_NO_COMMON_LRU); bool prealloc = !(attr->map_flags & BPF_F_NO_PREALLOC); struct bpf_htab *htab; - int err, i; + int err; htab = bpf_map_area_alloc(sizeof(*htab), NUMA_NO_NODE); if (!htab) return ERR_PTR(-ENOMEM); - lockdep_register_key(&htab->lockdep_key); - bpf_map_init_from_attr(&htab->map, attr); if (percpu_lru) { @@ -536,15 +514,6 @@ static struct bpf_map *htab_map_alloc(union bpf_attr *attr) if (!htab->buckets) goto free_elem_count; - for (i = 0; i < HASHTAB_MAP_LOCK_COUNT; i++) { - htab->map_locked[i] = bpf_map_alloc_percpu(&htab->map, - sizeof(int), - sizeof(int), - GFP_USER); - if (!htab->map_locked[i]) - goto free_map_locked; - } - if (htab->map.map_flags & BPF_F_ZERO_SEED) htab->hashrnd = 0; else @@ -607,15 +576,12 @@ static struct bpf_map *htab_map_alloc(union bpf_attr *attr) free_map_locked: if (htab->use_percpu_counter) percpu_counter_destroy(&htab->pcount); - for (i = 0; i < 
HASHTAB_MAP_LOCK_COUNT; i++) - free_percpu(htab->map_locked[i]); bpf_map_area_free(htab->buckets); bpf_mem_alloc_destroy(&htab->pcpu_ma); bpf_mem_alloc_destroy(&htab->ma); free_elem_count: bpf_map_free_elem_count(&htab->map); free_htab: - lockdep_unregister_key(&htab->lockdep_key); bpf_map_area_free(htab); return ERR_PTR(err); } @@ -817,7 +783,7 @@ static bool htab_lru_map_delete_node(void *arg, struct bpf_lru_node *node) b = __select_bucket(htab, tgt_l->hash); head = &b->head; - ret = htab_lock_bucket(htab, b, tgt_l->hash, &flags); + ret = htab_lock_bucket(b, &flags); if (ret) return false; @@ -828,7 +794,7 @@ static bool htab_lru_map_delete_node(void *arg, struct bpf_lru_node *node) break; } - htab_unlock_bucket(htab, b, tgt_l->hash, flags); + htab_unlock_bucket(b, flags); if (l == tgt_l) check_and_free_fields(htab, l); @@ -1147,7 +1113,7 @@ static long htab_map_update_elem(struct bpf_map *map, void *key, void *value, */ } - ret = htab_lock_bucket(htab, b, hash, &flags); + ret = htab_lock_bucket(b, &flags); if (ret) return ret; @@ -1198,7 +1164,7 @@ static long htab_map_update_elem(struct bpf_map *map, void *key, void *value, check_and_free_fields(htab, l_old); } } - htab_unlock_bucket(htab, b, hash, flags); + htab_unlock_bucket(b, flags); if (l_old) { if (old_map_ptr) map->ops->map_fd_put_ptr(map, old_map_ptr, true); @@ -1207,7 +1173,7 @@ static long htab_map_update_elem(struct bpf_map *map, void *key, void *value, } return 0; err: - htab_unlock_bucket(htab, b, hash, flags); + htab_unlock_bucket(b, flags); return ret; } @@ -1254,7 +1220,7 @@ static long htab_lru_map_update_elem(struct bpf_map *map, void *key, void *value copy_map_value(&htab->map, l_new->key + round_up(map->key_size, 8), value); - ret = htab_lock_bucket(htab, b, hash, &flags); + ret = htab_lock_bucket(b, &flags); if (ret) goto err_lock_bucket; @@ -1275,7 +1241,7 @@ static long htab_lru_map_update_elem(struct bpf_map *map, void *key, void *value ret = 0; err: - htab_unlock_bucket(htab, b, hash, flags); + htab_unlock_bucket(b, flags); err_lock_bucket: if (ret) @@ -1312,7 +1278,7 @@ static long __htab_percpu_map_update_elem(struct bpf_map *map, void *key, b = __select_bucket(htab, hash); head = &b->head; - ret = htab_lock_bucket(htab, b, hash, &flags); + ret = htab_lock_bucket(b, &flags); if (ret) return ret; @@ -1337,7 +1303,7 @@ static long __htab_percpu_map_update_elem(struct bpf_map *map, void *key, } ret = 0; err: - htab_unlock_bucket(htab, b, hash, flags); + htab_unlock_bucket(b, flags); return ret; } @@ -1378,7 +1344,7 @@ static long __htab_lru_percpu_map_update_elem(struct bpf_map *map, void *key, return -ENOMEM; } - ret = htab_lock_bucket(htab, b, hash, &flags); + ret = htab_lock_bucket(b, &flags); if (ret) goto err_lock_bucket; @@ -1402,7 +1368,7 @@ static long __htab_lru_percpu_map_update_elem(struct bpf_map *map, void *key, } ret = 0; err: - htab_unlock_bucket(htab, b, hash, flags); + htab_unlock_bucket(b, flags); err_lock_bucket: if (l_new) { bpf_map_dec_elem_count(&htab->map); @@ -1444,7 +1410,7 @@ static long htab_map_delete_elem(struct bpf_map *map, void *key) b = __select_bucket(htab, hash); head = &b->head; - ret = htab_lock_bucket(htab, b, hash, &flags); + ret = htab_lock_bucket(b, &flags); if (ret) return ret; @@ -1454,7 +1420,7 @@ static long htab_map_delete_elem(struct bpf_map *map, void *key) else ret = -ENOENT; - htab_unlock_bucket(htab, b, hash, flags); + htab_unlock_bucket(b, flags); if (l) free_htab_elem(htab, l); @@ -1480,7 +1446,7 @@ static long htab_lru_map_delete_elem(struct bpf_map *map, 
void *key) b = __select_bucket(htab, hash); head = &b->head; - ret = htab_lock_bucket(htab, b, hash, &flags); + ret = htab_lock_bucket(b, &flags); if (ret) return ret; @@ -1491,7 +1457,7 @@ static long htab_lru_map_delete_elem(struct bpf_map *map, void *key) else ret = -ENOENT; - htab_unlock_bucket(htab, b, hash, flags); + htab_unlock_bucket(b, flags); if (l) htab_lru_push_free(htab, l); return ret; @@ -1558,7 +1524,6 @@ static void htab_map_free_timers_and_wq(struct bpf_map *map) static void htab_map_free(struct bpf_map *map) { struct bpf_htab *htab = container_of(map, struct bpf_htab, map); - int i; /* bpf_free_used_maps() or close(map_fd) will trigger this map_free callback. * bpf_free_used_maps() is called after bpf prog is no longer executing. @@ -1583,9 +1548,6 @@ static void htab_map_free(struct bpf_map *map) bpf_mem_alloc_destroy(&htab->ma); if (htab->use_percpu_counter) percpu_counter_destroy(&htab->pcount); - for (i = 0; i < HASHTAB_MAP_LOCK_COUNT; i++) - free_percpu(htab->map_locked[i]); - lockdep_unregister_key(&htab->lockdep_key); bpf_map_area_free(htab); } @@ -1628,7 +1590,7 @@ static int __htab_map_lookup_and_delete_elem(struct bpf_map *map, void *key, b = __select_bucket(htab, hash); head = &b->head; - ret = htab_lock_bucket(htab, b, hash, &bflags); + ret = htab_lock_bucket(b, &bflags); if (ret) return ret; @@ -1665,7 +1627,7 @@ static int __htab_map_lookup_and_delete_elem(struct bpf_map *map, void *key, hlist_nulls_del_rcu(&l->hash_node); out_unlock: - htab_unlock_bucket(htab, b, hash, bflags); + htab_unlock_bucket(b, bflags); if (l) { if (is_lru_map) @@ -1787,7 +1749,7 @@ __htab_map_lookup_and_delete_batch(struct bpf_map *map, head = &b->head; /* do not grab the lock unless need it (bucket_cnt > 0). */ if (locked) { - ret = htab_lock_bucket(htab, b, batch, &flags); + ret = htab_lock_bucket(b, &flags); if (ret) { rcu_read_unlock(); bpf_enable_instrumentation(); @@ -1810,7 +1772,7 @@ __htab_map_lookup_and_delete_batch(struct bpf_map *map, /* Note that since bucket_cnt > 0 here, it is implicit * that the locked was grabbed, so release it. */ - htab_unlock_bucket(htab, b, batch, flags); + htab_unlock_bucket(b, flags); rcu_read_unlock(); bpf_enable_instrumentation(); goto after_loop; @@ -1821,7 +1783,7 @@ __htab_map_lookup_and_delete_batch(struct bpf_map *map, /* Note that since bucket_cnt > 0 here, it is implicit * that the locked was grabbed, so release it. 
*/ - htab_unlock_bucket(htab, b, batch, flags); + htab_unlock_bucket(b, flags); rcu_read_unlock(); bpf_enable_instrumentation(); kvfree(keys); @@ -1884,7 +1846,7 @@ __htab_map_lookup_and_delete_batch(struct bpf_map *map, dst_val += value_size; } - htab_unlock_bucket(htab, b, batch, flags); + htab_unlock_bucket(b, flags); locked = false; while (node_to_free) { From patchwork Thu Feb 6 10:54:28 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Kumar Kartikeya Dwivedi X-Patchwork-Id: 13962826 X-Patchwork-Delegate: bpf@iogearbox.net Received: from mail-wr1-f66.google.com (mail-wr1-f66.google.com [209.85.221.66]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id BBDA7231A3C; Thu, 6 Feb 2025 10:55:06 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.221.66 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1738839308; cv=none; b=OyKZEefAvECh27EI7NCjq+ftxqO+JP/Ab/yeLUwShflucokCfGPXps7WoRY429Mdax58pptouV32uQ8yYXpZNHVfil3ISZeWTz3ydIuZA1jplzgcE6cfv81gsQCUGSLdJyD5XAWOj11nPrCmcPzqd/YGucv4UFlDFNFCRCCuYxA= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1738839308; c=relaxed/simple; bh=b7GJYiFRKmgt+Gua6MyRj5cuN5rmsZksWMg2VkyS3aw=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=BylMcSDHmlxxoR7YTJfoFeYliayELxdo02kyNngfDHpwyNnwYGoMNFf+IdybzWi55pB/FYlnwUB5ub/Bcz41PZrj5+iYYKLCqyD1pPs7TKfka13dvn+cqOI1QzWH8Mr7jmYMCzSHA1Yh+6HVc9i4O3u033bgWjitHyofrrSqv5s= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=XvztQax2; arc=none smtp.client-ip=209.85.221.66 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="XvztQax2" Received: by mail-wr1-f66.google.com with SMTP id ffacd0b85a97d-38db34a5c5fso325938f8f.2; Thu, 06 Feb 2025 02:55:06 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1738839305; x=1739444105; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=/+/W0EApda7zGr5aRBIxeSAAJdBHD2PeVsO8K+dox+k=; b=XvztQax2a0JYqPd6jFXQ6ErTuHnvFDB2Wo328WE/a0oeEePjDKW4az35LMENBbvvpX co0O+ikp6X3I/IizCeEsjZCUWo8FcgDkiyRrG6KzYbGHYT5Y0/hWVhnJ7VRy1KiKDspq hywlpm6CDz02UKIyB9+hOyB53I2v8PYfEQVBXS0wYnpvJOiWmRz3O9i6t3vuBtf9RwZI f2tC5LBKIF17r30rkjT34w3ZjVdpqLEMAXQDkDBA9Nujr5I7JUYSRcdBAd+CTL4Cbdtb bz8B1MNZz3GoPVgzLYXVykRvC2Hd9/HywI3f3fN1ql6aIPlWeBawQCbhoBcF+vGgfPw0 oHjA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1738839305; x=1739444105; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=/+/W0EApda7zGr5aRBIxeSAAJdBHD2PeVsO8K+dox+k=; b=LW/gMGAkypKZC59clkGS07iGMMgRWtPdFvtjjdiQutCK6jJ65O1WpYwFvDzehDdzqg AlLJKSIPunkqZL88fFW8OdAAq+TSzzyksUxXykUfIRjFfRmAfdmrEaU056lp3N9xu4n3 
JOeO8Ku845uCa/PBbWS5UX9B3uNQpNY+slikl4XPQe3m5dvRXGJQTrTUwcUX+k4i8u+T DL9b5l4mSXvSkCIu8t4OwmdFY0ozQBF41NNEU1t8xUtaQME/QKGrOURzDAHV+HkzvBvN PFSF0BLjwMoil5Dl3YbKVfgOgBfffJKp6wRovunJ3DD7wwoJ1Ypht5RW+mL69x5eM1FS IRTw== X-Forwarded-Encrypted: i=1; AJvYcCVHP8xHPhUbFtituZkqfxoSS4nNwPRUdur1yqGeO6VfxgFTpcwRp2H6uLtTkufXNhfsw17NMoh+Z7peClk=@vger.kernel.org X-Gm-Message-State: AOJu0YzDaVGMJvLBfQP5wbQOtEbiM8gjL186xxpR0QttXkCI/tP+w9nl 05dJ5xF+YTKN5/20TF8pETPsWosA8XX9rYtPKEVK1+Ah568h7luoZ7Z7r4xZE8M= X-Gm-Gg: ASbGnctkD1ANRixv7LF1YQEosOibA4MNymnODlyscR84jLkervB4lM42a8FNU84462D 9bkiLOhUOljZzPbi4ERxkiM3G3o6eDnwlrm0ll5FbTFPiRiDAym0g7adKrIa5Kb1HDjARm8mgav +3EQAacLMnLDgUINxEDk9z3uO+O+4alNc8Tolg1zbgkTa8tbUtNeJWi3AgmCClF76oHUFy6AFNS 4MrkcDMugjeoQfJawc64H7IGh9PdzVH/Bu8KqALEU0iDzTKI1D+qQts4fNeoCeY/NhMCThuuxQE d8OY X-Google-Smtp-Source: AGHT+IE1fh4Z/RX0JKmKGEVYdBOLzgVhEKs2AnV1hSi1PhRaogcIsY0ipckf2p7b6ebAKkLpzIqwFg== X-Received: by 2002:a5d:6da3:0:b0:38c:5b52:3a5e with SMTP id ffacd0b85a97d-38db48577fdmr4311094f8f.8.1738839304658; Thu, 06 Feb 2025 02:55:04 -0800 (PST) Received: from localhost ([2a03:2880:31ff:2::]) by smtp.gmail.com with ESMTPSA id ffacd0b85a97d-38dbde0fc25sm1415577f8f.64.2025.02.06.02.55.03 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 06 Feb 2025 02:55:04 -0800 (PST) From: Kumar Kartikeya Dwivedi To: bpf@vger.kernel.org, linux-kernel@vger.kernel.org Cc: Linus Torvalds , Peter Zijlstra , Will Deacon , Waiman Long , Alexei Starovoitov , Andrii Nakryiko , Daniel Borkmann , Martin KaFai Lau , Eduard Zingerman , "Paul E. McKenney" , Tejun Heo , Barret Rhoden , Josh Don , Dohyun Kim , linux-arm-kernel@lists.infradead.org, kernel-team@meta.com Subject: [PATCH bpf-next v2 20/26] bpf: Convert percpu_freelist.c to rqspinlock Date: Thu, 6 Feb 2025 02:54:28 -0800 Message-ID: <20250206105435.2159977-21-memxor@gmail.com> X-Mailer: git-send-email 2.43.5 In-Reply-To: <20250206105435.2159977-1-memxor@gmail.com> References: <20250206105435.2159977-1-memxor@gmail.com> Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Developer-Signature: v=1; a=openpgp-sha256; l=6512; h=from:subject; bh=b7GJYiFRKmgt+Gua6MyRj5cuN5rmsZksWMg2VkyS3aw=; b=owEBbQKS/ZANAwAIAUzgyIZIvxHKAcsmYgBnpJRm4A+EBJCWxvZBf7f2Rwf9fNAzPFNCmT/crlsa NhTJVJuJAjMEAAEIAB0WIQRLvip+Buz51YI8YRFM4MiGSL8RygUCZ6SUZgAKCRBM4MiGSL8RysBTD/ 9cLKt+PR4fSl5mi6p7a7E2k65BK0pYhLW8IoR7EvD1rs1MUM8TYi9Wd0P3u5ARJSKgUZRovio/UJsb MCLHj+03gAh32u7M8XtrbyRGGjWp81sskv3umm0S5W6qW7GNEzdDfhDCVgZGxTaPwghKEcP6GNkC5D 3LWP4b9pp2XSrw5PDT7EN54Ds1FfjGWg6awZXbJcWgVmmS4522IVKIgAgnotntrcI70ccUJoxtdyUD ADWxNVhu3snrVyFYlCSn80qYS6o0ZBYVjqh5K1pU/GnUahNHcT7iZbSHN3HH7/pZhLFbvphUDLtImg bi4WsQVeTVKTqXtm6o/FeA/7+P+pIhRSeOynMeOZT9EqTagpNVaaptRj5HZMuXJWY5qUIHnY+c8P0v o1VHWya9TBJyOJM37cFthbRx9BxyN7uDd3fsaqDwGs+p+NFzUAh5Yyx3n6dkBHQI+zFF2mryGLzXTR h/vu/u/DQSfC2693zDq/2G2Nq8GCT4nwzP2057XK4Fhewd7/9rFPbG2Azm6f192ZUVX2yNhrwmZcFt jjpQ4CaSn5LXmu4ocoOo3rDF0aGwvP5Sb/Jn9vS78QfBYpWYEJxBrVtRMyeKfPjo4jEinqb40wB4vk h5S6Uj1SdTQN/i3IBzw259gce+jCSWN1SQaz6Dgy7OdNHF2kxY9Ea5mBCF8A== X-Developer-Key: i=memxor@gmail.com; a=openpgp; fpr=4BBE2A7E06ECF9D5823C61114CE0C88648BF11CA X-Patchwork-Delegate: bpf@iogearbox.net Convert the percpu_freelist.c code to use rqspinlock, and remove the extralist fallback and trylock-based acquisitions to avoid deadlocks. Key thing to note is the retained while (true) loop to search through other CPUs when failing to push a node due to locking errors. 
This retains the behavior of the old code, where it would keep trying until it would be able to successfully push the node back into the freelist of a CPU. Technically, we should start iteration for this loop from raw_smp_processor_id() + 1, but to avoid hitting the edge of nr_cpus, we skip execution in the loop body instead. Signed-off-by: Kumar Kartikeya Dwivedi --- kernel/bpf/percpu_freelist.c | 113 ++++++++--------------------------- kernel/bpf/percpu_freelist.h | 4 +- 2 files changed, 27 insertions(+), 90 deletions(-) diff --git a/kernel/bpf/percpu_freelist.c b/kernel/bpf/percpu_freelist.c index 034cf87b54e9..632762b57299 100644 --- a/kernel/bpf/percpu_freelist.c +++ b/kernel/bpf/percpu_freelist.c @@ -14,11 +14,9 @@ int pcpu_freelist_init(struct pcpu_freelist *s) for_each_possible_cpu(cpu) { struct pcpu_freelist_head *head = per_cpu_ptr(s->freelist, cpu); - raw_spin_lock_init(&head->lock); + raw_res_spin_lock_init(&head->lock); head->first = NULL; } - raw_spin_lock_init(&s->extralist.lock); - s->extralist.first = NULL; return 0; } @@ -34,58 +32,39 @@ static inline void pcpu_freelist_push_node(struct pcpu_freelist_head *head, WRITE_ONCE(head->first, node); } -static inline void ___pcpu_freelist_push(struct pcpu_freelist_head *head, +static inline bool ___pcpu_freelist_push(struct pcpu_freelist_head *head, struct pcpu_freelist_node *node) { - raw_spin_lock(&head->lock); - pcpu_freelist_push_node(head, node); - raw_spin_unlock(&head->lock); -} - -static inline bool pcpu_freelist_try_push_extra(struct pcpu_freelist *s, - struct pcpu_freelist_node *node) -{ - if (!raw_spin_trylock(&s->extralist.lock)) + if (raw_res_spin_lock(&head->lock)) return false; - - pcpu_freelist_push_node(&s->extralist, node); - raw_spin_unlock(&s->extralist.lock); + pcpu_freelist_push_node(head, node); + raw_res_spin_unlock(&head->lock); return true; } -static inline void ___pcpu_freelist_push_nmi(struct pcpu_freelist *s, - struct pcpu_freelist_node *node) +void __pcpu_freelist_push(struct pcpu_freelist *s, + struct pcpu_freelist_node *node) { - int cpu, orig_cpu; + struct pcpu_freelist_head *head; + int cpu; - orig_cpu = raw_smp_processor_id(); - while (1) { - for_each_cpu_wrap(cpu, cpu_possible_mask, orig_cpu) { - struct pcpu_freelist_head *head; + if (___pcpu_freelist_push(this_cpu_ptr(s->freelist), node)) + return; + while (true) { + for_each_cpu_wrap(cpu, cpu_possible_mask, raw_smp_processor_id()) { + if (cpu == raw_smp_processor_id()) + continue; head = per_cpu_ptr(s->freelist, cpu); - if (raw_spin_trylock(&head->lock)) { - pcpu_freelist_push_node(head, node); - raw_spin_unlock(&head->lock); - return; - } - } - - /* cannot lock any per cpu lock, try extralist */ - if (pcpu_freelist_try_push_extra(s, node)) + if (raw_res_spin_lock(&head->lock)) + continue; + pcpu_freelist_push_node(head, node); + raw_res_spin_unlock(&head->lock); return; + } } } -void __pcpu_freelist_push(struct pcpu_freelist *s, - struct pcpu_freelist_node *node) -{ - if (in_nmi()) - ___pcpu_freelist_push_nmi(s, node); - else - ___pcpu_freelist_push(this_cpu_ptr(s->freelist), node); -} - void pcpu_freelist_push(struct pcpu_freelist *s, struct pcpu_freelist_node *node) { @@ -120,71 +99,29 @@ void pcpu_freelist_populate(struct pcpu_freelist *s, void *buf, u32 elem_size, static struct pcpu_freelist_node *___pcpu_freelist_pop(struct pcpu_freelist *s) { + struct pcpu_freelist_node *node = NULL; struct pcpu_freelist_head *head; - struct pcpu_freelist_node *node; int cpu; for_each_cpu_wrap(cpu, cpu_possible_mask, raw_smp_processor_id()) { head = 
per_cpu_ptr(s->freelist, cpu); if (!READ_ONCE(head->first)) continue; - raw_spin_lock(&head->lock); + if (raw_res_spin_lock(&head->lock)) + continue; node = head->first; if (node) { WRITE_ONCE(head->first, node->next); - raw_spin_unlock(&head->lock); + raw_res_spin_unlock(&head->lock); return node; } - raw_spin_unlock(&head->lock); + raw_res_spin_unlock(&head->lock); } - - /* per cpu lists are all empty, try extralist */ - if (!READ_ONCE(s->extralist.first)) - return NULL; - raw_spin_lock(&s->extralist.lock); - node = s->extralist.first; - if (node) - WRITE_ONCE(s->extralist.first, node->next); - raw_spin_unlock(&s->extralist.lock); - return node; -} - -static struct pcpu_freelist_node * -___pcpu_freelist_pop_nmi(struct pcpu_freelist *s) -{ - struct pcpu_freelist_head *head; - struct pcpu_freelist_node *node; - int cpu; - - for_each_cpu_wrap(cpu, cpu_possible_mask, raw_smp_processor_id()) { - head = per_cpu_ptr(s->freelist, cpu); - if (!READ_ONCE(head->first)) - continue; - if (raw_spin_trylock(&head->lock)) { - node = head->first; - if (node) { - WRITE_ONCE(head->first, node->next); - raw_spin_unlock(&head->lock); - return node; - } - raw_spin_unlock(&head->lock); - } - } - - /* cannot pop from per cpu lists, try extralist */ - if (!READ_ONCE(s->extralist.first) || !raw_spin_trylock(&s->extralist.lock)) - return NULL; - node = s->extralist.first; - if (node) - WRITE_ONCE(s->extralist.first, node->next); - raw_spin_unlock(&s->extralist.lock); return node; } struct pcpu_freelist_node *__pcpu_freelist_pop(struct pcpu_freelist *s) { - if (in_nmi()) - return ___pcpu_freelist_pop_nmi(s); return ___pcpu_freelist_pop(s); } diff --git a/kernel/bpf/percpu_freelist.h b/kernel/bpf/percpu_freelist.h index 3c76553cfe57..914798b74967 100644 --- a/kernel/bpf/percpu_freelist.h +++ b/kernel/bpf/percpu_freelist.h @@ -5,15 +5,15 @@ #define __PERCPU_FREELIST_H__ #include #include +#include struct pcpu_freelist_head { struct pcpu_freelist_node *first; - raw_spinlock_t lock; + rqspinlock_t lock; }; struct pcpu_freelist { struct pcpu_freelist_head __percpu *freelist; - struct pcpu_freelist_head extralist; }; struct pcpu_freelist_node { From patchwork Thu Feb 6 10:54:29 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Kumar Kartikeya Dwivedi X-Patchwork-Id: 13962827 X-Patchwork-Delegate: bpf@iogearbox.net Received: from mail-wr1-f67.google.com (mail-wr1-f67.google.com [209.85.221.67]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 02D59231A54; Thu, 6 Feb 2025 10:55:07 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.221.67 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1738839309; cv=none; b=BJ2YSCkjTYimbZM82Kw6kQx32KwAwggMFTqbRIpyoLe2QsemGZtLdaF6oDl5UlIYIIimHbKNhFbAKZRjUFpp8I/4mvc4dvQ/C9bv2QDI+U4BEAspKRhSPJLP8sMDQyxZnPaeJainzxcPVTOCcIlXB3POxQseF7EPg8SArpUckVs= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1738839309; c=relaxed/simple; bh=7psEOc+yd4qPNsI4KRxI79thdXuYqSsZXOHrdwDSR0g=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=BWYlkXFN8nLAJdeWGMoybQX38Sz+AM1r5H0pzHPgyph6xZncZGtuoXKJcviwMZ62SxbpBlDBUd2TkvhbID9DNZoKU8isLrArHcHbzYtXZ4F5sriBB8RiY9hqyOlyNFjQ9iIqoS6P1C9NAw92SHERHzukOcVhSIedwZLSkLpPNXM= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) 
header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=P3TygDOt; arc=none smtp.client-ip=209.85.221.67 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="P3TygDOt" Received: by mail-wr1-f67.google.com with SMTP id ffacd0b85a97d-38db0c06e96so491207f8f.2; Thu, 06 Feb 2025 02:55:07 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1738839306; x=1739444106; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=eNqgh98W4funKculi1nvK5lPwgfxfi44KHpp2hUPVk4=; b=P3TygDOtWfUL8j5tsxvtl7s58UFN6Pjol6oGkHwCYFdx0S357gmVQCsm0P2T4pyxrD 5R5wa6PueMNQ/gTZWqwp1P/gXF4Fy9yqzDcshWLhLJyjB5Qj2D2tiJQGu2TOURwOcmIP FaKEtIIq7NYVl5MLlLnnzRskR9Ov/0qjEWHmJ8HTjUW6yArP5s05hNlwqqC/kMCzVB+Z o3raM75FjZeCt8E9YdxM/oB3OjHrqM0sKLDtsn2ErhzrxyV8DLRK8PhESP6rM5Tkz3Dy 0RgwtMm/O8lOomwGTw9aP2TSLVRspPZmsB+gde95ZfumbxVQs+ZeyYZ6sNP+Rgtf//m2 YaOA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1738839306; x=1739444106; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=eNqgh98W4funKculi1nvK5lPwgfxfi44KHpp2hUPVk4=; b=IZ0MSTiDfITcqBaQqSXn0srgVCxwcpGHFy7Q0ja0nzkcJ1efX4zJgGNehUVTRo9bKY FKujO4+5Me/yUKpB5ynfSjWBYLNMuPhwRObWhZosE1aPf2I0nOkyKWMDVb+sI2HAzDHg G9CFpHp3X4h6DXw6zaPTP/gPTSRiJ3WV75LfuvimdzcPEr67hFwn24A0+RO4SbCR4DU7 Ruvn7EEjV2KeT3ZcAAsn4rfBybGb5tsFHtzQ7EzQhy3sAfPrgtLIIjOeXBcTnSNkC/7i 5hzIdeIWg4jCl0p/fNCZdTYx5eC4Lgco6CutAAkm47TWrzZIRjpJm4g8Z69GmWcgpkTB SnZQ== X-Forwarded-Encrypted: i=1; AJvYcCWvVuEw3qLWR9FMHLEwXDECrVHIBcsn3JaL4LI4PJifRD6KR9WvgTenJixPttXc7FGt+Kez0DwescRrsSo=@vger.kernel.org X-Gm-Message-State: AOJu0Yw5nH0MJAkQ+lH6efWgyX3pJdztXY5Mi6ggm7X0McRLsSXS2+ou k9rDuebjz03sSGAROCnV9XRWP4yJChhlS541GtJ2dp9GzXFvz3FRWExeMNjjBK0= X-Gm-Gg: ASbGncuwylITTc23oQ5zcxR/FYLu2K89kraO9U4hHXCj5TDfD0HKsz2pJaOx3JkO8Fu 8gZfFb6Oq6x1GnRfaeeSPYXJdYXpwJ7WIaoXncenvLFVH3PJJ+5aDbE/g25UjDJaNXtDLxBnK16 GXn2s+S5Q3/SSAqC+MgQIyWjH1oYvHXq43nnGzb5VtoiFjZPZ3Rkzdsx4cTmnY43NtQ4tbiJI6l VuD+EFUov5GaDT1C4jwMmsNws7xbNnDLvDvn3Nd3omjVQhzXDPlt92QtVzEaVmwqsM+IcZ6jgOY Zbv20w== X-Google-Smtp-Source: AGHT+IGl6SmZaMuRp7oUK50tAaXGFTAXDbbBuYWF+LV4jkLjehRBR+yIALlWuxWr7fqUlT9fVpNuwQ== X-Received: by 2002:a05:6000:154a:b0:38d:b125:3783 with SMTP id ffacd0b85a97d-38db4869738mr5279252f8f.18.1738839305846; Thu, 06 Feb 2025 02:55:05 -0800 (PST) Received: from localhost ([2a03:2880:31ff:1e::]) by smtp.gmail.com with ESMTPSA id 5b1f17b1804b1-4390d96530fsm50029725e9.19.2025.02.06.02.55.05 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 06 Feb 2025 02:55:05 -0800 (PST) From: Kumar Kartikeya Dwivedi To: bpf@vger.kernel.org, linux-kernel@vger.kernel.org Cc: Linus Torvalds , Peter Zijlstra , Will Deacon , Waiman Long , Alexei Starovoitov , Andrii Nakryiko , Daniel Borkmann , Martin KaFai Lau , Eduard Zingerman , "Paul E. 
McKenney" , Tejun Heo , Barret Rhoden , Josh Don , Dohyun Kim , linux-arm-kernel@lists.infradead.org, kernel-team@meta.com Subject: [PATCH bpf-next v2 21/26] bpf: Convert lpm_trie.c to rqspinlock Date: Thu, 6 Feb 2025 02:54:29 -0800 Message-ID: <20250206105435.2159977-22-memxor@gmail.com> X-Mailer: git-send-email 2.43.5 In-Reply-To: <20250206105435.2159977-1-memxor@gmail.com> References: <20250206105435.2159977-1-memxor@gmail.com> Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Developer-Signature: v=1; a=openpgp-sha256; l=3797; h=from:subject; bh=7psEOc+yd4qPNsI4KRxI79thdXuYqSsZXOHrdwDSR0g=; b=owEBbQKS/ZANAwAIAUzgyIZIvxHKAcsmYgBnpJRnw6DiRxO330wCUMflYYf1S6rgXtusJs7sBci+ PMosJk6JAjMEAAEIAB0WIQRLvip+Buz51YI8YRFM4MiGSL8RygUCZ6SUZwAKCRBM4MiGSL8Rypr6D/ 0XU5AM+X7xrBbdXvcrdW6JR+XH2p7xDA8af4vJJme6W9rY9nqzM+aeJbS2fsqkaBOjvGhsrm6KSEvU h5nXKuyn5/iiNQyeA2eBNyNmynnhzv6V3d0A7NJwIL2gaLY06/KM6un/G5Zg7met6nyIaESxte16Zt Tj5PkEGuw2YWZIcBVnQHV1DCr9Cc2YERSgY31ol47Dox5OeIfV2Y+ur+/UmtHmdvZ5+g/plLlYxItA 3Mzw1TkFtwsYfMPR4b35Lo1IhYbORKAi5irpZ7F2x9vPeQ2b3wSFo+UjysbmMLgCp4k+cpHnPU+p16 ETzQsMxtY4lYN51QY3p+/yat8sRLixm842KUr+YDb385pgKtUe5qoiH+F47vXvN87HX4t1ptQ2/H7R gu1PEJN4dsT8I3PQzCGdlQoj17l0rZMaNn3acsxv41JTaPrIZgwx7fpOoB8x903xvDrzFJXqGVrwsG ibywFOJqfclF2qvVUO3ygHfXLFXMLus0EHad+VZzDPc0+wy53OQPcURCkt5UiaTENbTuGscfNR9GHT PtQksRTWgfDQgzrDx4EEkFvZ8uKGPHwtH3SAxFXLdTMysqVjm14CT2if0p9yCYswzH/yX3dA7ImI7q xLGCvaXy82YcKX7auIkK8zCMUWSP5gJB30nAtkG1jSBLXHv9M0FKHlbziHLA== X-Developer-Key: i=memxor@gmail.com; a=openpgp; fpr=4BBE2A7E06ECF9D5823C61114CE0C88648BF11CA X-Patchwork-Delegate: bpf@iogearbox.net Convert all LPM trie usage of raw_spinlock to rqspinlock. Note that rcu_dereference_protected in trie_delete_elem is switched over to plain rcu_dereference, the RCU read lock should be held from BPF program side or eBPF syscall path, and the trie->lock is just acquired before the dereference. It is not clear the reason the protected variant was used from the commit history, but the above reasoning makes sense so switch over. 
Signed-off-by: Kumar Kartikeya Dwivedi --- kernel/bpf/lpm_trie.c | 25 ++++++++++++++----------- 1 file changed, 14 insertions(+), 11 deletions(-) diff --git a/kernel/bpf/lpm_trie.c b/kernel/bpf/lpm_trie.c index e8a772e64324..be66d7e520e0 100644 --- a/kernel/bpf/lpm_trie.c +++ b/kernel/bpf/lpm_trie.c @@ -15,6 +15,7 @@ #include #include #include +#include #include /* Intermediate node */ @@ -36,7 +37,7 @@ struct lpm_trie { size_t n_entries; size_t max_prefixlen; size_t data_size; - raw_spinlock_t lock; + rqspinlock_t lock; }; /* This trie implements a longest prefix match algorithm that can be used to @@ -342,7 +343,9 @@ static long trie_update_elem(struct bpf_map *map, if (!new_node) return -ENOMEM; - raw_spin_lock_irqsave(&trie->lock, irq_flags); + ret = raw_res_spin_lock_irqsave(&trie->lock, irq_flags); + if (ret) + goto out_free; new_node->prefixlen = key->prefixlen; RCU_INIT_POINTER(new_node->child[0], NULL); @@ -356,8 +359,7 @@ static long trie_update_elem(struct bpf_map *map, */ slot = &trie->root; - while ((node = rcu_dereference_protected(*slot, - lockdep_is_held(&trie->lock)))) { + while ((node = rcu_dereference(*slot))) { matchlen = longest_prefix_match(trie, node, key); if (node->prefixlen != matchlen || @@ -442,8 +444,8 @@ static long trie_update_elem(struct bpf_map *map, rcu_assign_pointer(*slot, im_node); out: - raw_spin_unlock_irqrestore(&trie->lock, irq_flags); - + raw_res_spin_unlock_irqrestore(&trie->lock, irq_flags); +out_free: if (ret) bpf_mem_cache_free(&trie->ma, new_node); bpf_mem_cache_free_rcu(&trie->ma, free_node); @@ -467,7 +469,9 @@ static long trie_delete_elem(struct bpf_map *map, void *_key) if (key->prefixlen > trie->max_prefixlen) return -EINVAL; - raw_spin_lock_irqsave(&trie->lock, irq_flags); + ret = raw_res_spin_lock_irqsave(&trie->lock, irq_flags); + if (ret) + return ret; /* Walk the tree looking for an exact key/length match and keeping * track of the path we traverse. 
We will need to know the node @@ -478,8 +482,7 @@ static long trie_delete_elem(struct bpf_map *map, void *_key) trim = &trie->root; trim2 = trim; parent = NULL; - while ((node = rcu_dereference_protected( - *trim, lockdep_is_held(&trie->lock)))) { + while ((node = rcu_dereference(*trim))) { matchlen = longest_prefix_match(trie, node, key); if (node->prefixlen != matchlen || @@ -543,7 +546,7 @@ static long trie_delete_elem(struct bpf_map *map, void *_key) free_node = node; out: - raw_spin_unlock_irqrestore(&trie->lock, irq_flags); + raw_res_spin_unlock_irqrestore(&trie->lock, irq_flags); bpf_mem_cache_free_rcu(&trie->ma, free_parent); bpf_mem_cache_free_rcu(&trie->ma, free_node); @@ -592,7 +595,7 @@ static struct bpf_map *trie_alloc(union bpf_attr *attr) offsetof(struct bpf_lpm_trie_key_u8, data); trie->max_prefixlen = trie->data_size * 8; - raw_spin_lock_init(&trie->lock); + raw_res_spin_lock_init(&trie->lock); /* Allocate intermediate and leaf nodes from the same allocator */ leaf_size = sizeof(struct lpm_trie_node) + trie->data_size + From patchwork Thu Feb 6 10:54:30 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Kumar Kartikeya Dwivedi X-Patchwork-Id: 13962828 X-Patchwork-Delegate: bpf@iogearbox.net Received: from mail-wr1-f66.google.com (mail-wr1-f66.google.com [209.85.221.66]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 4B645231CA4; Thu, 6 Feb 2025 10:55:09 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.221.66 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1738839311; cv=none; b=YvpHI5v6klqUqvcOb4D8TKLm+9Y/5LeC0dW3EJc8GjcGgV/iCObwXogBorvTa5i3mZq7Yilw9df9uWeZJ/TkD/NnpBn3C5BT6ZzH2Ts1sYxg+HIlg/k5a11+77+nAZ3VOypqV7Bh4N7l8zGVakyFSiUgRc1x0R091jX/f3PaPcw= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1738839311; c=relaxed/simple; bh=U/xq8I6EBeEZJrIOy3nlwnF6weOW6zwHeOpg6R+a+Dg=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=VeD/3r6MhnLE1QpOnR4vZXRCYEItul5H22zy5FW2lvYhmmG/Kdk0Jmy3SsC3oNZqncWLbUEYPlCruSYIrGxAcOQjextzoztJpy0hYtXFr80saUvTJktVx7k0hJmsXnRCWY8OL+yqypnLHPHdv38f15jFaerjynpItgwdBR6JPU8= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=iIFuvR+D; arc=none smtp.client-ip=209.85.221.66 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="iIFuvR+D" Received: by mail-wr1-f66.google.com with SMTP id ffacd0b85a97d-38daf156e97so381910f8f.0; Thu, 06 Feb 2025 02:55:09 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1738839307; x=1739444107; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=WY5xY00eKUhPVaK5CMP9UKPxhBmvTRnKdfB0Ehrzj4c=; b=iIFuvR+DEP18HjqK9dm7BqCGZxLhHCCmi3eZaAvdDnjQ3Mx1vzBOyv47zjXrtIMtBK a116vPC1kzm73eO0Z/65rCC420wDEJ4WqDurT/pCAycacoBs75Lmr4pyWrbuN9GLhPl+ 
nJdUUG7lqIsuRnk4MnkbBWkVxm73WSGdyc0Lnnn5I3OF1l7JrMO+fJeeHjx45XlRPWud Ztys+6itWc8RnKqhdHqeG5NuHZWqqJp3SH8H/7rJRx1BNL5F2plwhmsN/Rh7rRMa1GWu S70YTAxOZPnnovhnK9JxL3rJzSxvUkuSpRFJfZ1mdH6WpeeCTYJTnVdtzXJ6/T7FPeeH AIlw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1738839307; x=1739444107; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=WY5xY00eKUhPVaK5CMP9UKPxhBmvTRnKdfB0Ehrzj4c=; b=DvD/4+pHalowZivgcnJqeLAL+ioyqLR48V4folV59HDMN02i9WfD5Ii+zfV6V6Db1l V+EVqK+tx8Qi+wsyDVWYrzkyW/OA8lA1D1IT+8QODYK54kEfp5sOGTDODXMw05qX7nQA qgZc3/utC+WCfPsSLyq1poqNbYJKSvJ9/tUw5mu/mOELO454WFLG0D6uxtA2J6TbWmZX IS437l3nhX23v6xUyJ2BfWXYoVjhRlZXhAu42E1lJDARIMAIs0lwyX0dX2Y7Ry9bgKXO 2iBOrKm1qlgbi+pTSG6M3kl+clEYcZc1Eerv/8gesAxq+cKgQpnFVUTl3IS9bLBhk865 f4NQ== X-Forwarded-Encrypted: i=1; AJvYcCWJM5A818OAv5i9Pgp5vh7ZpGSG8knkQywXoTNpHE6p4ENdaiB1Ky50DujaqZDfEtzpr5Xgn8HKQ2gcHKA=@vger.kernel.org X-Gm-Message-State: AOJu0YxC8fEoR2FJhseytPFGJ41HQxjNMk3C6BArs/8KdWLRFdcNmX3l jmhhnlnobX+fpbzwJ8XEWyb3TyjyxhVNbP4uhnC+Hstw94g7npEpGcnTx9filR0= X-Gm-Gg: ASbGnctWwVrB4fvOviLNFJVUgX4VqWn6qG6tyuOayNo951DWc5ldmSvgjWkxOJ2jyUh gQOuKIeFkWFO1Gqggl6OlfFZ8/YsBCOtPvsRiOC7PIRnsojSU7jA6s69SZkcuEXZOfhAWbRYsxA oC03Zm4ftVV6yeKqlmmcuy07S3CO/HoysYi2hiRtMn/dco9J1o+ud6w/5sorg3xD/EMDMVbx5TY GwWiVCQ8bZ3tCbz38P9rrvjbuCcxUj+ZOV0AHXgjzJHuqBW4YjkmE75o3/0LA3VeN5kX6EGXvYZ 6nTJ X-Google-Smtp-Source: AGHT+IG9zXGoXyJKKHaq7JsfFOdulH8qStWXMMx7WVjIkbMNF6mgU4TITTZXCi7em7sIZ7mcLjLv5Q== X-Received: by 2002:a05:6000:1567:b0:38a:5ce8:df51 with SMTP id ffacd0b85a97d-38db4857bb6mr4346951f8f.2.1738839307068; Thu, 06 Feb 2025 02:55:07 -0800 (PST) Received: from localhost ([2a03:2880:31ff:1::]) by smtp.gmail.com with ESMTPSA id ffacd0b85a97d-38dbde1ddaesm1381571f8f.85.2025.02.06.02.55.06 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 06 Feb 2025 02:55:06 -0800 (PST) From: Kumar Kartikeya Dwivedi To: bpf@vger.kernel.org, linux-kernel@vger.kernel.org Cc: Linus Torvalds , Peter Zijlstra , Will Deacon , Waiman Long , Alexei Starovoitov , Andrii Nakryiko , Daniel Borkmann , Martin KaFai Lau , Eduard Zingerman , "Paul E. 
McKenney" , Tejun Heo , Barret Rhoden , Josh Don , Dohyun Kim , linux-arm-kernel@lists.infradead.org, kernel-team@meta.com Subject: [PATCH bpf-next v2 22/26] bpf: Introduce rqspinlock kfuncs Date: Thu, 6 Feb 2025 02:54:30 -0800 Message-ID: <20250206105435.2159977-23-memxor@gmail.com> X-Mailer: git-send-email 2.43.5 In-Reply-To: <20250206105435.2159977-1-memxor@gmail.com> References: <20250206105435.2159977-1-memxor@gmail.com> Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Developer-Signature: v=1; a=openpgp-sha256; l=5071; h=from:subject; bh=U/xq8I6EBeEZJrIOy3nlwnF6weOW6zwHeOpg6R+a+Dg=; b=owEBbQKS/ZANAwAIAUzgyIZIvxHKAcsmYgBnpJRnGE3AZgPoVzGUtERg5LFCbq+DcCX9yF59ua8N n7+cbDKJAjMEAAEIAB0WIQRLvip+Buz51YI8YRFM4MiGSL8RygUCZ6SUZwAKCRBM4MiGSL8RylLaD/ 0VpOorKWab0lqsTn/JbqVIfhX2mJDl4AszKROjW7ZbWS/ibRaNsagLSEBcUQD0xf70owiN9Yu/4znW OJaUpWXS7tNbAxPV2AbNa13K/5M2I9XFuZ6Ma44gw7XUBL2+eLtpDnsloEntdH23CIdCBlFgVoMhZi /9B64BlcRvucuNxfRyundNxbRTbW+WL+gdtObdVpvurkEZPU7XSKLpbhrzZvQ4wxTGIf/25YvUVXE4 S4KSqSp0B49pkPN4G9xW/jIyjgX3WLAwBlhBHZ6f15+/NQ+pg/hwN9hKNNehLoCE2vircPueBHstEE KegTCjgg9BggHBOklhXRKvmGFOY2CVPkx96cbQhQZG615Mp2ODEKab08GpR6au3L0Lg1QT3JQbi+tU DKBdLQMl1MjDsazZBZ1VwoZL4CT5etgBB13PQWNVWlid9dl1osLUMQE9UJ/QAqUtZVfz88GpkskFmi ol2YvLHaQVsp/3n/N56wQok42wK/y+P/xvxYD1rz0ExOWlNuLbEBjbjgFZsEQ1KAlLg2XUnKJ4yMrB aBVOWnLHU2q3mFDWBNz/iUcF0U3KO0efHZsiu7NYItC7HDIgvosgX8QFMNIjDG1EfiEFvUcisgaxYH JcTYydI991Jqy6AeM9QpCB99Wg4k4tVqvmj18ICTymjWeNb7929I4r8679rA== X-Developer-Key: i=memxor@gmail.com; a=openpgp; fpr=4BBE2A7E06ECF9D5823C61114CE0C88648BF11CA X-Patchwork-Delegate: bpf@iogearbox.net Introduce four new kfuncs, bpf_res_spin_lock, and bpf_res_spin_unlock, and their irqsave/irqrestore variants, which wrap the rqspinlock APIs. bpf_res_spin_lock returns a conditional result, depending on whether the lock was acquired (NULL is returned when lock acquisition succeeds, non-NULL upon failure). The memory pointed to by the returned pointer upon failure can be dereferenced after the NULL check to obtain the error code. Instead of using the old bpf_spin_lock type, introduce a new type with the same layout, and the same alignment, but a different name to avoid type confusion. Preemption is disabled upon successful lock acquisition, however IRQs are not. Special kfuncs can be introduced later to allow disabling IRQs when taking a spin lock. Resilient locks are safe against AA deadlocks, hence not disabling IRQs currently does not allow violation of kernel safety. __irq_flag annotation is used to accept IRQ flags for the IRQ-variants, with the same semantics as existing bpf_local_irq_{save, restore}. These kfuncs will require additional verifier-side support in subsequent commits, to allow programs to hold multiple locks at the same time. Signed-off-by: Kumar Kartikeya Dwivedi --- include/asm-generic/rqspinlock.h | 7 +++ include/linux/bpf.h | 1 + kernel/locking/rqspinlock.c | 78 ++++++++++++++++++++++++++++++++ 3 files changed, 86 insertions(+) diff --git a/include/asm-generic/rqspinlock.h b/include/asm-generic/rqspinlock.h index 46119fc768b8..8249c2da09ad 100644 --- a/include/asm-generic/rqspinlock.h +++ b/include/asm-generic/rqspinlock.h @@ -23,6 +23,13 @@ struct rqspinlock { }; }; +/* Even though this is same as struct rqspinlock, we need to emit a distinct + * type in BTF for BPF programs. 
+ */ +struct bpf_res_spin_lock { + u32 val; +}; + struct qspinlock; #ifdef CONFIG_QUEUED_SPINLOCKS typedef struct qspinlock rqspinlock_t; diff --git a/include/linux/bpf.h b/include/linux/bpf.h index f3f50e29d639..35af09ee6a2c 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -30,6 +30,7 @@ #include #include #include +#include struct bpf_verifier_env; struct bpf_verifier_log; diff --git a/kernel/locking/rqspinlock.c b/kernel/locking/rqspinlock.c index b4cceeecf29c..d05333203671 100644 --- a/kernel/locking/rqspinlock.c +++ b/kernel/locking/rqspinlock.c @@ -15,6 +15,8 @@ #include #include +#include +#include #include #include #include @@ -686,3 +688,79 @@ int __lockfunc resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val, EXPORT_SYMBOL(resilient_queued_spin_lock_slowpath); #endif /* CONFIG_QUEUED_SPINLOCKS */ + +__bpf_kfunc_start_defs(); + +#define REPORT_STR(ret) ({ ret == -ETIMEDOUT ? "Timeout detected" : "AA or ABBA deadlock detected"; }) + +__bpf_kfunc int bpf_res_spin_lock(struct bpf_res_spin_lock *lock) +{ + int ret; + + BUILD_BUG_ON(sizeof(rqspinlock_t) != sizeof(struct bpf_res_spin_lock)); + BUILD_BUG_ON(__alignof__(rqspinlock_t) != __alignof__(struct bpf_res_spin_lock)); + + preempt_disable(); + ret = res_spin_lock((rqspinlock_t *)lock); + if (unlikely(ret)) { + preempt_enable(); + rqspinlock_report_violation(REPORT_STR(ret), lock); + return ret; + } + return 0; +} + +__bpf_kfunc void bpf_res_spin_unlock(struct bpf_res_spin_lock *lock) +{ + res_spin_unlock((rqspinlock_t *)lock); + preempt_enable(); +} + +__bpf_kfunc int bpf_res_spin_lock_irqsave(struct bpf_res_spin_lock *lock, unsigned long *flags__irq_flag) +{ + u64 *ptr = (u64 *)flags__irq_flag; + unsigned long flags; + int ret; + + preempt_disable(); + local_irq_save(flags); + ret = res_spin_lock((rqspinlock_t *)lock); + if (unlikely(ret)) { + local_irq_restore(flags); + preempt_enable(); + rqspinlock_report_violation(REPORT_STR(ret), lock); + return ret; + } + *ptr = flags; + return 0; +} + +__bpf_kfunc void bpf_res_spin_unlock_irqrestore(struct bpf_res_spin_lock *lock, unsigned long *flags__irq_flag) +{ + u64 *ptr = (u64 *)flags__irq_flag; + unsigned long flags = *ptr; + + res_spin_unlock((rqspinlock_t *)lock); + local_irq_restore(flags); + preempt_enable(); +} + +__bpf_kfunc_end_defs(); + +BTF_KFUNCS_START(rqspinlock_kfunc_ids) +BTF_ID_FLAGS(func, bpf_res_spin_lock, KF_RET_NULL) +BTF_ID_FLAGS(func, bpf_res_spin_unlock) +BTF_ID_FLAGS(func, bpf_res_spin_lock_irqsave, KF_RET_NULL) +BTF_ID_FLAGS(func, bpf_res_spin_unlock_irqrestore) +BTF_KFUNCS_END(rqspinlock_kfunc_ids) + +static const struct btf_kfunc_id_set rqspinlock_kfunc_set = { + .owner = THIS_MODULE, + .set = &rqspinlock_kfunc_ids, +}; + +static __init int rqspinlock_register_kfuncs(void) +{ + return register_btf_kfunc_id_set(BPF_PROG_TYPE_UNSPEC, &rqspinlock_kfunc_set); +} +late_initcall(rqspinlock_register_kfuncs); From patchwork Thu Feb 6 10:54:31 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Kumar Kartikeya Dwivedi X-Patchwork-Id: 13962829 X-Patchwork-Delegate: bpf@iogearbox.net Received: from mail-wm1-f65.google.com (mail-wm1-f65.google.com [209.85.128.65]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 9A58923237C; Thu, 6 Feb 2025 10:55:10 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.128.65 ARC-Seal: i=1; 
a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1738839312; cv=none; b=UfAck2Yp2DlIoCfjVFAkWzhLDGgtFBiH7f7pzdURZUgUesdAwTGu6c4EmbOqDTbzj4I2T5xkAQvQpAHMhxsLN5OITdK5v3vnB7W3CAM1suTpbin3pAeZO7hfDONxluuhja4famDVtG0+C4txc1UpdXDwIKfu9UkA8Vz4hXqHmvA= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1738839312; c=relaxed/simple; bh=a6nXDtyfaaaN5/sxSs0OZmYxp3ViUzHxtEF2nqEtxP4=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=sC0VQz+fcFDGueEFZ26fZweKhP9GIlJA5Bu0PguduQb/aV+9qIcFJ5QTuoSp/6KM/P/kHzCWMoB/5SeEgv5YLONNYJ7bH2aJ2zb5CcK9TAw/oFA1h80ehOAd23E11siFZQpTEzDIGB/UaiaMheNve/msaf3z+svlovXJcJ8We4E= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=QCRLAbMD; arc=none smtp.client-ip=209.85.128.65 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="QCRLAbMD" Received: by mail-wm1-f65.google.com with SMTP id 5b1f17b1804b1-436281c8a38so4827195e9.3; Thu, 06 Feb 2025 02:55:10 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1738839308; x=1739444108; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=MMJzjmm9Ssn/2tshx5tAZDCI9Z1engNm3Cw+RIUnYAs=; b=QCRLAbMDF0WIGBXbFvFpMtx20YKXbAV4Ipo4ZuE3A+nIQJ5nIp9m9KBiBqbXLvD2iP NePB9IjcKfu1x0lZnlc3B1QXxd77wAhYKTgSXLcQC2Hl0I8IxYIYtRhXPteKT9SZnjyK XcYd9tAe6IFOn94C2H3Sd88CrVTV6bn4nTmw2aaUCY3bvRiSYQgerNze1VvxX0uoqCDE grcPiILwKbI3Zfsu3AOubE3bOfsElBmLRdQ79Ww2bxe6XtHMWOhzlSgWB0ZtoylAbsRh 5plogVqwpbX+MbhF1JAYQehGbKdfTamjGmB7gUlVMxw25T0w4qN/N7z7PiReRa7spGSq XxRw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1738839308; x=1739444108; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=MMJzjmm9Ssn/2tshx5tAZDCI9Z1engNm3Cw+RIUnYAs=; b=RAsh573amU4BOvZrP+1onlkCN0Pk/9ktrH3gjM5NZLYmPA8oD/vAxaAmmxfMygBL+A Rmi4VP1lcv48/kDRdpgNWjIQ0cvMLcxbqgSVpRPxwppTATvY7BL81WVrlP3uh+iUrZk1 OIgHeTpibqii5mJFh3NU3K4H1/f9Ardutgym77tg/FfFmgBjWfUGeyt3A+zEcFVTkQsT dj++x4pCB6/xnOgAP6piBF0ydxf4odx+E6jL72AtPd5tf81l7OQifeKvvM7Oiv8WZE7q 8IzSgO83VYU1j8xwIGkE3joMc7nQ+vKlg0O0WORPxEJJSaiM/+slHs70KUqAojnxWHK7 TJOA== X-Forwarded-Encrypted: i=1; AJvYcCVkt6dawl4p+TGUfEbwDcB25aRaU9nS+lvJzxydGZ/GH6Ks5Tu3LM77H2mAoGJr+DJETXaye5UPLKl9x/E=@vger.kernel.org X-Gm-Message-State: AOJu0Yyu3tFtyq8OtPagdblmTwjqvGgJXvLxuNCZhqu7Fe+qDZi8Cd6W bYsFvA2HHPZmTP4uFFCnRx7STphZCjDssAI6JYFswX5Nd5SZ0d23JVVhZOCWFrg= X-Gm-Gg: ASbGncufgv/2IDqYhHLoHK0DdVK1yS0yA6YpQwjF7NqWFrsm7HJA40cSUnCuks38brH Uzs4CHtGsHWNXiBelsD7yoYiTtlWn/lDHQIFOlFo1g4wix1uAq82bFtuPUB1GmHYqYPHWlZHcQB yUUC7cwCEBiPgkVG49XxkDxOHJz60g3Yya743ldlKw0U7hVUzcyMYcOPl+1ESYMFojJC0EpV9L5 9vdCtRukQj9XeAKA56+g5WAGclYeAvs/kq8oemADTfKy44ULCQxGuucm5QLqFShYNeriDhPE8pU Pg== X-Google-Smtp-Source: AGHT+IEmR2/TmkoFR9idBNEPfQquu8zyJPeOqwMAgJpW3/scT3Z5dWwlgX9AEyJpUkJHqzG2L9+0eQ== X-Received: by 2002:a05:600c:1e1c:b0:434:effb:9f8a with SMTP id 5b1f17b1804b1-43912e54246mr23938815e9.15.1738839308376; Thu, 06 Feb 
2025 02:55:08 -0800 (PST) Received: from localhost ([2a03:2880:31ff::]) by smtp.gmail.com with ESMTPSA id ffacd0b85a97d-38dbde1dc1dsm1375270f8f.87.2025.02.06.02.55.07 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 06 Feb 2025 02:55:07 -0800 (PST) From: Kumar Kartikeya Dwivedi To: bpf@vger.kernel.org, linux-kernel@vger.kernel.org Cc: Linus Torvalds , Peter Zijlstra , Will Deacon , Waiman Long , Alexei Starovoitov , Andrii Nakryiko , Daniel Borkmann , Martin KaFai Lau , Eduard Zingerman , "Paul E. McKenney" , Tejun Heo , Barret Rhoden , Josh Don , Dohyun Kim , linux-arm-kernel@lists.infradead.org, kernel-team@meta.com Subject: [PATCH bpf-next v2 23/26] bpf: Handle allocation failure in acquire_lock_state Date: Thu, 6 Feb 2025 02:54:31 -0800 Message-ID: <20250206105435.2159977-24-memxor@gmail.com> X-Mailer: git-send-email 2.43.5 In-Reply-To: <20250206105435.2159977-1-memxor@gmail.com> References: <20250206105435.2159977-1-memxor@gmail.com> Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Developer-Signature: v=1; a=openpgp-sha256; l=776; h=from:subject; bh=a6nXDtyfaaaN5/sxSs0OZmYxp3ViUzHxtEF2nqEtxP4=; b=owEBbQKS/ZANAwAIAUzgyIZIvxHKAcsmYgBnpJRndCMDTSB5KQHWglo8M0qqlVd8f6toksUk+2oC Ok/Xw+iJAjMEAAEIAB0WIQRLvip+Buz51YI8YRFM4MiGSL8RygUCZ6SUZwAKCRBM4MiGSL8RypokD/ 9GP0/dyt61QvEQ+p/5NGWAGtLJXjdj00R6BxemQXfhh7MvCRx41BKujZPuPCBt02Xog2UeoMlXt04D RrFCxgOD/1K7vHK54wFCXnJU1den0sUUyDwfsc7Gj/Z0ifdqhrqr90k3WL27q6L5Ur7ohUWpuvAnjx gB0+Y/CJisy/oVjoFzaGqCN1CzvRRmkSSNLrV8Uz5Tf6Op2AvJ0nuER2FdlG/+j0t9YBcFkR3pothS zfXRfrBvws9/BRL/R8Z15OeRzU1lgVUPjSL6C9RU+hwgIhYDCQ00BoTuMlZh511ojo4sTo3/Zvtf16 9+lkVBS8sPO5bap8JYLBHDl0nLnFTfxJMfDyUryqKVYV3GSM7Y86ST04cPqZ8EhWprpNw+iKmOmCKd gjHx0Q4VcPzA5uy+jmFPpNNMXyT9dmIjrev7NGi5DbvK/DNdjUYWQxJjzN02iC7/EvRTjngcxoWyPe rUBoOvzWn7iv0hRVbYWfJBakloSkfB7eIVP2e7rSgqjtytPi/PjYxdl90c9TqChCBlyDqVB8XPQ1gq y0eDqUv8LC+trmtM6Wr6odaXyc9aKb9guEkmSooa05h3a0/ZFg3aOhTfFbDmyB8nOJC/rMKNjw4sQ7 8dmYH22uTb9/O8oVVtMMomG3b8H8m0yE9UTVXLKxcqjDa8wUn5Y2vIIHFqcQ== X-Developer-Key: i=memxor@gmail.com; a=openpgp; fpr=4BBE2A7E06ECF9D5823C61114CE0C88648BF11CA X-Patchwork-Delegate: bpf@iogearbox.net The acquire_lock_state function needs to handle possible NULL values returned by acquire_reference_state, and return -ENOMEM. 
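Without this check, a failed allocation inside acquire_reference_state() would be dereferenced immediately afterwards, when the reference state's fields are initialized. The resulting flow is simply the following (a sketch of the two added lines in context, matching the diff below):

	s = acquire_reference_state(env, insn_idx);
	if (!s)
		return -ENOMEM;	/* allocation of the reference state failed */
	s->type = type;
	s->id = id;
	s->ptr = ptr;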
Fixes: 769b0f1c8214 ("bpf: Refactor {acquire,release}_reference_state") Signed-off-by: Kumar Kartikeya Dwivedi --- kernel/bpf/verifier.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c index 9971c03adfd5..d6999d085c7d 100644 --- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -1501,6 +1501,8 @@ static int acquire_lock_state(struct bpf_verifier_env *env, int insn_idx, enum r struct bpf_reference_state *s; s = acquire_reference_state(env, insn_idx); + if (!s) + return -ENOMEM; s->type = type; s->id = id; s->ptr = ptr; From patchwork Thu Feb 6 10:54:32 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Kumar Kartikeya Dwivedi X-Patchwork-Id: 13962831 X-Patchwork-Delegate: bpf@iogearbox.net Received: from mail-wm1-f67.google.com (mail-wm1-f67.google.com [209.85.128.67]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 92A6B233135; Thu, 6 Feb 2025 10:55:12 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.128.67 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1738839315; cv=none; b=MVRUEfKzHRTz3/GT4c/AXOrCd0F8MRbXH5zL2dpX1TWbnlipr6AMmwpWPKJEZZEpNsOnOgpIYoRnavllSIkZC4XHLkm9VxjH1ma/fnUdd0tTW1fHrptBim/9utuBu+gXrmz2o02K5bm7vqmzoKbM5baJmHt7rss6tRJFyaLSsdc= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1738839315; c=relaxed/simple; bh=5HGNbFTsma4FLf8lx8r/wItLdfJjrA3dk1lFblbB38k=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=lBtZu72inXXla+2hc+U8jPvSmeYEE5N5S4Gv8096mvspx7IjE2ylpdwtYhxZGBGVuNtqM8O460UqXQOVwCn+4OcVnEral7tBaBCmyhUNASb2PyKJRex14zfsUxp77YjLNXKySxNHzW19CAs7qCBHViS2GsNJBrx/+g9nMYa+ARQ= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=hJz2sWzl; arc=none smtp.client-ip=209.85.128.67 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="hJz2sWzl" Received: by mail-wm1-f67.google.com with SMTP id 5b1f17b1804b1-43622267b2eso7537335e9.0; Thu, 06 Feb 2025 02:55:12 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1738839310; x=1739444110; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=RabZcLwVRQNGP0BmugyphEg76jVqJJBXW2DB0rYNNAY=; b=hJz2sWzlwcy+vIcpiBmTS/ZV8l89f09JmaOMfT0qbjbEMtNnIGkwCtSHX0n7kLCObH IwOuyXYb11kS4ZeXSqle7AcUGtogUC6/mZcERdFqGg4GhyzJcnyLFEkjJfdLWZdLZv6E SR8Bgfk95X3P0TQQ0LqwC5uSoExY29cCT4mW+qV6SSXwtzh0bW/+QFzfFGRsSrsk+IEy RcQLA8uczp++aQFgtPKBYRA5lwOsxTJOiappp1HPIxaOkz/Xs0rb9TTscqvnundfm+9r +L3T5+t+kXV7UbwGnmGFx53CY/nxyFbp1voCB+rpVEX/gDS7QL02o5c8W1wxtk5OKYcA Vzcw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1738839310; x=1739444110; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; 
bh=RabZcLwVRQNGP0BmugyphEg76jVqJJBXW2DB0rYNNAY=; b=qk8o+sT7MknuxIEXq8/A82cbEfw5feBkaGi3jNOZzcSTszj6yDxiN/dV5WwfXBcWgN SQbHO5I6RqkEPqkzBkkLqYzoJOtcPuTaNODsQepY7BswKwaG/AasRn/vVOAfayQusptM 9n9BrMEuLn1Y6Zg4RcQDxFz3FOoS7ddM8tMuvGsS8jcLOBodwsfSebszm7+HhrpJXVEi V2nOXz2vWr0zMVQPxw24xxVqseRaaUx8MeV9qOmMh33oTjT5RzeGkp/M2PseGOlvqCWp gYRlXxlrTQxekPoZZDarBPWuaCNfb54ZfzfFUPIv6gKhnKTuPPAB5u2ZHRlPiPGiBlHI Go0Q== X-Forwarded-Encrypted: i=1; AJvYcCWK0j7DNuDM8RzE7MjAaMWAQIBjDbiUAS8UTGxgeMCCSi9FPdJ7p15IuF4iklI0GXtVz25KMncRMepSEJo=@vger.kernel.org X-Gm-Message-State: AOJu0YzaAYaBOkNgozaEwwppxNu+kJZ2+gb74mfqwvZ8PFC2yQcAlLFB KOl1EatW28z6lwPFi/8DogGCcKNmC2s1vBJqmfwmaq+9RyhkNmX7TzzOOLoewpY= X-Gm-Gg: ASbGncvfFzSrsUiR32ivtreZzYyIWl2nsKK5rByttZ2kRMJkwVMzVk+mnhlKur1AyTt Ct5xY2XfqhVZVUNf+bpFZNfvB88UqWaCKN7h8BWnvZWzdT2w+cHhoWMpVq7UM1J1vi4oUyqZYrj Vvwez05LvhltOvdetianqrKn21V4Dkm5+HukMPpHVaQm6WPlHj/KW4ZGG387MSgCb2wWfYU6S4+ kJ62BoDaMxThUUPpwmDRdJPSR5MIMm7cMDntIli7cRRouc9LFtoF6zvnCf1a021qVFY2uRq8a3A hqs3 X-Google-Smtp-Source: AGHT+IFIYCB/oajw2Qvxf5XpXM6y+pT2gnKN3xko42QpRO/uNhJ36s5dySnRLck/L268tkP6cbHLvg== X-Received: by 2002:a05:600c:1f8f:b0:434:ff30:a159 with SMTP id 5b1f17b1804b1-4390d34b326mr53742055e9.0.1738839310094; Thu, 06 Feb 2025 02:55:10 -0800 (PST) Received: from localhost ([2a03:2880:31ff:7::]) by smtp.gmail.com with ESMTPSA id 5b1f17b1804b1-4390d94d7d4sm52313755e9.10.2025.02.06.02.55.08 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 06 Feb 2025 02:55:09 -0800 (PST) From: Kumar Kartikeya Dwivedi To: bpf@vger.kernel.org, linux-kernel@vger.kernel.org Cc: Linus Torvalds , Peter Zijlstra , Will Deacon , Waiman Long , Alexei Starovoitov , Andrii Nakryiko , Daniel Borkmann , Martin KaFai Lau , Eduard Zingerman , "Paul E. McKenney" , Tejun Heo , Barret Rhoden , Josh Don , Dohyun Kim , linux-arm-kernel@lists.infradead.org, kernel-team@meta.com Subject: [PATCH bpf-next v2 24/26] bpf: Implement verifier support for rqspinlock Date: Thu, 6 Feb 2025 02:54:32 -0800 Message-ID: <20250206105435.2159977-25-memxor@gmail.com> X-Mailer: git-send-email 2.43.5 In-Reply-To: <20250206105435.2159977-1-memxor@gmail.com> References: <20250206105435.2159977-1-memxor@gmail.com> Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Developer-Signature: v=1; a=openpgp-sha256; l=28449; h=from:subject; bh=5HGNbFTsma4FLf8lx8r/wItLdfJjrA3dk1lFblbB38k=; b=owEBbQKS/ZANAwAIAUzgyIZIvxHKAcsmYgBnpJRnui/XZ+ItES1wv9fM/NvhMZxRq5RhAoodvGS2 ZRuffZCJAjMEAAEIAB0WIQRLvip+Buz51YI8YRFM4MiGSL8RygUCZ6SUZwAKCRBM4MiGSL8RysFCD/ 4gVuhWUETLd0qMcRO+R4F5+AN7eVpScQItfzUD/5wd0Q+cj3KteCL9DvZqOyGrY3QAIR9RZYqSaNDL O/mIcF2Pvh6Cd4XW0HW9DX1kZTMtaNAX5y3Gf4MN5ec+3TRpze/Akrti+WPYsH6c2uLrYBQWRxTn1n 2nmXOnx8l0z9/hL0IiTo9+NU8hXMfYKyowF3WlbLb+qXzVAGbBgY8ujOAxHz7ChloIpJJycv6bt2px X+0gww/ScR2tiGeGccRHwjsv9A5aZwmf1rEgR+JxNWUDVmw5Q1bxLfcENHtdkmaY2LUSJRhbC2azIk Mhp1Qxtm4cKs6uDuXxuD6BC31t9Wh1sjAXDUHkkVzvmozKjq+iJaJfmq9lBIKIxm+F4VQ0IJeIu+ht ITxR8QbwqKVAaSvF4dtUZVj/rt0YUiXr3qhkIJkeJhVFHEIGyPLT8vAn1UxElhe76/uoOf0cEi/Fap Mu9etskGVBz22A+IZ3TPuAW/fi1WDtm/ORrB0Zppv7MWJt7QtJCxeRUdjMZUW19IozwqeH8oSDnccR Z6lXr96lJ79xOgOwGOPUFv9vqUOPLF2PHh7pe2TML+07+Q6rcQrxCQ51c2qz/+mhe7r7mGe3DzoB24 Y2voqsR/rq+xaXg/ZTLcx0T4rvjTs+U7b+M/9ldEAImGeezuk/XkRjwJxhSA== X-Developer-Key: i=memxor@gmail.com; a=openpgp; fpr=4BBE2A7E06ECF9D5823C61114CE0C88648BF11CA X-Patchwork-Delegate: bpf@iogearbox.net Introduce verifier-side support for rqspinlock kfuncs. 
The first step is allowing bpf_res_spin_lock type to be defined in map values and allocated objects, so BTF-side is updated with a new BPF_RES_SPIN_LOCK field to recognize and validate. Any object cannot have both bpf_spin_lock and bpf_res_spin_lock, only one of them (and at most one of them per-object, like before) must be present. The bpf_res_spin_lock can also be used to protect objects that require lock protection for their kfuncs, like BPF rbtree and linked list. The verifier plumbing to simulate success and failure cases when calling the kfuncs is done by pushing a new verifier state to the verifier state stack which will verify the failure case upon calling the kfunc. The path where success is indicated creates all lock reference state and IRQ state (if necessary for irqsave variants). In the case of failure, the state clears the registers r0-r5, sets the return value, and skips kfunc processing, proceeding to the next instruction. When marking the return value for success case, the value is marked as 0, and for the failure case as [-MAX_ERRNO, -1]. Then, in the program, whenever user checks the return value as 'if (ret)' or 'if (ret < 0)' the verifier never traverses such branches for success cases, and would be aware that the lock is not held in such cases. We push the kfunc state in check_kfunc_call whenever rqspinlock kfuncs are invoked. We introduce a kfunc_class state to avoid mixing lock irqrestore kfuncs with IRQ state created by bpf_local_irq_save. With all this infrastructure, these kfuncs become usable in programs while satisfying all safety properties required by the kernel. Signed-off-by: Kumar Kartikeya Dwivedi Acked-by: Eduard Zingerman --- include/linux/bpf.h | 9 ++ include/linux/bpf_verifier.h | 17 ++- kernel/bpf/btf.c | 26 ++++- kernel/bpf/syscall.c | 6 +- kernel/bpf/verifier.c | 219 ++++++++++++++++++++++++++++------- 5 files changed, 232 insertions(+), 45 deletions(-) diff --git a/include/linux/bpf.h b/include/linux/bpf.h index 35af09ee6a2c..91dddf7396f9 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -205,6 +205,7 @@ enum btf_field_type { BPF_REFCOUNT = (1 << 9), BPF_WORKQUEUE = (1 << 10), BPF_UPTR = (1 << 11), + BPF_RES_SPIN_LOCK = (1 << 12), }; typedef void (*btf_dtor_kfunc_t)(void *); @@ -240,6 +241,7 @@ struct btf_record { u32 cnt; u32 field_mask; int spin_lock_off; + int res_spin_lock_off; int timer_off; int wq_off; int refcount_off; @@ -315,6 +317,8 @@ static inline const char *btf_field_type_name(enum btf_field_type type) switch (type) { case BPF_SPIN_LOCK: return "bpf_spin_lock"; + case BPF_RES_SPIN_LOCK: + return "bpf_res_spin_lock"; case BPF_TIMER: return "bpf_timer"; case BPF_WORKQUEUE: @@ -347,6 +351,8 @@ static inline u32 btf_field_type_size(enum btf_field_type type) switch (type) { case BPF_SPIN_LOCK: return sizeof(struct bpf_spin_lock); + case BPF_RES_SPIN_LOCK: + return sizeof(struct bpf_res_spin_lock); case BPF_TIMER: return sizeof(struct bpf_timer); case BPF_WORKQUEUE: @@ -377,6 +383,8 @@ static inline u32 btf_field_type_align(enum btf_field_type type) switch (type) { case BPF_SPIN_LOCK: return __alignof__(struct bpf_spin_lock); + case BPF_RES_SPIN_LOCK: + return __alignof__(struct bpf_res_spin_lock); case BPF_TIMER: return __alignof__(struct bpf_timer); case BPF_WORKQUEUE: @@ -420,6 +428,7 @@ static inline void bpf_obj_init_field(const struct btf_field *field, void *addr) case BPF_RB_ROOT: /* RB_ROOT_CACHED 0-inits, no need to do anything after memset */ case BPF_SPIN_LOCK: + case BPF_RES_SPIN_LOCK: case BPF_TIMER: case 
BPF_WORKQUEUE: case BPF_KPTR_UNREF: diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h index 32c23f2a3086..ed444e44f524 100644 --- a/include/linux/bpf_verifier.h +++ b/include/linux/bpf_verifier.h @@ -115,6 +115,15 @@ struct bpf_reg_state { int depth:30; } iter; + /* For irq stack slots */ + struct { + enum { + IRQ_KFUNC_IGNORE, + IRQ_NATIVE_KFUNC, + IRQ_LOCK_KFUNC, + } kfunc_class; + } irq; + /* Max size from any of the above. */ struct { unsigned long raw1; @@ -255,9 +264,11 @@ struct bpf_reference_state { * default to pointer reference on zero initialization of a state. */ enum ref_state_type { - REF_TYPE_PTR = 1, - REF_TYPE_IRQ = 2, - REF_TYPE_LOCK = 3, + REF_TYPE_PTR = (1 << 1), + REF_TYPE_IRQ = (1 << 2), + REF_TYPE_LOCK = (1 << 3), + REF_TYPE_RES_LOCK = (1 << 4), + REF_TYPE_RES_LOCK_IRQ = (1 << 5), } type; /* Track each reference created with a unique id, even if the same * instruction creates the reference multiple times (eg, via CALL). diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c index 9433b6467bbe..aba6183253ea 100644 --- a/kernel/bpf/btf.c +++ b/kernel/bpf/btf.c @@ -3480,6 +3480,15 @@ static int btf_get_field_type(const struct btf *btf, const struct btf_type *var_ goto end; } } + if (field_mask & BPF_RES_SPIN_LOCK) { + if (!strcmp(name, "bpf_res_spin_lock")) { + if (*seen_mask & BPF_RES_SPIN_LOCK) + return -E2BIG; + *seen_mask |= BPF_RES_SPIN_LOCK; + type = BPF_RES_SPIN_LOCK; + goto end; + } + } if (field_mask & BPF_TIMER) { if (!strcmp(name, "bpf_timer")) { if (*seen_mask & BPF_TIMER) @@ -3658,6 +3667,7 @@ static int btf_find_field_one(const struct btf *btf, switch (field_type) { case BPF_SPIN_LOCK: + case BPF_RES_SPIN_LOCK: case BPF_TIMER: case BPF_WORKQUEUE: case BPF_LIST_NODE: @@ -3951,6 +3961,7 @@ struct btf_record *btf_parse_fields(const struct btf *btf, const struct btf_type return ERR_PTR(-ENOMEM); rec->spin_lock_off = -EINVAL; + rec->res_spin_lock_off = -EINVAL; rec->timer_off = -EINVAL; rec->wq_off = -EINVAL; rec->refcount_off = -EINVAL; @@ -3978,6 +3989,11 @@ struct btf_record *btf_parse_fields(const struct btf *btf, const struct btf_type /* Cache offset for faster lookup at runtime */ rec->spin_lock_off = rec->fields[i].offset; break; + case BPF_RES_SPIN_LOCK: + WARN_ON_ONCE(rec->spin_lock_off >= 0); + /* Cache offset for faster lookup at runtime */ + rec->res_spin_lock_off = rec->fields[i].offset; + break; case BPF_TIMER: WARN_ON_ONCE(rec->timer_off >= 0); /* Cache offset for faster lookup at runtime */ @@ -4021,9 +4037,15 @@ struct btf_record *btf_parse_fields(const struct btf *btf, const struct btf_type rec->cnt++; } + if (rec->spin_lock_off >= 0 && rec->res_spin_lock_off >= 0) { + ret = -EINVAL; + goto end; + } + /* bpf_{list_head, rb_node} require bpf_spin_lock */ if ((btf_record_has_field(rec, BPF_LIST_HEAD) || - btf_record_has_field(rec, BPF_RB_ROOT)) && rec->spin_lock_off < 0) { + btf_record_has_field(rec, BPF_RB_ROOT)) && + (rec->spin_lock_off < 0 && rec->res_spin_lock_off < 0)) { ret = -EINVAL; goto end; } @@ -5636,7 +5658,7 @@ btf_parse_struct_metas(struct bpf_verifier_log *log, struct btf *btf) type = &tab->types[tab->cnt]; type->btf_id = i; - record = btf_parse_fields(btf, t, BPF_SPIN_LOCK | BPF_LIST_HEAD | BPF_LIST_NODE | + record = btf_parse_fields(btf, t, BPF_SPIN_LOCK | BPF_RES_SPIN_LOCK | BPF_LIST_HEAD | BPF_LIST_NODE | BPF_RB_ROOT | BPF_RB_NODE | BPF_REFCOUNT | BPF_KPTR, t->size); /* The record cannot be unset, treat it as an error if so */ diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c index 
c420edbfb7c8..054707215d28 100644 --- a/kernel/bpf/syscall.c +++ b/kernel/bpf/syscall.c @@ -648,6 +648,7 @@ void btf_record_free(struct btf_record *rec) case BPF_RB_ROOT: case BPF_RB_NODE: case BPF_SPIN_LOCK: + case BPF_RES_SPIN_LOCK: case BPF_TIMER: case BPF_REFCOUNT: case BPF_WORKQUEUE: @@ -700,6 +701,7 @@ struct btf_record *btf_record_dup(const struct btf_record *rec) case BPF_RB_ROOT: case BPF_RB_NODE: case BPF_SPIN_LOCK: + case BPF_RES_SPIN_LOCK: case BPF_TIMER: case BPF_REFCOUNT: case BPF_WORKQUEUE: @@ -777,6 +779,7 @@ void bpf_obj_free_fields(const struct btf_record *rec, void *obj) switch (fields[i].type) { case BPF_SPIN_LOCK: + case BPF_RES_SPIN_LOCK: break; case BPF_TIMER: bpf_timer_cancel_and_free(field_ptr); @@ -1203,7 +1206,7 @@ static int map_check_btf(struct bpf_map *map, struct bpf_token *token, return -EINVAL; map->record = btf_parse_fields(btf, value_type, - BPF_SPIN_LOCK | BPF_TIMER | BPF_KPTR | BPF_LIST_HEAD | + BPF_SPIN_LOCK | BPF_RES_SPIN_LOCK | BPF_TIMER | BPF_KPTR | BPF_LIST_HEAD | BPF_RB_ROOT | BPF_REFCOUNT | BPF_WORKQUEUE | BPF_UPTR, map->value_size); if (!IS_ERR_OR_NULL(map->record)) { @@ -1222,6 +1225,7 @@ static int map_check_btf(struct bpf_map *map, struct bpf_token *token, case 0: continue; case BPF_SPIN_LOCK: + case BPF_RES_SPIN_LOCK: if (map->map_type != BPF_MAP_TYPE_HASH && map->map_type != BPF_MAP_TYPE_ARRAY && map->map_type != BPF_MAP_TYPE_CGROUP_STORAGE && diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c index d6999d085c7d..294761dd0072 100644 --- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -456,7 +456,7 @@ static bool subprog_is_exc_cb(struct bpf_verifier_env *env, int subprog) static bool reg_may_point_to_spin_lock(const struct bpf_reg_state *reg) { - return btf_record_has_field(reg_btf_record(reg), BPF_SPIN_LOCK); + return btf_record_has_field(reg_btf_record(reg), BPF_SPIN_LOCK | BPF_RES_SPIN_LOCK); } static bool type_is_rdonly_mem(u32 type) @@ -1148,7 +1148,8 @@ static int release_irq_state(struct bpf_verifier_state *state, int id); static int mark_stack_slot_irq_flag(struct bpf_verifier_env *env, struct bpf_kfunc_call_arg_meta *meta, - struct bpf_reg_state *reg, int insn_idx) + struct bpf_reg_state *reg, int insn_idx, + int kfunc_class) { struct bpf_func_state *state = func(env, reg); struct bpf_stack_state *slot; @@ -1170,6 +1171,7 @@ static int mark_stack_slot_irq_flag(struct bpf_verifier_env *env, st->type = PTR_TO_STACK; /* we don't have dedicated reg type */ st->live |= REG_LIVE_WRITTEN; st->ref_obj_id = id; + st->irq.kfunc_class = kfunc_class; for (i = 0; i < BPF_REG_SIZE; i++) slot->slot_type[i] = STACK_IRQ_FLAG; @@ -1178,7 +1180,8 @@ static int mark_stack_slot_irq_flag(struct bpf_verifier_env *env, return 0; } -static int unmark_stack_slot_irq_flag(struct bpf_verifier_env *env, struct bpf_reg_state *reg) +static int unmark_stack_slot_irq_flag(struct bpf_verifier_env *env, struct bpf_reg_state *reg, + int kfunc_class) { struct bpf_func_state *state = func(env, reg); struct bpf_stack_state *slot; @@ -1192,6 +1195,15 @@ static int unmark_stack_slot_irq_flag(struct bpf_verifier_env *env, struct bpf_r slot = &state->stack[spi]; st = &slot->spilled_ptr; + if (kfunc_class != IRQ_KFUNC_IGNORE && st->irq.kfunc_class != kfunc_class) { + const char *flag_kfunc = st->irq.kfunc_class == IRQ_NATIVE_KFUNC ? "native" : "lock"; + const char *used_kfunc = kfunc_class == IRQ_NATIVE_KFUNC ? 
"native" : "lock"; + + verbose(env, "irq flag acquired by %s kfuncs cannot be restored with %s kfuncs\n", + flag_kfunc, used_kfunc); + return -EINVAL; + } + err = release_irq_state(env->cur_state, st->ref_obj_id); WARN_ON_ONCE(err && err != -EACCES); if (err) { @@ -1591,7 +1603,7 @@ static struct bpf_reference_state *find_lock_state(struct bpf_verifier_state *st for (i = 0; i < state->acquired_refs; i++) { struct bpf_reference_state *s = &state->refs[i]; - if (s->type != type) + if (!(s->type & type)) continue; if (s->id == id && s->ptr == ptr) @@ -7985,6 +7997,12 @@ static int check_kfunc_mem_size_reg(struct bpf_verifier_env *env, struct bpf_reg return err; } +enum { + PROCESS_SPIN_LOCK = (1 << 0), + PROCESS_RES_LOCK = (1 << 1), + PROCESS_LOCK_IRQ = (1 << 2), +}; + /* Implementation details: * bpf_map_lookup returns PTR_TO_MAP_VALUE_OR_NULL. * bpf_obj_new returns PTR_TO_BTF_ID | MEM_ALLOC | PTR_MAYBE_NULL. @@ -8007,30 +8025,33 @@ static int check_kfunc_mem_size_reg(struct bpf_verifier_env *env, struct bpf_reg * env->cur_state->active_locks remembers which map value element or allocated * object got locked and clears it after bpf_spin_unlock. */ -static int process_spin_lock(struct bpf_verifier_env *env, int regno, - bool is_lock) +static int process_spin_lock(struct bpf_verifier_env *env, int regno, int flags) { + bool is_lock = flags & PROCESS_SPIN_LOCK, is_res_lock = flags & PROCESS_RES_LOCK; + const char *lock_str = is_res_lock ? "bpf_res_spin" : "bpf_spin"; struct bpf_reg_state *regs = cur_regs(env), *reg = ®s[regno]; struct bpf_verifier_state *cur = env->cur_state; bool is_const = tnum_is_const(reg->var_off); + bool is_irq = flags & PROCESS_LOCK_IRQ; u64 val = reg->var_off.value; struct bpf_map *map = NULL; struct btf *btf = NULL; struct btf_record *rec; + u32 spin_lock_off; int err; if (!is_const) { verbose(env, - "R%d doesn't have constant offset. bpf_spin_lock has to be at the constant offset\n", - regno); + "R%d doesn't have constant offset. %s_lock has to be at the constant offset\n", + regno, lock_str); return -EINVAL; } if (reg->type == PTR_TO_MAP_VALUE) { map = reg->map_ptr; if (!map->btf) { verbose(env, - "map '%s' has to have BTF in order to use bpf_spin_lock\n", - map->name); + "map '%s' has to have BTF in order to use %s_lock\n", + map->name, lock_str); return -EINVAL; } } else { @@ -8038,36 +8059,53 @@ static int process_spin_lock(struct bpf_verifier_env *env, int regno, } rec = reg_btf_record(reg); - if (!btf_record_has_field(rec, BPF_SPIN_LOCK)) { - verbose(env, "%s '%s' has no valid bpf_spin_lock\n", map ? "map" : "local", - map ? map->name : "kptr"); + if (!btf_record_has_field(rec, is_res_lock ? BPF_RES_SPIN_LOCK : BPF_SPIN_LOCK)) { + verbose(env, "%s '%s' has no valid %s_lock\n", map ? "map" : "local", + map ? map->name : "kptr", lock_str); return -EINVAL; } - if (rec->spin_lock_off != val + reg->off) { - verbose(env, "off %lld doesn't point to 'struct bpf_spin_lock' that is at %d\n", - val + reg->off, rec->spin_lock_off); + spin_lock_off = is_res_lock ? 
rec->res_spin_lock_off : rec->spin_lock_off; + if (spin_lock_off != val + reg->off) { + verbose(env, "off %lld doesn't point to 'struct %s_lock' that is at %d\n", + val + reg->off, lock_str, spin_lock_off); return -EINVAL; } if (is_lock) { void *ptr; + int type; if (map) ptr = map; else ptr = btf; - if (cur->active_locks) { - verbose(env, - "Locking two bpf_spin_locks are not allowed\n"); - return -EINVAL; + if (!is_res_lock && cur->active_locks) { + if (find_lock_state(env->cur_state, REF_TYPE_LOCK, 0, NULL)) { + verbose(env, + "Locking two bpf_spin_locks are not allowed\n"); + return -EINVAL; + } + } else if (is_res_lock) { + if (find_lock_state(env->cur_state, REF_TYPE_RES_LOCK, reg->id, ptr)) { + verbose(env, "Acquiring the same lock again, AA deadlock detected\n"); + return -EINVAL; + } } - err = acquire_lock_state(env, env->insn_idx, REF_TYPE_LOCK, reg->id, ptr); + + if (is_res_lock && is_irq) + type = REF_TYPE_RES_LOCK_IRQ; + else if (is_res_lock) + type = REF_TYPE_RES_LOCK; + else + type = REF_TYPE_LOCK; + err = acquire_lock_state(env, env->insn_idx, type, reg->id, ptr); if (err < 0) { verbose(env, "Failed to acquire lock state\n"); return err; } } else { void *ptr; + int type; if (map) ptr = map; @@ -8075,12 +8113,18 @@ static int process_spin_lock(struct bpf_verifier_env *env, int regno, ptr = btf; if (!cur->active_locks) { - verbose(env, "bpf_spin_unlock without taking a lock\n"); + verbose(env, "%s_unlock without taking a lock\n", lock_str); return -EINVAL; } - if (release_lock_state(env->cur_state, REF_TYPE_LOCK, reg->id, ptr)) { - verbose(env, "bpf_spin_unlock of different lock\n"); + if (is_res_lock && is_irq) + type = REF_TYPE_RES_LOCK_IRQ; + else if (is_res_lock) + type = REF_TYPE_RES_LOCK; + else + type = REF_TYPE_LOCK; + if (release_lock_state(cur, type, reg->id, ptr)) { + verbose(env, "%s_unlock of different lock\n", lock_str); return -EINVAL; } @@ -9391,11 +9435,11 @@ static int check_func_arg(struct bpf_verifier_env *env, u32 arg, return -EACCES; } if (meta->func_id == BPF_FUNC_spin_lock) { - err = process_spin_lock(env, regno, true); + err = process_spin_lock(env, regno, PROCESS_SPIN_LOCK); if (err) return err; } else if (meta->func_id == BPF_FUNC_spin_unlock) { - err = process_spin_lock(env, regno, false); + err = process_spin_lock(env, regno, 0); if (err) return err; } else { @@ -11274,7 +11318,7 @@ static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn regs[BPF_REG_0].map_uid = meta.map_uid; regs[BPF_REG_0].type = PTR_TO_MAP_VALUE | ret_flag; if (!type_may_be_null(ret_flag) && - btf_record_has_field(meta.map_ptr->record, BPF_SPIN_LOCK)) { + btf_record_has_field(meta.map_ptr->record, BPF_SPIN_LOCK | BPF_RES_SPIN_LOCK)) { regs[BPF_REG_0].id = ++env->id_gen; } break; @@ -11446,10 +11490,10 @@ static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn /* mark_btf_func_reg_size() is used when the reg size is determined by * the BTF func_proto's return value size and argument. 
*/ -static void mark_btf_func_reg_size(struct bpf_verifier_env *env, u32 regno, - size_t reg_size) +static void __mark_btf_func_reg_size(struct bpf_verifier_env *env, struct bpf_reg_state *regs, + u32 regno, size_t reg_size) { - struct bpf_reg_state *reg = &cur_regs(env)[regno]; + struct bpf_reg_state *reg = ®s[regno]; if (regno == BPF_REG_0) { /* Function return value */ @@ -11467,6 +11511,12 @@ static void mark_btf_func_reg_size(struct bpf_verifier_env *env, u32 regno, } } +static void mark_btf_func_reg_size(struct bpf_verifier_env *env, u32 regno, + size_t reg_size) +{ + return __mark_btf_func_reg_size(env, cur_regs(env), regno, reg_size); +} + static bool is_kfunc_acquire(struct bpf_kfunc_call_arg_meta *meta) { return meta->kfunc_flags & KF_ACQUIRE; @@ -11604,6 +11654,7 @@ enum { KF_ARG_RB_ROOT_ID, KF_ARG_RB_NODE_ID, KF_ARG_WORKQUEUE_ID, + KF_ARG_RES_SPIN_LOCK_ID, }; BTF_ID_LIST(kf_arg_btf_ids) @@ -11613,6 +11664,7 @@ BTF_ID(struct, bpf_list_node) BTF_ID(struct, bpf_rb_root) BTF_ID(struct, bpf_rb_node) BTF_ID(struct, bpf_wq) +BTF_ID(struct, bpf_res_spin_lock) static bool __is_kfunc_ptr_arg_type(const struct btf *btf, const struct btf_param *arg, int type) @@ -11661,6 +11713,11 @@ static bool is_kfunc_arg_wq(const struct btf *btf, const struct btf_param *arg) return __is_kfunc_ptr_arg_type(btf, arg, KF_ARG_WORKQUEUE_ID); } +static bool is_kfunc_arg_res_spin_lock(const struct btf *btf, const struct btf_param *arg) +{ + return __is_kfunc_ptr_arg_type(btf, arg, KF_ARG_RES_SPIN_LOCK_ID); +} + static bool is_kfunc_arg_callback(struct bpf_verifier_env *env, const struct btf *btf, const struct btf_param *arg) { @@ -11732,6 +11789,7 @@ enum kfunc_ptr_arg_type { KF_ARG_PTR_TO_MAP, KF_ARG_PTR_TO_WORKQUEUE, KF_ARG_PTR_TO_IRQ_FLAG, + KF_ARG_PTR_TO_RES_SPIN_LOCK, }; enum special_kfunc_type { @@ -11768,6 +11826,10 @@ enum special_kfunc_type { KF_bpf_iter_num_new, KF_bpf_iter_num_next, KF_bpf_iter_num_destroy, + KF_bpf_res_spin_lock, + KF_bpf_res_spin_unlock, + KF_bpf_res_spin_lock_irqsave, + KF_bpf_res_spin_unlock_irqrestore, }; BTF_SET_START(special_kfunc_set) @@ -11846,6 +11908,10 @@ BTF_ID(func, bpf_local_irq_restore) BTF_ID(func, bpf_iter_num_new) BTF_ID(func, bpf_iter_num_next) BTF_ID(func, bpf_iter_num_destroy) +BTF_ID(func, bpf_res_spin_lock) +BTF_ID(func, bpf_res_spin_unlock) +BTF_ID(func, bpf_res_spin_lock_irqsave) +BTF_ID(func, bpf_res_spin_unlock_irqrestore) static bool is_kfunc_ret_null(struct bpf_kfunc_call_arg_meta *meta) { @@ -11939,6 +12005,9 @@ get_kfunc_ptr_arg_type(struct bpf_verifier_env *env, if (is_kfunc_arg_irq_flag(meta->btf, &args[argno])) return KF_ARG_PTR_TO_IRQ_FLAG; + if (is_kfunc_arg_res_spin_lock(meta->btf, &args[argno])) + return KF_ARG_PTR_TO_RES_SPIN_LOCK; + if ((base_type(reg->type) == PTR_TO_BTF_ID || reg2btf_ids[base_type(reg->type)])) { if (!btf_type_is_struct(ref_t)) { verbose(env, "kernel function %s args#%d pointer type %s %s is not supported\n", @@ -12046,13 +12115,19 @@ static int process_irq_flag(struct bpf_verifier_env *env, int regno, struct bpf_kfunc_call_arg_meta *meta) { struct bpf_reg_state *regs = cur_regs(env), *reg = ®s[regno]; + int err, kfunc_class = IRQ_NATIVE_KFUNC; bool irq_save; - int err; - if (meta->func_id == special_kfunc_list[KF_bpf_local_irq_save]) { + if (meta->func_id == special_kfunc_list[KF_bpf_local_irq_save] || + meta->func_id == special_kfunc_list[KF_bpf_res_spin_lock_irqsave]) { irq_save = true; - } else if (meta->func_id == special_kfunc_list[KF_bpf_local_irq_restore]) { + if (meta->func_id == 
special_kfunc_list[KF_bpf_res_spin_lock_irqsave]) + kfunc_class = IRQ_LOCK_KFUNC; + } else if (meta->func_id == special_kfunc_list[KF_bpf_local_irq_restore] || + meta->func_id == special_kfunc_list[KF_bpf_res_spin_unlock_irqrestore]) { irq_save = false; + if (meta->func_id == special_kfunc_list[KF_bpf_res_spin_unlock_irqrestore]) + kfunc_class = IRQ_LOCK_KFUNC; } else { verbose(env, "verifier internal error: unknown irq flags kfunc\n"); return -EFAULT; @@ -12068,7 +12143,7 @@ static int process_irq_flag(struct bpf_verifier_env *env, int regno, if (err) return err; - err = mark_stack_slot_irq_flag(env, meta, reg, env->insn_idx); + err = mark_stack_slot_irq_flag(env, meta, reg, env->insn_idx, kfunc_class); if (err) return err; } else { @@ -12082,7 +12157,7 @@ static int process_irq_flag(struct bpf_verifier_env *env, int regno, if (err) return err; - err = unmark_stack_slot_irq_flag(env, reg); + err = unmark_stack_slot_irq_flag(env, reg, kfunc_class); if (err) return err; } @@ -12209,7 +12284,8 @@ static int check_reg_allocation_locked(struct bpf_verifier_env *env, struct bpf_ if (!env->cur_state->active_locks) return -EINVAL; - s = find_lock_state(env->cur_state, REF_TYPE_LOCK, id, ptr); + s = find_lock_state(env->cur_state, REF_TYPE_LOCK | REF_TYPE_RES_LOCK | REF_TYPE_RES_LOCK_IRQ, + id, ptr); if (!s) { verbose(env, "held lock and object are not in the same allocation\n"); return -EINVAL; @@ -12245,9 +12321,18 @@ static bool is_bpf_graph_api_kfunc(u32 btf_id) btf_id == special_kfunc_list[KF_bpf_refcount_acquire_impl]; } +static bool is_bpf_res_spin_lock_kfunc(u32 btf_id) +{ + return btf_id == special_kfunc_list[KF_bpf_res_spin_lock] || + btf_id == special_kfunc_list[KF_bpf_res_spin_unlock] || + btf_id == special_kfunc_list[KF_bpf_res_spin_lock_irqsave] || + btf_id == special_kfunc_list[KF_bpf_res_spin_unlock_irqrestore]; +} + static bool kfunc_spin_allowed(u32 btf_id) { - return is_bpf_graph_api_kfunc(btf_id) || is_bpf_iter_num_api_kfunc(btf_id); + return is_bpf_graph_api_kfunc(btf_id) || is_bpf_iter_num_api_kfunc(btf_id) || + is_bpf_res_spin_lock_kfunc(btf_id); } static bool is_sync_callback_calling_kfunc(u32 btf_id) @@ -12679,6 +12764,7 @@ static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_ case KF_ARG_PTR_TO_CONST_STR: case KF_ARG_PTR_TO_WORKQUEUE: case KF_ARG_PTR_TO_IRQ_FLAG: + case KF_ARG_PTR_TO_RES_SPIN_LOCK: break; default: WARN_ON_ONCE(1); @@ -12977,6 +13063,28 @@ static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_ if (ret < 0) return ret; break; + case KF_ARG_PTR_TO_RES_SPIN_LOCK: + { + int flags = PROCESS_RES_LOCK; + + if (reg->type != PTR_TO_MAP_VALUE && reg->type != (PTR_TO_BTF_ID | MEM_ALLOC)) { + verbose(env, "arg#%d doesn't point to map value or allocated object\n", i); + return -EINVAL; + } + + if (!is_bpf_res_spin_lock_kfunc(meta->func_id)) + return -EFAULT; + if (meta->func_id == special_kfunc_list[KF_bpf_res_spin_lock] || + meta->func_id == special_kfunc_list[KF_bpf_res_spin_lock_irqsave]) + flags |= PROCESS_SPIN_LOCK; + if (meta->func_id == special_kfunc_list[KF_bpf_res_spin_lock_irqsave] || + meta->func_id == special_kfunc_list[KF_bpf_res_spin_unlock_irqrestore]) + flags |= PROCESS_LOCK_IRQ; + ret = process_spin_lock(env, regno, flags); + if (ret < 0) + return ret; + break; + } } } @@ -13062,6 +13170,33 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn, insn_aux->is_iter_next = is_iter_next_kfunc(&meta); + if (!insn->off && + (insn->imm == special_kfunc_list[KF_bpf_res_spin_lock] 
|| + insn->imm == special_kfunc_list[KF_bpf_res_spin_lock_irqsave])) { + struct bpf_verifier_state *branch; + struct bpf_reg_state *regs; + + branch = push_stack(env, env->insn_idx + 1, env->insn_idx, false); + if (!branch) { + verbose(env, "failed to push state for failed lock acquisition\n"); + return -ENOMEM; + } + + regs = branch->frame[branch->curframe]->regs; + + /* Clear r0-r5 registers in forked state */ + for (i = 0; i < CALLER_SAVED_REGS; i++) + mark_reg_not_init(env, regs, caller_saved[i]); + + mark_reg_unknown(env, regs, BPF_REG_0); + err = __mark_reg_s32_range(env, regs, BPF_REG_0, -MAX_ERRNO, -1); + if (err) { + verbose(env, "failed to mark s32 range for retval in forked state for lock\n"); + return err; + } + __mark_btf_func_reg_size(env, regs, BPF_REG_0, sizeof(u32)); + } + if (is_kfunc_destructive(&meta) && !capable(CAP_SYS_BOOT)) { verbose(env, "destructive kfunc calls require CAP_SYS_BOOT capability\n"); return -EACCES; @@ -13232,6 +13367,9 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn, if (btf_type_is_scalar(t)) { mark_reg_unknown(env, regs, BPF_REG_0); + if (meta.btf == btf_vmlinux && (meta.func_id == special_kfunc_list[KF_bpf_res_spin_lock] || + meta.func_id == special_kfunc_list[KF_bpf_res_spin_lock_irqsave])) + __mark_reg_const_zero(env, ®s[BPF_REG_0]); mark_btf_func_reg_size(env, BPF_REG_0, t->size); } else if (btf_type_is_ptr(t)) { ptr_type = btf_type_skip_modifiers(desc_btf, t->type, &ptr_type_id); @@ -18114,7 +18252,8 @@ static bool stacksafe(struct bpf_verifier_env *env, struct bpf_func_state *old, case STACK_IRQ_FLAG: old_reg = &old->stack[spi].spilled_ptr; cur_reg = &cur->stack[spi].spilled_ptr; - if (!check_ids(old_reg->ref_obj_id, cur_reg->ref_obj_id, idmap)) + if (!check_ids(old_reg->ref_obj_id, cur_reg->ref_obj_id, idmap) || + old_reg->irq.kfunc_class != cur_reg->irq.kfunc_class) return false; break; case STACK_MISC: @@ -18158,6 +18297,8 @@ static bool refsafe(struct bpf_verifier_state *old, struct bpf_verifier_state *c case REF_TYPE_IRQ: break; case REF_TYPE_LOCK: + case REF_TYPE_RES_LOCK: + case REF_TYPE_RES_LOCK_IRQ: if (old->refs[i].ptr != cur->refs[i].ptr) return false; break; @@ -19491,7 +19632,7 @@ static int check_map_prog_compatibility(struct bpf_verifier_env *env, } } - if (btf_record_has_field(map->record, BPF_SPIN_LOCK)) { + if (btf_record_has_field(map->record, BPF_SPIN_LOCK | BPF_RES_SPIN_LOCK)) { if (prog_type == BPF_PROG_TYPE_SOCKET_FILTER) { verbose(env, "socket filter progs cannot use bpf_spin_lock yet\n"); return -EINVAL; From patchwork Thu Feb 6 10:54:33 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Kumar Kartikeya Dwivedi X-Patchwork-Id: 13962830 X-Patchwork-Delegate: bpf@iogearbox.net Received: from mail-wm1-f66.google.com (mail-wm1-f66.google.com [209.85.128.66]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 5C86B227560; Thu, 6 Feb 2025 10:55:13 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.128.66 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1738839315; cv=none; b=hWK5G6+b1/X2Hs5nl+nYA3QwhNZP4UeBJJd+cFyxKgEWAi+Nj47uvD09bYk3BBbYnTy6GdaVXiUjrzuX5JjoYdIoFixi7T+CAHYI9x3TUoRBDWFP9r/2RA4CxqRjsMvgpNBRLDbJznLjgx/bwaYZIHgZCighZxRkmeil+zV0ipI= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1738839315; 
From patchwork Thu Feb 6 10:54:33 2025
From: Kumar Kartikeya Dwivedi To: bpf@vger.kernel.org,
linux-kernel@vger.kernel.org Cc: Linus Torvalds , Peter Zijlstra , Will Deacon , Waiman Long , Alexei Starovoitov , Andrii Nakryiko , Daniel Borkmann , Martin KaFai Lau , Eduard Zingerman , "Paul E. McKenney" , Tejun Heo , Barret Rhoden , Josh Don , Dohyun Kim , linux-arm-kernel@lists.infradead.org, kernel-team@meta.com Subject: [PATCH bpf-next v2 25/26] bpf: Maintain FIFO property for rqspinlock unlock Date: Thu, 6 Feb 2025 02:54:33 -0800 Message-ID: <20250206105435.2159977-26-memxor@gmail.com> In-Reply-To: <20250206105435.2159977-1-memxor@gmail.com> References: <20250206105435.2159977-1-memxor@gmail.com>
Since out-of-order unlocks are unsupported for rqspinlock, and the irqsave variants already enforce strict FIFO ordering, enforce the same FIFO ordering for the normal non-irqsave variants as well. Two new verifier state fields (active_lock_id, active_lock_ptr) denote the top of the lock stack; whenever the topmost entry is popped through an unlock, the preceding entry's (prev_id, prev_ptr) become the new top. Take special care to make these fields part of the state comparison in refsafe. (A short sketch of the resulting ordering rule follows the diff below.)
Signed-off-by: Kumar Kartikeya Dwivedi --- include/linux/bpf_verifier.h | 3 +++ kernel/bpf/verifier.c | 33 ++++++++++++++++++++++++++++----- 2 files changed, 31 insertions(+), 5 deletions(-) diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h index ed444e44f524..92cd2289b743 100644 --- a/include/linux/bpf_verifier.h +++ b/include/linux/bpf_verifier.h @@ -269,6 +269,7 @@ struct bpf_reference_state { REF_TYPE_LOCK = (1 << 3), REF_TYPE_RES_LOCK = (1 << 4), REF_TYPE_RES_LOCK_IRQ = (1 << 5), + REF_TYPE_LOCK_MASK = REF_TYPE_LOCK | REF_TYPE_RES_LOCK | REF_TYPE_RES_LOCK_IRQ, } type; /* Track each reference created with a unique id, even if the same * instruction creates the reference multiple times (eg, via CALL).
@@ -435,6 +436,8 @@ struct bpf_verifier_state { u32 active_locks; u32 active_preempt_locks; u32 active_irq_id; + u32 active_lock_id; + void *active_lock_ptr; bool active_rcu_lock; bool speculative; diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c index 294761dd0072..9cac6ea4f844 100644 --- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -1421,6 +1421,8 @@ static int copy_reference_state(struct bpf_verifier_state *dst, const struct bpf dst->active_preempt_locks = src->active_preempt_locks; dst->active_rcu_lock = src->active_rcu_lock; dst->active_irq_id = src->active_irq_id; + dst->active_lock_id = src->active_lock_id; + dst->active_lock_ptr = src->active_lock_ptr; return 0; } @@ -1520,6 +1522,8 @@ static int acquire_lock_state(struct bpf_verifier_env *env, int insn_idx, enum r s->ptr = ptr; state->active_locks++; + state->active_lock_id = id; + state->active_lock_ptr = ptr; return 0; } @@ -1559,16 +1563,24 @@ static void release_reference_state(struct bpf_verifier_state *state, int idx) static int release_lock_state(struct bpf_verifier_state *state, int type, int id, void *ptr) { + void *prev_ptr = NULL; + u32 prev_id = 0; int i; for (i = 0; i < state->acquired_refs; i++) { - if (state->refs[i].type != type) - continue; - if (state->refs[i].id == id && state->refs[i].ptr == ptr) { + if (state->refs[i].type == type && state->refs[i].id == id && + state->refs[i].ptr == ptr) { release_reference_state(state, i); state->active_locks--; + /* Reassign active lock (id, ptr). */ + state->active_lock_id = prev_id; + state->active_lock_ptr = prev_ptr; return 0; } + if (state->refs[i].type & REF_TYPE_LOCK_MASK) { + prev_id = state->refs[i].id; + prev_ptr = state->refs[i].ptr; + } } return -EINVAL; } @@ -8123,6 +8135,14 @@ static int process_spin_lock(struct bpf_verifier_env *env, int regno, int flags) type = REF_TYPE_RES_LOCK; else type = REF_TYPE_LOCK; + if (!find_lock_state(cur, type, reg->id, ptr)) { + verbose(env, "%s_unlock of different lock\n", lock_str); + return -EINVAL; + } + if (reg->id != cur->active_lock_id || ptr != cur->active_lock_ptr) { + verbose(env, "%s_unlock cannot be out of order\n", lock_str); + return -EINVAL; + } if (release_lock_state(cur, type, reg->id, ptr)) { verbose(env, "%s_unlock of different lock\n", lock_str); return -EINVAL; } @@ -12284,8 +12304,7 @@ static int check_reg_allocation_locked(struct bpf_verifier_env *env, struct bpf_ if (!env->cur_state->active_locks) return -EINVAL; - s = find_lock_state(env->cur_state, REF_TYPE_LOCK | REF_TYPE_RES_LOCK | REF_TYPE_RES_LOCK_IRQ, - id, ptr); + s = find_lock_state(env->cur_state, REF_TYPE_LOCK_MASK, id, ptr); if (!s) { verbose(env, "held lock and object are not in the same allocation\n"); return -EINVAL; } @@ -18288,6 +18307,10 @@ static bool refsafe(struct bpf_verifier_state *old, struct bpf_verifier_state *c if (!check_ids(old->active_irq_id, cur->active_irq_id, idmap)) return false; + if (!check_ids(old->active_lock_id, cur->active_lock_id, idmap) || + old->active_lock_ptr != cur->active_lock_ptr) + return false; + for (i = 0; i < old->acquired_refs; i++) { if (!check_ids(old->refs[i].id, cur->refs[i].id, idmap) || old->refs[i].type != cur->refs[i].type)
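With the change above, resilient spin locks must be released in the reverse order of acquisition: only the lock recorded in (active_lock_id, active_lock_ptr) may be unlocked next. A short sketch of the rule, assuming two global bpf_res_spin_lock objects as in the selftests added by the next patch (names illustrative):

struct bpf_res_spin_lock lockA __hidden SEC(".data.A");
struct bpf_res_spin_lock lockB __hidden SEC(".data.B");

SEC("tc")
int ordered_unlock(struct __sk_buff *ctx)
{
	if (bpf_res_spin_lock(&lockA))
		return 0;
	if (bpf_res_spin_lock(&lockB)) {
		/* B was not acquired, so A is still the top of the lock
		 * stack and may be released directly.
		 */
		bpf_res_spin_unlock(&lockA);
		return 0;
	}
	/* Unlocks mirror acquisition order: B, then A. Releasing A while
	 * B is still held is now rejected with "bpf_res_spin_unlock
	 * cannot be out of order".
	 */
	bpf_res_spin_unlock(&lockB);
	bpf_res_spin_unlock(&lockA);
	return 0;
}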
From patchwork Thu Feb 6 10:54:34 2025
From: Kumar Kartikeya Dwivedi To: bpf@vger.kernel.org, linux-kernel@vger.kernel.org Cc: Linus Torvalds , Peter Zijlstra , Will Deacon , Waiman Long , Alexei Starovoitov , Andrii Nakryiko , Daniel Borkmann , Martin KaFai Lau , Eduard Zingerman , "Paul E. McKenney" , Tejun Heo , Barret Rhoden , Josh Don , Dohyun Kim , linux-arm-kernel@lists.infradead.org, kernel-team@meta.com Subject: [PATCH bpf-next v2 26/26] selftests/bpf: Add tests for rqspinlock Date: Thu, 6 Feb 2025 02:54:34 -0800 Message-ID: <20250206105435.2159977-27-memxor@gmail.com> In-Reply-To: <20250206105435.2159977-1-memxor@gmail.com> References: <20250206105435.2159977-1-memxor@gmail.com>
Introduce selftests that trigger AA and ABBA deadlocks, and test the edge case where the held locks table runs out of entries, in which case we fall back to the timeout as the final line of defense. Also exercise the verifier's AA detection where applicable.
Signed-off-by: Kumar Kartikeya Dwivedi --- .../selftests/bpf/prog_tests/res_spin_lock.c | 99 +++++++ tools/testing/selftests/bpf/progs/irq.c | 53 ++++ .../selftests/bpf/progs/res_spin_lock.c | 143 ++++++++++ .../selftests/bpf/progs/res_spin_lock_fail.c | 244 ++++++++++++++++++ 4 files changed, 539 insertions(+) create mode 100644 tools/testing/selftests/bpf/prog_tests/res_spin_lock.c create mode 100644 tools/testing/selftests/bpf/progs/res_spin_lock.c create mode 100644 tools/testing/selftests/bpf/progs/res_spin_lock_fail.c diff --git a/tools/testing/selftests/bpf/prog_tests/res_spin_lock.c b/tools/testing/selftests/bpf/prog_tests/res_spin_lock.c new file mode 100644 index 000000000000..5a46b3e4a842 --- /dev/null +++ b/tools/testing/selftests/bpf/prog_tests/res_spin_lock.c @@ -0,0 +1,99 @@ +// SPDX-License-Identifier: GPL-2.0 +/* Copyright (c) 2024 Meta Platforms, Inc. and affiliates. */ +#include +#include + +#include "res_spin_lock.skel.h" +#include "res_spin_lock_fail.skel.h" + +static void test_res_spin_lock_failure(void) +{ + RUN_TESTS(res_spin_lock_fail); +} + +static volatile int skip; + +static void *spin_lock_thread(void *arg) +{ + int err, prog_fd = *(u32 *) arg; + LIBBPF_OPTS(bpf_test_run_opts, topts, + .data_in = &pkt_v4, + .data_size_in = sizeof(pkt_v4), + .repeat = 10000, + ); + + while (!READ_ONCE(skip)) { + err = bpf_prog_test_run_opts(prog_fd, &topts); + ASSERT_OK(err, "test_run"); + ASSERT_OK(topts.retval, "test_run retval"); + } + pthread_exit(arg); +} + +static void test_res_spin_lock_success(void) +{ + LIBBPF_OPTS(bpf_test_run_opts, topts, + .data_in = &pkt_v4, + .data_size_in = sizeof(pkt_v4), + .repeat = 1, + ); + struct res_spin_lock *skel; + pthread_t thread_id[16]; + int prog_fd, i, err; + void *ret; + + skel = res_spin_lock__open_and_load(); + if (!ASSERT_OK_PTR(skel, "res_spin_lock__open_and_load")) + return; + /* AA deadlock */ + prog_fd = bpf_program__fd(skel->progs.res_spin_lock_test); + err = bpf_prog_test_run_opts(prog_fd, &topts); + ASSERT_OK(err, "error"); + ASSERT_OK(topts.retval, "retval"); + + prog_fd = bpf_program__fd(skel->progs.res_spin_lock_test_held_lock_max); + err = bpf_prog_test_run_opts(prog_fd, &topts); + ASSERT_OK(err, "error"); + ASSERT_OK(topts.retval, "retval"); + + /* Multi-threaded ABBA deadlock. 
*/ + + prog_fd = bpf_program__fd(skel->progs.res_spin_lock_test_AB); + for (i = 0; i < 16; i++) { + int err; + + err = pthread_create(&thread_id[i], NULL, &spin_lock_thread, &prog_fd); + if (!ASSERT_OK(err, "pthread_create")) + goto end; + } + + topts.repeat = 1000; + int fd = bpf_program__fd(skel->progs.res_spin_lock_test_BA); + while (!topts.retval && !err && !READ_ONCE(skel->bss->err)) { + err = bpf_prog_test_run_opts(fd, &topts); + } + + WRITE_ONCE(skip, true); + + for (i = 0; i < 16; i++) { + if (!ASSERT_OK(pthread_join(thread_id[i], &ret), "pthread_join")) + goto end; + if (!ASSERT_EQ(ret, &prog_fd, "ret == prog_fd")) + goto end; + } + + ASSERT_EQ(READ_ONCE(skel->bss->err), -EDEADLK, "timeout err"); + ASSERT_OK(err, "err"); + ASSERT_EQ(topts.retval, -EDEADLK, "timeout"); +end: + res_spin_lock__destroy(skel); + return; +} + +void test_res_spin_lock(void) +{ + if (test__start_subtest("res_spin_lock_success")) + test_res_spin_lock_success(); + if (test__start_subtest("res_spin_lock_failure")) + test_res_spin_lock_failure(); +} diff --git a/tools/testing/selftests/bpf/progs/irq.c b/tools/testing/selftests/bpf/progs/irq.c index b0b53d980964..3d4fee83a5be 100644 --- a/tools/testing/selftests/bpf/progs/irq.c +++ b/tools/testing/selftests/bpf/progs/irq.c @@ -11,6 +11,9 @@ extern void bpf_local_irq_save(unsigned long *) __weak __ksym; extern void bpf_local_irq_restore(unsigned long *) __weak __ksym; extern int bpf_copy_from_user_str(void *dst, u32 dst__sz, const void *unsafe_ptr__ign, u64 flags) __weak __ksym; +struct bpf_res_spin_lock lockA __hidden SEC(".data.A"); +struct bpf_res_spin_lock lockB __hidden SEC(".data.B"); + SEC("?tc") __failure __msg("arg#0 doesn't point to an irq flag on stack") int irq_save_bad_arg(struct __sk_buff *ctx) @@ -441,4 +444,54 @@ int irq_ooo_refs_array(struct __sk_buff *ctx) return 0; } +SEC("?tc") +__failure __msg("cannot restore irq state out of order") +int irq_ooo_lock_cond_inv(struct __sk_buff *ctx) +{ + unsigned long flags1, flags2; + + if (bpf_res_spin_lock_irqsave(&lockA, &flags1)) + return 0; + if (bpf_res_spin_lock_irqsave(&lockB, &flags2)) { + bpf_res_spin_unlock_irqrestore(&lockA, &flags1); + return 0; + } + + bpf_res_spin_unlock_irqrestore(&lockB, &flags1); + bpf_res_spin_unlock_irqrestore(&lockA, &flags2); + return 0; +} + +SEC("?tc") +__failure __msg("function calls are not allowed") +int irq_wrong_kfunc_class_1(struct __sk_buff *ctx) +{ + unsigned long flags1; + + if (bpf_res_spin_lock_irqsave(&lockA, &flags1)) + return 0; + /* For now, bpf_local_irq_restore is not allowed in critical section, + * but this test ensures error will be caught with kfunc_class when it's + * opened up. Tested by temporarily permitting this kfunc in critical + * section. 
+ */ + bpf_local_irq_restore(&flags1); + bpf_res_spin_unlock_irqrestore(&lockA, &flags1); + return 0; +} + +SEC("?tc") +__failure __msg("function calls are not allowed") +int irq_wrong_kfunc_class_2(struct __sk_buff *ctx) +{ + unsigned long flags1, flags2; + + bpf_local_irq_save(&flags1); + if (bpf_res_spin_lock_irqsave(&lockA, &flags2)) + return 0; + bpf_local_irq_restore(&flags2); + bpf_res_spin_unlock_irqrestore(&lockA, &flags1); + return 0; +} + char _license[] SEC("license") = "GPL"; diff --git a/tools/testing/selftests/bpf/progs/res_spin_lock.c b/tools/testing/selftests/bpf/progs/res_spin_lock.c new file mode 100644 index 000000000000..f68aa2ccccc2 --- /dev/null +++ b/tools/testing/selftests/bpf/progs/res_spin_lock.c @@ -0,0 +1,143 @@ +// SPDX-License-Identifier: GPL-2.0 +/* Copyright (c) 2024 Meta Platforms, Inc. and affiliates. */ +#include +#include +#include +#include "bpf_misc.h" + +#define EDEADLK 35 +#define ETIMEDOUT 110 + +struct arr_elem { + struct bpf_res_spin_lock lock; +}; + +struct { + __uint(type, BPF_MAP_TYPE_ARRAY); + __uint(max_entries, 64); + __type(key, int); + __type(value, struct arr_elem); +} arrmap SEC(".maps"); + +struct bpf_res_spin_lock lockA __hidden SEC(".data.A"); +struct bpf_res_spin_lock lockB __hidden SEC(".data.B"); + +SEC("tc") +int res_spin_lock_test(struct __sk_buff *ctx) +{ + struct arr_elem *elem1, *elem2; + int r; + + elem1 = bpf_map_lookup_elem(&arrmap, &(int){0}); + if (!elem1) + return -1; + elem2 = bpf_map_lookup_elem(&arrmap, &(int){0}); + if (!elem2) + return -1; + + r = bpf_res_spin_lock(&elem1->lock); + if (r) + return r; + if (!bpf_res_spin_lock(&elem2->lock)) { + bpf_res_spin_unlock(&elem2->lock); + bpf_res_spin_unlock(&elem1->lock); + return -1; + } + bpf_res_spin_unlock(&elem1->lock); + return 0; +} + +SEC("tc") +int res_spin_lock_test_AB(struct __sk_buff *ctx) +{ + int r; + + r = bpf_res_spin_lock(&lockA); + if (r) + return !r; + /* Only unlock if we took the lock. */ + if (!bpf_res_spin_lock(&lockB)) + bpf_res_spin_unlock(&lockB); + bpf_res_spin_unlock(&lockA); + return 0; +} + +int err; + +SEC("tc") +int res_spin_lock_test_BA(struct __sk_buff *ctx) +{ + int r; + + r = bpf_res_spin_lock(&lockB); + if (r) + return !r; + if (!bpf_res_spin_lock(&lockA)) + bpf_res_spin_unlock(&lockA); + else + err = -EDEADLK; + bpf_res_spin_unlock(&lockB); + return err ?: 0; +} + +SEC("tc") +int res_spin_lock_test_held_lock_max(struct __sk_buff *ctx) +{ + struct bpf_res_spin_lock *locks[48] = {}; + struct arr_elem *e; + u64 time_beg, time; + int ret = 0, i; + + _Static_assert(ARRAY_SIZE(((struct rqspinlock_held){}).locks) == 32, + "RES_NR_HELD assumed to be 32"); + + for (i = 0; i < 34; i++) { + int key = i; + + /* We cannot pass in i as it will get spilled/filled by the compiler and + * loses bounds in verifier state. + */ + e = bpf_map_lookup_elem(&arrmap, &key); + if (!e) + return 1; + locks[i] = &e->lock; + } + + for (; i < 48; i++) { + int key = i - 2; + + /* We cannot pass in i as it will get spilled/filled by the compiler and + * loses bounds in verifier state. + */ + e = bpf_map_lookup_elem(&arrmap, &key); + if (!e) + return 1; + locks[i] = &e->lock; + } + + time_beg = bpf_ktime_get_ns(); + for (i = 0; i < 34; i++) { + if (bpf_res_spin_lock(locks[i])) + goto end; + } + + /* Trigger AA, after exhausting entries in the held lock table. This + * time, only the timeout can save us, as AA detection won't succeed. 
+ */ + if (!bpf_res_spin_lock(locks[34])) { + bpf_res_spin_unlock(locks[34]); + ret = 1; + goto end; + } + +end: + for (i = i - 1; i >= 0; i--) + bpf_res_spin_unlock(locks[i]); + time = bpf_ktime_get_ns() - time_beg; + /* Time spent should be easily above our limit (1/2 s), since AA + * detection won't be expedited due to lack of held lock entry. + */ + return ret ?: (time > 1000000000 / 2 ? 0 : 1); +} + +char _license[] SEC("license") = "GPL"; diff --git a/tools/testing/selftests/bpf/progs/res_spin_lock_fail.c b/tools/testing/selftests/bpf/progs/res_spin_lock_fail.c new file mode 100644 index 000000000000..3222e9283c78 --- /dev/null +++ b/tools/testing/selftests/bpf/progs/res_spin_lock_fail.c @@ -0,0 +1,244 @@ +// SPDX-License-Identifier: GPL-2.0 +/* Copyright (c) 2024 Meta Platforms, Inc. and affiliates. */ +#include +#include +#include +#include +#include "bpf_misc.h" +#include "bpf_experimental.h" + +struct arr_elem { + struct bpf_res_spin_lock lock; +}; + +struct { + __uint(type, BPF_MAP_TYPE_ARRAY); + __uint(max_entries, 1); + __type(key, int); + __type(value, struct arr_elem); +} arrmap SEC(".maps"); + +long value; + +struct bpf_spin_lock lock __hidden SEC(".data.A"); +struct bpf_res_spin_lock res_lock __hidden SEC(".data.B"); + +SEC("?tc") +__failure __msg("point to map value or allocated object") +int res_spin_lock_arg(struct __sk_buff *ctx) +{ + struct arr_elem *elem; + + elem = bpf_map_lookup_elem(&arrmap, &(int){0}); + if (!elem) + return 0; + bpf_res_spin_lock((struct bpf_res_spin_lock *)bpf_core_cast(&elem->lock, struct __sk_buff)); + bpf_res_spin_lock(&elem->lock); + return 0; +} + +SEC("?tc") +__failure __msg("AA deadlock detected") +int res_spin_lock_AA(struct __sk_buff *ctx) +{ + struct arr_elem *elem; + + elem = bpf_map_lookup_elem(&arrmap, &(int){0}); + if (!elem) + return 0; + bpf_res_spin_lock(&elem->lock); + bpf_res_spin_lock(&elem->lock); + return 0; +} + +SEC("?tc") +__failure __msg("AA deadlock detected") +int res_spin_lock_cond_AA(struct __sk_buff *ctx) +{ + struct arr_elem *elem; + + elem = bpf_map_lookup_elem(&arrmap, &(int){0}); + if (!elem) + return 0; + if (bpf_res_spin_lock(&elem->lock)) + return 0; + bpf_res_spin_lock(&elem->lock); + return 0; +} + +SEC("?tc") +__failure __msg("unlock of different lock") +int res_spin_lock_mismatch_1(struct __sk_buff *ctx) +{ + struct arr_elem *elem; + + elem = bpf_map_lookup_elem(&arrmap, &(int){0}); + if (!elem) + return 0; + if (bpf_res_spin_lock(&elem->lock)) + return 0; + bpf_res_spin_unlock(&res_lock); + return 0; +} + +SEC("?tc") +__failure __msg("unlock of different lock") +int res_spin_lock_mismatch_2(struct __sk_buff *ctx) +{ + struct arr_elem *elem; + + elem = bpf_map_lookup_elem(&arrmap, &(int){0}); + if (!elem) + return 0; + if (bpf_res_spin_lock(&res_lock)) + return 0; + bpf_res_spin_unlock(&elem->lock); + return 0; +} + +SEC("?tc") +__failure __msg("unlock of different lock") +int res_spin_lock_irq_mismatch_1(struct __sk_buff *ctx) +{ + struct arr_elem *elem; + unsigned long f1; + + elem = bpf_map_lookup_elem(&arrmap, &(int){0}); + if (!elem) + return 0; + bpf_local_irq_save(&f1); + if (bpf_res_spin_lock(&res_lock)) + return 0; + bpf_res_spin_unlock_irqrestore(&res_lock, &f1); + return 0; +} + +SEC("?tc") +__failure __msg("unlock of different lock") +int res_spin_lock_irq_mismatch_2(struct __sk_buff *ctx) +{ + struct arr_elem *elem; + unsigned long f1; + + elem = bpf_map_lookup_elem(&arrmap, &(int){0}); + if (!elem) + return 0; + if (bpf_res_spin_lock_irqsave(&res_lock, &f1)) + return 0; + 
bpf_res_spin_unlock(&res_lock); + return 0; +} + +SEC("?tc") +__success +int res_spin_lock_ooo(struct __sk_buff *ctx) +{ + struct arr_elem *elem; + + elem = bpf_map_lookup_elem(&arrmap, &(int){0}); + if (!elem) + return 0; + if (bpf_res_spin_lock(&res_lock)) + return 0; + if (bpf_res_spin_lock(&elem->lock)) { + bpf_res_spin_unlock(&res_lock); + return 0; + } + bpf_res_spin_unlock(&elem->lock); + bpf_res_spin_unlock(&res_lock); + return 0; +} + +SEC("?tc") +__success +int res_spin_lock_ooo_irq(struct __sk_buff *ctx) +{ + struct arr_elem *elem; + unsigned long f1, f2; + + elem = bpf_map_lookup_elem(&arrmap, &(int){0}); + if (!elem) + return 0; + if (bpf_res_spin_lock_irqsave(&res_lock, &f1)) + return 0; + if (bpf_res_spin_lock_irqsave(&elem->lock, &f2)) { + bpf_res_spin_unlock_irqrestore(&res_lock, &f1); + /* We won't have a unreleased IRQ flag error here. */ + return 0; + } + bpf_res_spin_unlock_irqrestore(&elem->lock, &f2); + bpf_res_spin_unlock_irqrestore(&res_lock, &f1); + return 0; +} + +struct bpf_res_spin_lock lock1 __hidden SEC(".data.OO1"); +struct bpf_res_spin_lock lock2 __hidden SEC(".data.OO2"); + +SEC("?tc") +__failure __msg("bpf_res_spin_unlock cannot be out of order") +int res_spin_lock_ooo_unlock(struct __sk_buff *ctx) +{ + if (bpf_res_spin_lock(&lock1)) + return 0; + if (bpf_res_spin_lock(&lock2)) { + bpf_res_spin_unlock(&lock1); + return 0; + } + bpf_res_spin_unlock(&lock1); + bpf_res_spin_unlock(&lock2); + return 0; +} + +SEC("?tc") +__failure __msg("off 1 doesn't point to 'struct bpf_res_spin_lock' that is at 0") +int res_spin_lock_bad_off(struct __sk_buff *ctx) +{ + struct arr_elem *elem; + + elem = bpf_map_lookup_elem(&arrmap, &(int){0}); + if (!elem) + return 0; + bpf_res_spin_lock((void *)&elem->lock + 1); + return 0; +} + +SEC("?tc") +__failure __msg("R1 doesn't have constant offset. bpf_res_spin_lock has to be at the constant offset") +int res_spin_lock_var_off(struct __sk_buff *ctx) +{ + struct arr_elem *elem; + u64 val = value; + + elem = bpf_map_lookup_elem(&arrmap, &(int){0}); + if (!elem) { + // FIXME: Only inline assembly use in assert macro doesn't emit + // BTF definition. + bpf_throw(0); + return 0; + } + bpf_assert_range(val, 0, 40); + bpf_res_spin_lock((void *)&value + val); + return 0; +} + +SEC("?tc") +__failure __msg("map 'res_spin.bss' has no valid bpf_res_spin_lock") +int res_spin_lock_no_lock_map(struct __sk_buff *ctx) +{ + bpf_res_spin_lock((void *)&value + 1); + return 0; +} + +SEC("?tc") +__failure __msg("local 'kptr' has no valid bpf_res_spin_lock") +int res_spin_lock_no_lock_kptr(struct __sk_buff *ctx) +{ + struct { int i; } *p = bpf_obj_new(typeof(*p)); + + if (!p) + return 0; + bpf_res_spin_lock((void *)p); + return 0; +} + +char _license[] SEC("license") = "GPL";