From patchwork Fri Nov 22 07:36:52 2024
X-Patchwork-Submitter: Qi Zheng
X-Patchwork-Id: 13882830
From: Qi Zheng
To: pasha.tatashin@soleen.com, tongtiangen@huawei.com, jannh@google.com,
    lorenzo.stoakes@oracle.com, david@redhat.com, ryan.roberts@arm.com,
    peterx@redhat.com, jgg@ziepe.ca, muchun.song@linux.dev,
    akpm@linux-foundation.org
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Qi Zheng
Subject: [PATCH v2] mm: pgtable: make ptep_clear() non-atomic
Date: Fri, 22 Nov 2024 15:36:52 +0800
Message-Id: <20241122073652.54030-1-zhengqi.arch@bytedance.com>

In the generic ptep_get_and_clear() implementation, it is just a simple
combination of ptep_get() and pte_clear().
But on some architectures (such as x86 and arm64), the hardware modifies
the A/D bits of the page table entry, so ptep_get_and_clear() needs to be
overridden and implemented as an atomic operation to avoid racing with
those hardware updates, which has a performance cost.

Commit d283d422c6c4 ("x86: mm: add x86_64 support for page table check")
added ptep_clear() on x86 and made it call ptep_get_and_clear() when
CONFIG_PAGE_TABLE_CHECK is enabled. The page table check feature does not
actually care about the A/D bits, so ptep_get() + pte_clear() would have
been sufficient. But since page table check is a debug option, this
should not have had much of an impact.

Then commit de8c8e52836d ("mm: page_table_check: add hooks to public
helpers") changed ptep_clear() to call ptep_get_and_clear()
unconditionally, so that the CONFIG_PAGE_TABLE_CHECK check could be moved
into the page table check stubs (in include/linux/page_table_check.h).
This also causes a performance loss on kernels built without
CONFIG_PAGE_TABLE_CHECK, which doesn't make sense.

Currently ptep_clear() is only used in debug code and in khugepaged
collapse paths, which are fairly expensive, so the cost of an extra
atomic RMW operation does not matter there. But it may be used in other
paths in the future. After all, for a present pte entry, we need to call
ptep_clear() instead of pte_clear() to ensure that PAGE_TABLE_CHECK works
properly.

So to be more precise, just call ptep_get() and pte_clear() in
ptep_clear().
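
For illustration only, the difference between the two approaches can be
sketched in user space with C11 atomics. This is not kernel code: the
toy_* names, the bit positions and the replayed interleaving are
assumptions made up for this sketch, and atomic_fetch_or() merely stands
in for a hardware A/D update.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-ins; not the kernel's pte_t or its bit layout. */
typedef uint64_t toy_pte_t;

#define TOY_PTE_PRESENT	((toy_pte_t)1 << 0)
#define TOY_PTE_DIRTY	((toy_pte_t)1 << 6)

/* Shape of the generic helper: a load followed by a separate store. */
static inline toy_pte_t toy_get_then_clear(_Atomic toy_pte_t *ptep)
{
	toy_pte_t old = atomic_load(ptep);

	/* An update landing between the load and the store is not in 'old'. */
	atomic_store(ptep, (toy_pte_t)0);
	return old;
}

/* Shape of an arch override: a single atomic read-modify-write. */
static inline toy_pte_t toy_get_and_clear_atomic(_Atomic toy_pte_t *ptep)
{
	return atomic_exchange(ptep, (toy_pte_t)0);
}

/*
 * Deterministically replay the window in the non-atomic variant: the
 * simulated "hardware" sets DIRTY after the read but before the clear.
 * Returns the value the software side observed.
 */
static inline toy_pte_t toy_replay_race(void)
{
	_Atomic toy_pte_t pte = TOY_PTE_PRESENT;
	toy_pte_t old = atomic_load(&pte);	/* ptep_get()-like read */

	atomic_fetch_or(&pte, TOY_PTE_DIRTY);	/* simulated HW update */
	atomic_store(&pte, (toy_pte_t)0);	/* pte_clear()-like write */
	assert(atomic_load(&pte) == 0);		/* entry ends up cleared */
	return old;
}
```

In the replayed interleaving, the observed value lacks the dirty bit but
still carries the present bit. A caller that only hands the old value to
a check that ignores A/D bits, as ptep_clear() does, therefore loses
nothing by using the non-atomic form.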

Signed-off-by: Qi Zheng
Reviewed-by: Pasha Tatashin
Reviewed-by: Jann Horn
Reviewed-by: Muchun Song
Acked-by: David Hildenbrand
---
Changes in v2:
 - add a comment (suggested by David Hildenbrand)
 - collect Reviewed-bys and Acked-by

 include/linux/pgtable.h | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
index adef9d6e9b1ba..94d267d02372e 100644
--- a/include/linux/pgtable.h
+++ b/include/linux/pgtable.h
@@ -533,7 +533,14 @@ static inline void clear_young_dirty_ptes(struct vm_area_struct *vma,
 static inline void ptep_clear(struct mm_struct *mm, unsigned long addr,
 			      pte_t *ptep)
 {
-	ptep_get_and_clear(mm, addr, ptep);
+	pte_t pte = ptep_get(ptep);
+
+	pte_clear(mm, addr, ptep);
+	/*
+	 * No need for ptep_get_and_clear(): page table check doesn't care about
+	 * any bits that could have been set by HW concurrently.
+	 */
+	page_table_check_pte_clear(mm, pte);
 }
 
 #ifdef CONFIG_GUP_GET_PXX_LOW_HIGH