From patchwork Wed Aug 21 08:18:44 2024
X-Patchwork-Submitter: Qi Zheng <zhengqi.arch@bytedance.com>
X-Patchwork-Id: 13770984
From: Qi Zheng <zhengqi.arch@bytedance.com>
To: david@redhat.com, hughd@google.com, willy@infradead.org, muchun.song@linux.dev, vbabka@kernel.org, akpm@linux-foundation.org, rppt@kernel.org, vishal.moola@gmail.com, peterx@redhat.com, ryan.roberts@arm.com
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, linux-arm-kernel@lists.infradead.org, linuxppc-dev@lists.ozlabs.org, Qi Zheng <zhengqi.arch@bytedance.com>
Subject: [PATCH 01/14] mm: pgtable: introduce pte_offset_map_{readonly|maywrite}_nolock()
Date: Wed, 21 Aug 2024 16:18:44 +0800
Message-Id: 
In-Reply-To: 
References: 
MIME-Version: 1.0
Currently, the usage of pte_offset_map_nolock() can be divided into the
following two cases:

1) After acquiring PTL, only read-only operations are performed on the
   PTE page.
   In this case, the RCU lock in pte_offset_map_nolock() will ensure
   that the PTE page will not be freed, and there is no need to worry
   about whether the pmd entry is modified.

2) After acquiring PTL, the pte or pmd entries may be modified. At this
   time, we need to ensure that the pmd entry has not been modified
   concurrently.

To more clearly distinguish between these two cases, this commit
introduces two new helper functions to replace pte_offset_map_nolock().
For 1), just rename it to pte_offset_map_readonly_nolock(). For 2), in
addition to changing the name to pte_offset_map_maywrite_nolock(), it
also outputs the pmdval when successful. This can help the caller
recheck *pmd once the PTL is taken. In some cases we can pass NULL to
pmdvalp: either the mmap_lock for write, or a pte_same() check on
contents, is also enough to ensure that the pmd entry is stable.

Subsequent commits will convert callers of pte_offset_map_nolock() to
the above two functions one by one, and finally remove it completely.

Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>
---
 Documentation/mm/split_page_table_lock.rst |  7 ++++
 include/linux/mm.h                         |  5 +++
 mm/pgtable-generic.c                       | 43 ++++++++++++++++++++++
 3 files changed, 55 insertions(+)

diff --git a/Documentation/mm/split_page_table_lock.rst b/Documentation/mm/split_page_table_lock.rst
index e4f6972eb6c04..f54f717ae8bdf 100644
--- a/Documentation/mm/split_page_table_lock.rst
+++ b/Documentation/mm/split_page_table_lock.rst
@@ -19,6 +19,13 @@ There are helpers to lock/unlock a table and other accessor functions:
  - pte_offset_map_nolock()
 	maps PTE, returns pointer to PTE with pointer to its PTE table
 	lock (not taken), or returns NULL if no PTE table;
+ - pte_offset_map_readonly_nolock()
+	maps PTE, returns pointer to PTE with pointer to its PTE table
+	lock (not taken), or returns NULL if no PTE table;
+ - pte_offset_map_maywrite_nolock()
+	maps PTE, returns pointer to PTE with pointer to its PTE table
+	lock (not taken) and the value of its pmd entry, or returns NULL
+	if no PTE table;
  - pte_offset_map()
 	maps PTE, returns pointer to PTE, or returns NULL if no PTE
 	table;
 - pte_unmap()
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 00501f85f45f0..1fe0ceabcaf39 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2954,6 +2954,11 @@ static inline pte_t *pte_offset_map_lock(struct mm_struct *mm, pmd_t *pmd,
 
 pte_t *pte_offset_map_nolock(struct mm_struct *mm, pmd_t *pmd,
 			unsigned long addr, spinlock_t **ptlp);
+pte_t *pte_offset_map_readonly_nolock(struct mm_struct *mm, pmd_t *pmd,
+				      unsigned long addr, spinlock_t **ptlp);
+pte_t *pte_offset_map_maywrite_nolock(struct mm_struct *mm, pmd_t *pmd,
+				      unsigned long addr, pmd_t *pmdvalp,
+				      spinlock_t **ptlp);
 
 #define pte_unmap_unlock(pte, ptl)	do {		\
 	spin_unlock(ptl);				\
diff --git a/mm/pgtable-generic.c b/mm/pgtable-generic.c
index a78a4adf711ac..29d1fd6fd2963 100644
--- a/mm/pgtable-generic.c
+++ b/mm/pgtable-generic.c
@@ -317,6 +317,33 @@ pte_t *pte_offset_map_nolock(struct mm_struct *mm, pmd_t *pmd,
 	return pte;
 }
 
+pte_t *pte_offset_map_readonly_nolock(struct mm_struct *mm, pmd_t *pmd,
+				      unsigned long addr, spinlock_t **ptlp)
+{
+	pmd_t pmdval;
+	pte_t *pte;
+
+	pte = __pte_offset_map(pmd, addr, &pmdval);
+	if (likely(pte))
+		*ptlp = pte_lockptr(mm, &pmdval);
+	return pte;
+}
+
+pte_t *pte_offset_map_maywrite_nolock(struct mm_struct *mm, pmd_t *pmd,
+				      unsigned long addr, pmd_t *pmdvalp,
+				      spinlock_t **ptlp)
+{
+	pmd_t pmdval;
+	pte_t *pte;
+
+	pte = __pte_offset_map(pmd, addr, &pmdval);
+	if (likely(pte))
+		*ptlp = pte_lockptr(mm, &pmdval);
+	if (pmdvalp)
+		*pmdvalp = pmdval;
+	return pte;
+}
+
 /*
  * pte_offset_map_lock(mm, pmd, addr, ptlp), and its internal implementation
  * __pte_offset_map_lock() below, is usually called with the pmd pointer for
@@ -356,6 +383,22 @@ pte_t *pte_offset_map_nolock(struct mm_struct *mm, pmd_t *pmd,
  * recheck *pmd once the lock is taken; in practice, no callsite needs that -
  * either the mmap_lock for write, or pte_same() check on contents, is enough.
 *
+ * pte_offset_map_readonly_nolock(mm, pmd, addr, ptlp), above, is like
+ * pte_offset_map(); but when successful, it also outputs a pointer to the
+ * spinlock in ptlp - as pte_offset_map_lock() does, but in this case without
+ * locking it. This helps the caller to avoid a later pte_lockptr(mm, *pmd),
+ * which might by that time act on a changed *pmd: pte_offset_map_readonly_nolock()
+ * provides the correct spinlock pointer for the page table that it returns.
+ * For the readonly case, the caller does not need to recheck *pmd after
+ * the lock is taken: the RCU lock ensures that the PTE page is not freed.
+ *
+ * pte_offset_map_maywrite_nolock(mm, pmd, addr, pmdvalp, ptlp), above, is like
+ * pte_offset_map_readonly_nolock(); but when successful, it also outputs the
+ * pmdval. For cases where pte or pmd entries may be modified, that is, the
+ * maywrite case, this can help the caller recheck *pmd once the lock is taken.
+ * In some cases we can pass NULL to pmdvalp: either the mmap_lock for write,
+ * or a pte_same() check on contents, is also enough.
+ *
 * Note that free_pgtables(), used after unmapping detached vmas, or when
 * exiting the whole mm, does not take page table lock before freeing a page
 * table, and may not use RCU at all: "outsiders" like khugepaged should avoid
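For illustration only (not part of this patch), a hypothetical maywrite-style caller might use the new helper like the sketch below: map the PTE table, take the PTL, then recheck that the pmd entry is still the one that was mapped before modifying anything. The function name example_modify_pte and the retry structure are assumptions for the sketch; only the two new helpers come from this series.

```c
/* Hypothetical sketch, not from this series: a maywrite-style caller
 * that rechecks *pmd under the PTL before modifying PTE entries. */
static void example_modify_pte(struct mm_struct *mm, pmd_t *pmd,
			       unsigned long addr)
{
	spinlock_t *ptl;
	pmd_t pmdval;
	pte_t *pte;

retry:
	pte = pte_offset_map_maywrite_nolock(mm, pmd, addr, &pmdval, &ptl);
	if (!pte)
		return;		/* no PTE table */

	spin_lock(ptl);
	if (unlikely(!pmd_same(pmdval, pmdp_get_lockless(pmd)))) {
		/* *pmd changed while we were unlocked: unmap, retry */
		spin_unlock(ptl);
		pte_unmap(pte);
		goto retry;
	}

	/* PTL held and *pmd verified stable: safe to modify ptes here */

	spin_unlock(ptl);
	pte_unmap(pte);
}
```

A readonly-case caller would instead use pte_offset_map_readonly_nolock() and could skip the pmd_same() recheck, since RCU keeps the PTE page from being freed while it is mapped.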