From patchwork Fri Oct 14 23:57:02 2022
X-Patchwork-Submitter: Dan Williams
X-Patchwork-Id: 13007515
Subject: [PATCH v3 01/25] fsdax: Wait on @page not @page->_refcount
From: Dan Williams
To: linux-mm@kvack.org
Cc: Matthew Wilcox , Jan Kara , "Darrick J.
Wong" , Christoph Hellwig , John Hubbard , Jason Gunthorpe , david@fromorbit.com, nvdimm@lists.linux.dev, akpm@linux-foundation.org, linux-fsdevel@vger.kernel.org Date: Fri, 14 Oct 2022 16:57:02 -0700 Message-ID: <166579182271.2236710.15120970389485390592.stgit@dwillia2-xfh.jf.intel.com> In-Reply-To: <166579181584.2236710.17813547487183983273.stgit@dwillia2-xfh.jf.intel.com> References: <166579181584.2236710.17813547487183983273.stgit@dwillia2-xfh.jf.intel.com> User-Agent: StGit/0.18-3-g996c MIME-Version: 1.0 ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1665791825; a=rsa-sha256; cv=none; b=whE/PevC1r8JzejMS3EX3TGUbkYkI6VnvhuxxoW1Erog+d//9eKyPlytOTdJGVhNK2q0SB edJlGqakSiUG6ekw1IGoYdvstb9ps23vENn1ooDACefC/WvlNjqxL5q11k4NvNOBcr2f72 68dRj5e4ufAaSn4oUazEX39B9yAB1D8= ARC-Authentication-Results: i=1; imf23.hostedemail.com; dkim=none ("invalid DKIM record") header.d=intel.com header.s=Intel header.b=bEvhbEFx; spf=pass (imf23.hostedemail.com: domain of dan.j.williams@intel.com designates 134.134.136.20 as permitted sender) smtp.mailfrom=dan.j.williams@intel.com; dmarc=pass (policy=none) header.from=intel.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1665791825; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=KM+0eeMhdrxpX2Cxgw0JcB3bULUGDOj2MnI/iJJFYwE=; b=58p4g8i0svA/tTx8/JXEXR8+7ebDZYMRMX4ZlY+5sVFsdQck3Teu4JLwXo/ZJJwqbufvLF mCjROq3XZThyzKZNRcmcXNP/h4VNS59I+AaxtikP/jZtInDne7fpE2TF7SqJx20UW9g7fH 2DgUjoo8FOCTKSTXAKmvFFdIFOnkiuk= X-Rspam-User: X-Stat-Signature: e6ewd4eqkh65zu3jbzxqtb67nidi9x9f X-Rspamd-Server: rspam01 X-Rspamd-Queue-Id: 10280140031 Authentication-Results: imf23.hostedemail.com; dkim=none ("invalid DKIM record") header.d=intel.com header.s=Intel header.b=bEvhbEFx; spf=pass (imf23.hostedemail.com: domain of dan.j.williams@intel.com designates 134.134.136.20 as permitted sender) smtp.mailfrom=dan.j.williams@intel.com; dmarc=pass (policy=none) header.from=intel.com X-HE-Tag: 1665791824-813338 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: The __wait_var_event facility calculates a wait queue from a hash of the address of the variable being passed. Use the @page argument directly as it is less to type and is the object that is being waited upon. Cc: Matthew Wilcox Cc: Jan Kara Cc: "Darrick J. 
Wong" Cc: Christoph Hellwig Cc: John Hubbard Reviewed-by: Jason Gunthorpe Signed-off-by: Dan Williams --- fs/ext4/inode.c | 8 ++++---- fs/fuse/dax.c | 6 +++--- fs/xfs/xfs_file.c | 6 +++--- mm/memremap.c | 2 +- 4 files changed, 11 insertions(+), 11 deletions(-) diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c index 601214453c3a..b028a4413bea 100644 --- a/fs/ext4/inode.c +++ b/fs/ext4/inode.c @@ -3961,10 +3961,10 @@ int ext4_break_layouts(struct inode *inode) if (!page) return 0; - error = ___wait_var_event(&page->_refcount, - atomic_read(&page->_refcount) == 1, - TASK_INTERRUPTIBLE, 0, 0, - ext4_wait_dax_page(inode)); + error = ___wait_var_event(page, + atomic_read(&page->_refcount) == 1, + TASK_INTERRUPTIBLE, 0, 0, + ext4_wait_dax_page(inode)); } while (error == 0); return error; diff --git a/fs/fuse/dax.c b/fs/fuse/dax.c index e23e802a8013..4e12108c68af 100644 --- a/fs/fuse/dax.c +++ b/fs/fuse/dax.c @@ -676,9 +676,9 @@ static int __fuse_dax_break_layouts(struct inode *inode, bool *retry, return 0; *retry = true; - return ___wait_var_event(&page->_refcount, - atomic_read(&page->_refcount) == 1, TASK_INTERRUPTIBLE, - 0, 0, fuse_wait_dax_page(inode)); + return ___wait_var_event(page, atomic_read(&page->_refcount) == 1, + TASK_INTERRUPTIBLE, 0, 0, + fuse_wait_dax_page(inode)); } /* dmap_end == 0 leads to unmapping of whole file */ diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c index c6c80265c0b2..73e7b7ec0a4c 100644 --- a/fs/xfs/xfs_file.c +++ b/fs/xfs/xfs_file.c @@ -827,9 +827,9 @@ xfs_break_dax_layouts( return 0; *retry = true; - return ___wait_var_event(&page->_refcount, - atomic_read(&page->_refcount) == 1, TASK_INTERRUPTIBLE, - 0, 0, xfs_wait_dax_page(inode)); + return ___wait_var_event(page, atomic_read(&page->_refcount) == 1, + TASK_INTERRUPTIBLE, 0, 0, + xfs_wait_dax_page(inode)); } int diff --git a/mm/memremap.c b/mm/memremap.c index 421bec3a29ee..f9287babb3ce 100644 --- a/mm/memremap.c +++ b/mm/memremap.c @@ -542,7 +542,7 @@ bool __put_devmap_managed_page_refs(struct page *page, int refs) * stable because nobody holds a reference on the page. 
 	 */
 	if (page_ref_sub_return(page, refs) == 1)
-		wake_up_var(&page->_refcount);
+		wake_up_var(page);
 	return true;
 }
 EXPORT_SYMBOL(__put_devmap_managed_page_refs);

From patchwork Fri Oct 14 23:57:08 2022
X-Patchwork-Submitter: Dan Williams
X-Patchwork-Id: 13007516
Subject: [PATCH v3 02/25] fsdax: Use dax_page_idle() to document DAX busy page checking
From: Dan Williams
To: linux-mm@kvack.org
Cc: Matthew Wilcox , Jan Kara , "Darrick J.
Wong" , Christoph Hellwig , John Hubbard , Jason Gunthorpe , david@fromorbit.com, nvdimm@lists.linux.dev, akpm@linux-foundation.org, linux-fsdevel@vger.kernel.org Date: Fri, 14 Oct 2022 16:57:08 -0700 Message-ID: <166579182839.2236710.16461867548859813784.stgit@dwillia2-xfh.jf.intel.com> In-Reply-To: <166579181584.2236710.17813547487183983273.stgit@dwillia2-xfh.jf.intel.com> References: <166579181584.2236710.17813547487183983273.stgit@dwillia2-xfh.jf.intel.com> User-Agent: StGit/0.18-3-g996c MIME-Version: 1.0 ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1665791830; a=rsa-sha256; cv=none; b=NIlfvAijmHNsf1wy7jrPHMzA8jpKyvZ44L0vQowe7EHngDKzvPgZemn2lcay2vcdYF37nG D7/UZmNE9Lwgvc6/sAb1kyk4E9XRMAod5FaCwM+o+GrKT9lWbu1Zud/nKjOkj8oEAK9fvb kHkmaZoaqvGXdbQIwLlgeGUEZAP3xEc= ARC-Authentication-Results: i=1; imf02.hostedemail.com; dkim=none ("invalid DKIM record") header.d=intel.com header.s=Intel header.b=Q+DNsdKq; dmarc=pass (policy=none) header.from=intel.com; spf=pass (imf02.hostedemail.com: domain of dan.j.williams@intel.com designates 134.134.136.126 as permitted sender) smtp.mailfrom=dan.j.williams@intel.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1665791830; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=Qltmtl13L9YzhKVNR4mGei4zaJZrqJ/gmtQ9tTR/l58=; b=K7MU7OilPLofwPSMbK0wgDHFvxdVvl/XiCaHnAlAiOGrFwgOaFJTNobKc6gvrgZEeeBp0n BbauN2hRcnsA0AxyPPUtvusTl+w+ook6ke8L8HMUvi3DsjZS3AsXplB+kJspQyClafAZAi J1c0BTklsastUpY03bmAXoX30hTBnKM= X-Stat-Signature: i6omgg6ssh1145tyuexzo1q8n64ddnij X-Rspamd-Queue-Id: 86D7A8002D Authentication-Results: imf02.hostedemail.com; dkim=none ("invalid DKIM record") header.d=intel.com header.s=Intel header.b=Q+DNsdKq; dmarc=pass (policy=none) header.from=intel.com; spf=pass (imf02.hostedemail.com: domain of dan.j.williams@intel.com designates 134.134.136.126 as permitted sender) smtp.mailfrom=dan.j.williams@intel.com X-Rspam-User: X-Rspamd-Server: rspam03 X-HE-Tag: 1665791830-517805 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: In advance of converting DAX pages to be 0-based, use a new dax_page_idle() helper to both simplify that future conversion, but also document all the kernel locations that are watching for DAX page idle events. Cc: Matthew Wilcox Cc: Jan Kara Cc: "Darrick J. 
Wong" Cc: Christoph Hellwig Cc: John Hubbard Reviewed-by: Jason Gunthorpe Signed-off-by: Dan Williams --- fs/dax.c | 4 ++-- fs/ext4/inode.c | 3 +-- fs/fuse/dax.c | 5 ++--- fs/xfs/xfs_file.c | 5 ++--- include/linux/dax.h | 9 +++++++++ 5 files changed, 16 insertions(+), 10 deletions(-) diff --git a/fs/dax.c b/fs/dax.c index c440dcef4b1b..e762b9c04fb4 100644 --- a/fs/dax.c +++ b/fs/dax.c @@ -395,7 +395,7 @@ static void dax_disassociate_entry(void *entry, struct address_space *mapping, for_each_mapped_pfn(entry, pfn) { struct page *page = pfn_to_page(pfn); - WARN_ON_ONCE(trunc && page_ref_count(page) > 1); + WARN_ON_ONCE(trunc && !dax_page_idle(page)); if (dax_mapping_is_cow(page->mapping)) { /* keep the CoW flag if this page is still shared */ if (page->index-- > 0) @@ -414,7 +414,7 @@ static struct page *dax_busy_page(void *entry) for_each_mapped_pfn(entry, pfn) { struct page *page = pfn_to_page(pfn); - if (page_ref_count(page) > 1) + if (!dax_page_idle(page)) return page; } return NULL; diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c index b028a4413bea..478ec6bc0935 100644 --- a/fs/ext4/inode.c +++ b/fs/ext4/inode.c @@ -3961,8 +3961,7 @@ int ext4_break_layouts(struct inode *inode) if (!page) return 0; - error = ___wait_var_event(page, - atomic_read(&page->_refcount) == 1, + error = ___wait_var_event(page, dax_page_idle(page), TASK_INTERRUPTIBLE, 0, 0, ext4_wait_dax_page(inode)); } while (error == 0); diff --git a/fs/fuse/dax.c b/fs/fuse/dax.c index 4e12108c68af..ae52ef7dbabe 100644 --- a/fs/fuse/dax.c +++ b/fs/fuse/dax.c @@ -676,9 +676,8 @@ static int __fuse_dax_break_layouts(struct inode *inode, bool *retry, return 0; *retry = true; - return ___wait_var_event(page, atomic_read(&page->_refcount) == 1, - TASK_INTERRUPTIBLE, 0, 0, - fuse_wait_dax_page(inode)); + return ___wait_var_event(page, dax_page_idle(page), TASK_INTERRUPTIBLE, + 0, 0, fuse_wait_dax_page(inode)); } /* dmap_end == 0 leads to unmapping of whole file */ diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c index 73e7b7ec0a4c..556e28d06788 100644 --- a/fs/xfs/xfs_file.c +++ b/fs/xfs/xfs_file.c @@ -827,9 +827,8 @@ xfs_break_dax_layouts( return 0; *retry = true; - return ___wait_var_event(page, atomic_read(&page->_refcount) == 1, - TASK_INTERRUPTIBLE, 0, 0, - xfs_wait_dax_page(inode)); + return ___wait_var_event(page, dax_page_idle(page), TASK_INTERRUPTIBLE, + 0, 0, xfs_wait_dax_page(inode)); } int diff --git a/include/linux/dax.h b/include/linux/dax.h index ba985333e26b..04987d14d7e0 100644 --- a/include/linux/dax.h +++ b/include/linux/dax.h @@ -210,6 +210,15 @@ int dax_zero_range(struct inode *inode, loff_t pos, loff_t len, bool *did_zero, int dax_truncate_page(struct inode *inode, loff_t pos, bool *did_zero, const struct iomap_ops *ops); +/* + * Document all the code locations that want know when a dax page is + * unreferenced. 
+ */
+static inline bool dax_page_idle(struct page *page)
+{
+	return page_ref_count(page) == 1;
+}
+
 #if IS_ENABLED(CONFIG_DAX)
 int dax_read_lock(void);
 void dax_read_unlock(int id);

From patchwork Fri Oct 14 23:57:14 2022
X-Patchwork-Submitter: Dan Williams
X-Patchwork-Id: 13007517
Subject: [PATCH v3 03/25] fsdax: Include unmapped inodes for page-idle detection
From: Dan Williams
To: linux-mm@kvack.org
Cc: Matthew Wilcox , Jan Kara , "Darrick J.
Wong" , Jason Gunthorpe , Christoph Hellwig , John Hubbard , david@fromorbit.com, nvdimm@lists.linux.dev, akpm@linux-foundation.org, linux-fsdevel@vger.kernel.org Date: Fri, 14 Oct 2022 16:57:14 -0700 Message-ID: <166579183407.2236710.2161472385123143060.stgit@dwillia2-xfh.jf.intel.com> In-Reply-To: <166579181584.2236710.17813547487183983273.stgit@dwillia2-xfh.jf.intel.com> References: <166579181584.2236710.17813547487183983273.stgit@dwillia2-xfh.jf.intel.com> User-Agent: StGit/0.18-3-g996c MIME-Version: 1.0 ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1665791836; a=rsa-sha256; cv=none; b=7VgZGdndg2qwRMJ3KUNhSg3JOSOSdbgbNhdO1eyrwW6LfTcaFHfIh5Ayyf9sCiEelhBV/Q npnjy5UcgUG4fY7tEi92rSEXTNbfWfkJ1qYW5XKKrjFDb0geUaCr4pzqtQhvwETnvDA7dq eGV0srawo7+WdY7LFxajiBnQKbarcEc= ARC-Authentication-Results: i=1; imf28.hostedemail.com; dkim=none ("invalid DKIM record") header.d=intel.com header.s=Intel header.b=OdrGxEFd; spf=pass (imf28.hostedemail.com: domain of dan.j.williams@intel.com designates 134.134.136.20 as permitted sender) smtp.mailfrom=dan.j.williams@intel.com; dmarc=pass (policy=none) header.from=intel.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1665791836; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=mg4sUw8CPGU23UE7zMKNYtu26deQnjSIF5al9lMmnVE=; b=ZVCIEVeZdFB4Zz+gklVmdRhV/3ZFRsxFQo586iPPcRHbYe0psxUBjzttrgy5ZBtxlT/kZC N342QzG5xUYo/v3lK+00X9s98ExR8RT9s3DDKlOolkJIQwj+iS5sk1UGDLGhfHv8LVXC9L 3FbnjKdIgM9bt/dRgN+mt+pPJAmS6Go= X-Rspam-User: X-Stat-Signature: rauhy69bmks693en3gn5jwgxbqjmqh8d X-Rspamd-Server: rspam01 X-Rspamd-Queue-Id: C46BBC0025 Authentication-Results: imf28.hostedemail.com; dkim=none ("invalid DKIM record") header.d=intel.com header.s=Intel header.b=OdrGxEFd; spf=pass (imf28.hostedemail.com: domain of dan.j.williams@intel.com designates 134.134.136.20 as permitted sender) smtp.mailfrom=dan.j.williams@intel.com; dmarc=pass (policy=none) header.from=intel.com X-HE-Tag: 1665791835-222430 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: A page can remain pinned even after it has been unmapped from userspace / removed from the rmap. In advance of requiring that all dax_insert_entry() events are followed up 'break layouts' before a truncate event, make sure that 'break layouts' can find unmapped entries. Cc: Matthew Wilcox Cc: Jan Kara Cc: "Darrick J. 
Wong" Cc: Jason Gunthorpe Cc: Christoph Hellwig Cc: John Hubbard Signed-off-by: Dan Williams --- fs/dax.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/dax.c b/fs/dax.c index e762b9c04fb4..76bad1c095c0 100644 --- a/fs/dax.c +++ b/fs/dax.c @@ -698,7 +698,7 @@ struct page *dax_layout_busy_page_range(struct address_space *mapping, if (IS_ENABLED(CONFIG_FS_DAX_LIMITED)) return NULL; - if (!dax_mapping(mapping) || !mapping_mapped(mapping)) + if (!dax_mapping(mapping)) return NULL; /* If end == LLONG_MAX, all pages from start to till end of file */ From patchwork Fri Oct 14 23:57:19 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dan Williams X-Patchwork-Id: 13007518 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 21E4FC433FE for ; Fri, 14 Oct 2022 23:57:24 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id A75456B0072; Fri, 14 Oct 2022 19:57:23 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id A25836B007D; Fri, 14 Oct 2022 19:57:23 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 8EDC66B007E; Fri, 14 Oct 2022 19:57:23 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0017.hostedemail.com [216.40.44.17]) by kanga.kvack.org (Postfix) with ESMTP id 7B1366B0072 for ; Fri, 14 Oct 2022 19:57:23 -0400 (EDT) Received: from smtpin14.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay01.hostedemail.com (Postfix) with ESMTP id 431321C5DE3 for ; Fri, 14 Oct 2022 23:57:23 +0000 (UTC) X-FDA: 80021219166.14.76AAE27 Received: from mga01.intel.com (mga01.intel.com [192.55.52.88]) by imf07.hostedemail.com (Postfix) with ESMTP id 5BC5640027 for ; Fri, 14 Oct 2022 23:57:22 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1665791842; x=1697327842; h=subject:from:to:cc:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=00TuoeEs7LX9ow5sfsY7OGmlVaqdx+lllPmKBKn3E9Y=; b=KbSHAbUyZ+9nVozfE4tpzAWsPuW4rQDsLmrDWkprhOLAs9qZdrE27vIZ on6X6wOA/RxpCiiO/nEvbRaCbEP+u3O8wlC4F4ToBz2CVlnn360ygyF9E 16xIYilxyoZ6nNmrZ2az7hStjAjTEPT5GrEG8NwkVOi4nlJGcle83wYyQ nP2oL57bCDAKoddxVrMw8zyhDgjmWpeLq5/m6tidok6EFHB93oTMSIB4E alw+DXzkK21o75BN3ADlVXktRIr73f8erdPCVyeMgnv76Yl2cAytKUfbc QBmXHd50EAkgjPxVdkp229vA+WuFL7nqsc8CXhYXPBJuEEc00L8zjFqK9 Q==; X-IronPort-AV: E=McAfee;i="6500,9779,10500"; a="332018519" X-IronPort-AV: E=Sophos;i="5.95,185,1661842800"; d="scan'208";a="332018519" Received: from orsmga008.jf.intel.com ([10.7.209.65]) by fmsmga101.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 14 Oct 2022 16:57:20 -0700 X-IronPort-AV: E=McAfee;i="6500,9779,10500"; a="658759562" X-IronPort-AV: E=Sophos;i="5.95,185,1661842800"; d="scan'208";a="658759562" Received: from uyoon-mobl.amr.corp.intel.com (HELO dwillia2-xfh.jf.intel.com) ([10.209.90.112]) by orsmga008-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 14 Oct 2022 16:57:20 -0700 Subject: [PATCH v3 04/25] fsdax: Introduce dax_zap_mappings() From: Dan Williams To: linux-mm@kvack.org Cc: Matthew Wilcox , Jan Kara , "Darrick J. 
Wong" , Jason Gunthorpe , Christoph Hellwig , John Hubbard , david@fromorbit.com, nvdimm@lists.linux.dev, akpm@linux-foundation.org, linux-fsdevel@vger.kernel.org Date: Fri, 14 Oct 2022 16:57:19 -0700 Message-ID: <166579183976.2236710.17370760087488536715.stgit@dwillia2-xfh.jf.intel.com> In-Reply-To: <166579181584.2236710.17813547487183983273.stgit@dwillia2-xfh.jf.intel.com> References: <166579181584.2236710.17813547487183983273.stgit@dwillia2-xfh.jf.intel.com> User-Agent: StGit/0.18-3-g996c MIME-Version: 1.0 ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1665791842; a=rsa-sha256; cv=none; b=strE68XStAWB90h9xj7noXu8QcFffrE26yPtQL2/wzGXu52Zxpj0I1C8QQmc+egfjUg7NZ palWpdq2+16tx0LygnyfXCUNCPfTl5/zJ+FkkcB8myyza5qusgmh/eLC2oxwCKCKc/Wcot /z9MbYNgbPOcAv+mp/NI1LAld0WdEcM= ARC-Authentication-Results: i=1; imf07.hostedemail.com; dkim=none ("invalid DKIM record") header.d=intel.com header.s=Intel header.b=KbSHAbUy; spf=pass (imf07.hostedemail.com: domain of dan.j.williams@intel.com designates 192.55.52.88 as permitted sender) smtp.mailfrom=dan.j.williams@intel.com; dmarc=pass (policy=none) header.from=intel.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1665791842; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=t7ObIFodp5ZuHWUIxZ0FKMbQhSAIDX2bfeWaZ/ceLd8=; b=frgu1zamfa/eOxaav4Bfh8q7m72kmztMLcDbhFiOPbFuP4AFG+Rx5pnadxTue/74/b1tsI K2fhPm2pM1TYSDystqlQXjoeRBQWT2k3PCjQ5ckhWGTRZQtb/L5XoLtG3o218IN0eM9Lca v2Z9urx3Ehd9rS29qsYDk/vPx5nmwqI= X-Stat-Signature: 7n373dworzsjzobowwdpgxb6pk6twezm X-Rspamd-Queue-Id: 5BC5640027 X-Rspam-User: X-Rspamd-Server: rspam08 Authentication-Results: imf07.hostedemail.com; dkim=none ("invalid DKIM record") header.d=intel.com header.s=Intel header.b=KbSHAbUy; spf=pass (imf07.hostedemail.com: domain of dan.j.williams@intel.com designates 192.55.52.88 as permitted sender) smtp.mailfrom=dan.j.williams@intel.com; dmarc=pass (policy=none) header.from=intel.com X-HE-Tag: 1665791842-953574 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Typical pages take a reference at pte insertion and drop it at zap_pte_range() time. That reference management is missing for DAX leading to a situation where DAX pages are mapped in user page tables, but are not referenced. Once fsdax decides it wants to unmap the page it can drop its reference, but unlike typical pages it needs to maintain the association of the page to the inode that arbitrated the access in the first instance. It maintains that association until explicit truncate(), or the implicit truncate() that occurs at inode death, truncate_inode_pages_final(). The zapped state tracks whether the fsdax has dropped its interest in a page, but still allows the associated i_pages entry to live until truncate. This facilitates inode lookup while awaiting any page pin users to drop their pins. For example, if memory_failure() is triggered on the page after it has been unmapped, but before it has been truncated from the inode, memory_failure() can still associate the event with the inode. Once truncate begins fsdax unmaps the page to prevent any new references from being taken without calling back into fsdax core to reestablish the mapping. 
This approach relies on all paths that call truncate_inode_pages() to first call dax_zap_mappings(). For that another bandaid is needed to add this 'zap' step to the truncate_inode_pages_final() path, but that is saved for a follow-on patch. Cc: Matthew Wilcox Cc: Jan Kara Cc: "Darrick J. Wong" Cc: Jason Gunthorpe Cc: Christoph Hellwig Cc: John Hubbard Signed-off-by: Dan Williams --- fs/dax.c | 72 +++++++++++++++++++++++++++++++++++---------------- fs/ext4/inode.c | 2 + fs/fuse/dax.c | 4 +-- fs/xfs/xfs_file.c | 2 + fs/xfs/xfs_inode.c | 4 +-- include/linux/dax.h | 11 +++++--- 6 files changed, 63 insertions(+), 32 deletions(-) diff --git a/fs/dax.c b/fs/dax.c index 76bad1c095c0..a75d4bf541b4 100644 --- a/fs/dax.c +++ b/fs/dax.c @@ -74,11 +74,12 @@ fs_initcall(init_dax_wait_table); * and EMPTY bits aren't set the entry is a normal DAX entry with a filesystem * block allocation. */ -#define DAX_SHIFT (4) +#define DAX_SHIFT (5) #define DAX_LOCKED (1UL << 0) #define DAX_PMD (1UL << 1) #define DAX_ZERO_PAGE (1UL << 2) #define DAX_EMPTY (1UL << 3) +#define DAX_ZAP (1UL << 4) static unsigned long dax_to_pfn(void *entry) { @@ -95,6 +96,11 @@ static bool dax_is_locked(void *entry) return xa_to_value(entry) & DAX_LOCKED; } +static bool dax_is_zapped(void *entry) +{ + return xa_to_value(entry) & DAX_ZAP; +} + static unsigned int dax_entry_order(void *entry) { if (xa_to_value(entry) & DAX_PMD) @@ -407,19 +413,6 @@ static void dax_disassociate_entry(void *entry, struct address_space *mapping, } } -static struct page *dax_busy_page(void *entry) -{ - unsigned long pfn; - - for_each_mapped_pfn(entry, pfn) { - struct page *page = pfn_to_page(pfn); - - if (!dax_page_idle(page)) - return page; - } - return NULL; -} - /* * dax_lock_page - Lock the DAX entry corresponding to a page * @page: The page whose entry we want to lock @@ -664,8 +657,43 @@ static void *grab_mapping_entry(struct xa_state *xas, return xa_mk_internal(VM_FAULT_FALLBACK); } +static void *dax_zap_entry(struct xa_state *xas, void *entry) +{ + unsigned long v = xa_to_value(entry); + + return xas_store(xas, xa_mk_value(v | DAX_ZAP)); +} + +/** + * Return NULL if the entry is zapped and all pages in the entry are + * idle, otherwise return the non-idle page in the entry + */ +static struct page *dax_zap_pages(struct xa_state *xas, void *entry) +{ + struct page *ret = NULL; + unsigned long pfn; + bool zap; + + if (!dax_entry_size(entry)) + return NULL; + + zap = !dax_is_zapped(entry); + + for_each_mapped_pfn(entry, pfn) { + struct page *page = pfn_to_page(pfn); + + if (!ret && !dax_page_idle(page)) + ret = page; + } + + if (zap) + dax_zap_entry(xas, entry); + + return ret; +} + /** - * dax_layout_busy_page_range - find first pinned page in @mapping + * dax_zap_mappings_range - find first pinned page in @mapping * @mapping: address space to scan for a page with ref count > 1 * @start: Starting offset. Page containing 'start' is included. * @end: End offset. Page containing 'end' is included. If 'end' is LLONG_MAX, @@ -682,8 +710,8 @@ static void *grab_mapping_entry(struct xa_state *xas, * to be able to run unmap_mapping_range() and subsequently not race * mapping_mapped() becoming true. 
*/ -struct page *dax_layout_busy_page_range(struct address_space *mapping, - loff_t start, loff_t end) +struct page *dax_zap_mappings_range(struct address_space *mapping, loff_t start, + loff_t end) { void *entry; unsigned int scanned = 0; @@ -727,7 +755,7 @@ struct page *dax_layout_busy_page_range(struct address_space *mapping, if (unlikely(dax_is_locked(entry))) entry = get_unlocked_entry(&xas, 0); if (entry) - page = dax_busy_page(entry); + page = dax_zap_pages(&xas, entry); put_unlocked_entry(&xas, entry, WAKE_NEXT); if (page) break; @@ -742,13 +770,13 @@ struct page *dax_layout_busy_page_range(struct address_space *mapping, xas_unlock_irq(&xas); return page; } -EXPORT_SYMBOL_GPL(dax_layout_busy_page_range); +EXPORT_SYMBOL_GPL(dax_zap_mappings_range); -struct page *dax_layout_busy_page(struct address_space *mapping) +struct page *dax_zap_mappings(struct address_space *mapping) { - return dax_layout_busy_page_range(mapping, 0, LLONG_MAX); + return dax_zap_mappings_range(mapping, 0, LLONG_MAX); } -EXPORT_SYMBOL_GPL(dax_layout_busy_page); +EXPORT_SYMBOL_GPL(dax_zap_mappings); static int __dax_invalidate_entry(struct address_space *mapping, pgoff_t index, bool trunc) diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c index 478ec6bc0935..3935af49df8b 100644 --- a/fs/ext4/inode.c +++ b/fs/ext4/inode.c @@ -3957,7 +3957,7 @@ int ext4_break_layouts(struct inode *inode) return -EINVAL; do { - page = dax_layout_busy_page(inode->i_mapping); + page = dax_zap_mappings(inode->i_mapping); if (!page) return 0; diff --git a/fs/fuse/dax.c b/fs/fuse/dax.c index ae52ef7dbabe..8cdc9402e8f7 100644 --- a/fs/fuse/dax.c +++ b/fs/fuse/dax.c @@ -443,7 +443,7 @@ static int fuse_setup_new_dax_mapping(struct inode *inode, loff_t pos, /* * Can't do inline reclaim in fault path. We call - * dax_layout_busy_page() before we free a range. And + * dax_zap_mappings() before we free a range. And * fuse_wait_dax_page() drops mapping->invalidate_lock and requires it. * In fault path we enter with mapping->invalidate_lock held and can't * drop it. Also in fault path we hold mapping->invalidate_lock shared @@ -671,7 +671,7 @@ static int __fuse_dax_break_layouts(struct inode *inode, bool *retry, { struct page *page; - page = dax_layout_busy_page_range(inode->i_mapping, start, end); + page = dax_zap_mappings_range(inode->i_mapping, start, end); if (!page) return 0; diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c index 556e28d06788..ca0afcdd98c0 100644 --- a/fs/xfs/xfs_file.c +++ b/fs/xfs/xfs_file.c @@ -822,7 +822,7 @@ xfs_break_dax_layouts( ASSERT(xfs_isilocked(XFS_I(inode), XFS_MMAPLOCK_EXCL)); - page = dax_layout_busy_page(inode->i_mapping); + page = dax_zap_mappings(inode->i_mapping); if (!page) return 0; diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c index 28493c8e9bb2..d48dfee01008 100644 --- a/fs/xfs/xfs_inode.c +++ b/fs/xfs/xfs_inode.c @@ -3481,8 +3481,8 @@ xfs_mmaplock_two_inodes_and_break_dax_layout( * need to unlock & lock the XFS_MMAPLOCK_EXCL which is not suitable * for this nested lock case. 
*/ - page = dax_layout_busy_page(VFS_I(ip2)->i_mapping); - if (page && page_ref_count(page) != 1) { + page = dax_zap_mappings(VFS_I(ip2)->i_mapping); + if (page) { xfs_iunlock(ip2, XFS_MMAPLOCK_EXCL); xfs_iunlock(ip1, XFS_MMAPLOCK_EXCL); goto again; diff --git a/include/linux/dax.h b/include/linux/dax.h index 04987d14d7e0..f6acb4ed73cb 100644 --- a/include/linux/dax.h +++ b/include/linux/dax.h @@ -157,8 +157,9 @@ static inline void fs_put_dax(struct dax_device *dax_dev, void *holder) int dax_writeback_mapping_range(struct address_space *mapping, struct dax_device *dax_dev, struct writeback_control *wbc); -struct page *dax_layout_busy_page(struct address_space *mapping); -struct page *dax_layout_busy_page_range(struct address_space *mapping, loff_t start, loff_t end); +struct page *dax_zap_mappings(struct address_space *mapping); +struct page *dax_zap_mappings_range(struct address_space *mapping, loff_t start, + loff_t end); dax_entry_t dax_lock_page(struct page *page); void dax_unlock_page(struct page *page, dax_entry_t cookie); dax_entry_t dax_lock_mapping_entry(struct address_space *mapping, @@ -166,12 +167,14 @@ dax_entry_t dax_lock_mapping_entry(struct address_space *mapping, void dax_unlock_mapping_entry(struct address_space *mapping, unsigned long index, dax_entry_t cookie); #else -static inline struct page *dax_layout_busy_page(struct address_space *mapping) +static inline struct page *dax_zap_mappings(struct address_space *mapping) { return NULL; } -static inline struct page *dax_layout_busy_page_range(struct address_space *mapping, pgoff_t start, pgoff_t nr_pages) +static inline struct page *dax_zap_mappings_range(struct address_space *mapping, + pgoff_t start, + pgoff_t nr_pages) { return NULL; } From patchwork Fri Oct 14 23:57:25 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dan Williams X-Patchwork-Id: 13007519 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id CB6E4C433FE for ; Fri, 14 Oct 2022 23:57:28 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 5BB806B007D; Fri, 14 Oct 2022 19:57:28 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 56A766B007E; Fri, 14 Oct 2022 19:57:28 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 40AF06B0080; Fri, 14 Oct 2022 19:57:28 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0016.hostedemail.com [216.40.44.16]) by kanga.kvack.org (Postfix) with ESMTP id 2CC356B007D for ; Fri, 14 Oct 2022 19:57:28 -0400 (EDT) Received: from smtpin07.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay02.hostedemail.com (Postfix) with ESMTP id 0B780120B90 for ; Fri, 14 Oct 2022 23:57:28 +0000 (UTC) X-FDA: 80021219376.07.CB77650 Received: from mga04.intel.com (mga04.intel.com [192.55.52.120]) by imf28.hostedemail.com (Postfix) with ESMTP id 7BCCBC0025 for ; Fri, 14 Oct 2022 23:57:27 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1665791847; x=1697327847; h=subject:from:to:cc:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=DyqxUkdpstu7tUvj01n9d1PqdUdnL/T7kQZf98+vfF8=; b=oAQNGnCXM0HRN6MwTYw8RiA5Yj1awJflZ3K8GeJxigBZdurMdqddGKh0 
Subject: [PATCH v3 05/25] fsdax: Wait for pinned pages during truncate_inode_pages_final()
From: Dan Williams
To: linux-mm@kvack.org
Cc: Matthew Wilcox , Jan Kara , "Darrick J. Wong" , Jason Gunthorpe ,
 Christoph Hellwig , John Hubbard , Dave Chinner , nvdimm@lists.linux.dev,
 akpm@linux-foundation.org, linux-fsdevel@vger.kernel.org
Date: Fri, 14 Oct 2022 16:57:25 -0700
Message-ID: <166579184544.2236710.791897642091142558.stgit@dwillia2-xfh.jf.intel.com>
In-Reply-To: <166579181584.2236710.17813547487183983273.stgit@dwillia2-xfh.jf.intel.com>
References: <166579181584.2236710.17813547487183983273.stgit@dwillia2-xfh.jf.intel.com>
User-Agent: StGit/0.18-3-g996c

The fsdax truncate vs page pinning solution is incomplete.
The initial solution landed in v4.17 and covered typical truncate invoked through truncate(2) and fallocate(2), i.e. the truncate_inode_pages() called on open files. However, that enabling left truncate_inode_pages_final(), called after iput_final() to free the inode, unprotected. Thankfully that v4.17 enabling also left a warning in place to fire if any truncate is attempted while a DAX page is still pinned: commit d2c997c0f145 ("fs, dax: use page->mapping to warn if truncate collides with a busy page") While a lore search indicates no reports of that firing, the hole is there nonetheless. The concern is that if/when that warning fires it indicates a use-after-free condition whereby the filesystem has lost the ability to arbitrate access to its storage blocks. For example, in the worst case, DMA may be ongoing while the filesystem thinks the block is free to be reallocated to another inode. This patch is based on an observation from Dave that during iput_final() there is no need to hold filesystem locks like the explicit truncate path. The wait can occur from within dax_delete_mapping_entry() called by truncate_folio_batch_exceptionals(). This solution trades off fixing the use-after-free with a theoretical deadlock scenario. If the agent holding the page pin triggers inode reclaim and that reclaim waits for the pin to drop it will deadlock. Two observations make this approach still worth pursuing: 1/ Any existing scenarios where that happens would have triggered the warning referenced above which has shipped upstream for ~5 years without a bug report on lore. 2/ Most I/O drivers only hold page pins in their fast paths and new __GFP_FS allocations are unlikely in a driver fast path. I.e. if the deadlock triggers the likely fix would be in the offending driver, not new band-aids in fsdax. So, update the DAX core to notice that the inode->i_mapping is in the exiting state and use that as a signal that the inode is unreferenced await page-pins to drain. Cc: Matthew Wilcox Cc: Jan Kara Cc: "Darrick J. Wong" Cc: Jason Gunthorpe Cc: Christoph Hellwig Cc: John Hubbard Reported-by: Dave Chinner Signed-off-by: Dan Williams --- fs/dax.c | 26 +++++++++++++++++++++++++- 1 file changed, 25 insertions(+), 1 deletion(-) diff --git a/fs/dax.c b/fs/dax.c index a75d4bf541b4..e3deb60a792f 100644 --- a/fs/dax.c +++ b/fs/dax.c @@ -803,13 +803,37 @@ static int __dax_invalidate_entry(struct address_space *mapping, return ret; } +/* + * wait indefinitely for all pins to drop, the alternative to waiting is + * a potential use-after-free scenario + */ +static void dax_break_layout(struct address_space *mapping, pgoff_t index) +{ + /* To do this without locks, the inode needs to be unreferenced */ + WARN_ON(atomic_read(&mapping->host->i_count)); + do { + struct page *page; + + page = dax_zap_mappings_range(mapping, index << PAGE_SHIFT, + (index + 1) << PAGE_SHIFT); + if (!page) + return; + wait_var_event(page, dax_page_idle(page)); + } while (true); +} + /* * Delete DAX entry at @index from @mapping. Wait for it * to be unlocked before deleting it. */ int dax_delete_mapping_entry(struct address_space *mapping, pgoff_t index) { - int ret = __dax_invalidate_entry(mapping, index, true); + int ret; + + if (mapping_exiting(mapping)) + dax_break_layout(mapping, index); + + ret = __dax_invalidate_entry(mapping, index, true); /* * This gets called from truncate / punch_hole path. 
As such, the caller
From patchwork Fri Oct 14 23:57:31 2022
X-Patchwork-Submitter: Dan Williams
X-Patchwork-Id: 13007520
Subject: [PATCH v3 06/25] fsdax: Validate DAX layouts broken before truncate
From: Dan Williams
To: linux-mm@kvack.org
Cc: Matthew Wilcox , Jan Kara , "Darrick J.
Wong" , Jason Gunthorpe , Christoph Hellwig , John Hubbard , Dave Chinner , nvdimm@lists.linux.dev, akpm@linux-foundation.org, linux-fsdevel@vger.kernel.org Date: Fri, 14 Oct 2022 16:57:31 -0700 Message-ID: <166579185112.2236710.17571345510304035858.stgit@dwillia2-xfh.jf.intel.com> In-Reply-To: <166579181584.2236710.17813547487183983273.stgit@dwillia2-xfh.jf.intel.com> References: <166579181584.2236710.17813547487183983273.stgit@dwillia2-xfh.jf.intel.com> User-Agent: StGit/0.18-3-g996c MIME-Version: 1.0 ARC-Authentication-Results: i=1; imf22.hostedemail.com; dkim=none ("invalid DKIM record") header.d=intel.com header.s=Intel header.b="Cm9t/cMo"; spf=pass (imf22.hostedemail.com: domain of dan.j.williams@intel.com designates 134.134.136.31 as permitted sender) smtp.mailfrom=dan.j.williams@intel.com; dmarc=pass (policy=none) header.from=intel.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1665791853; a=rsa-sha256; cv=none; b=yCFHFpSkRBRK4RUr9/cFzrFI2MVI5CoXVAHaBfeJm3nM29G8ZLko21PUHSeAPXJy8JVpRo pD2/Xme2j6DiTgDJ8IyXjeoCtwvqPBOIZ1Gi2mfgnU+1mRNifCVVrJ+MvzCxMsD3hDmzf4 eMKcfEnIivo2L0Nn4gCZps1ExzYeHOY= ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1665791853; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=OI4yd8tcXdPjF9n/Dll0IIgldncIeYstMvGae4rxZBE=; b=NeB1Jd6q/rvo4HFYZSpThiHpRZaTFU7azQwsqbFm+fyYE+eOE9NbK5LPAxsS+RhlEIUYZH PO2+vkmsBu+UmSQ3RJ27vwPT7QR/MSFouA2rffjIVB1hZP9vdCjkQQmWYG9NCorUVbF77/ qY9yP/eoUARx4Mu9aRjPuwfaO3HLmnQ= X-Rspam-User: Authentication-Results: imf22.hostedemail.com; dkim=none ("invalid DKIM record") header.d=intel.com header.s=Intel header.b="Cm9t/cMo"; spf=pass (imf22.hostedemail.com: domain of dan.j.williams@intel.com designates 134.134.136.31 as permitted sender) smtp.mailfrom=dan.j.williams@intel.com; dmarc=pass (policy=none) header.from=intel.com X-Rspamd-Server: rspam02 X-Rspamd-Queue-Id: B2ECAC0028 X-Stat-Signature: 8se8e7s4xuowmq5a9itq37j5z46fmxat X-HE-Tag: 1665791853-739643 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Now that iput_final() arranges for dax_break_layouts(), all truncate_inode_pages() paths in the kernel ensure that no DAX pages hosted by that inode are in use. Add warnings to assert the new entry state transitions. Cc: Matthew Wilcox Cc: Jan Kara Cc: "Darrick J. 
Wong" Cc: Jason Gunthorpe Cc: Christoph Hellwig Cc: John Hubbard Cc: Dave Chinner Signed-off-by: Dan Williams --- fs/dax.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/fs/dax.c b/fs/dax.c index e3deb60a792f..1d4f0072e58d 100644 --- a/fs/dax.c +++ b/fs/dax.c @@ -401,13 +401,15 @@ static void dax_disassociate_entry(void *entry, struct address_space *mapping, for_each_mapped_pfn(entry, pfn) { struct page *page = pfn_to_page(pfn); - WARN_ON_ONCE(trunc && !dax_page_idle(page)); if (dax_mapping_is_cow(page->mapping)) { /* keep the CoW flag if this page is still shared */ if (page->index-- > 0) continue; - } else + } else { + WARN_ON_ONCE(trunc && !dax_is_zapped(entry)); + WARN_ON_ONCE(trunc && !dax_page_idle(page)); WARN_ON_ONCE(page->mapping && page->mapping != mapping); + } page->mapping = NULL; page->index = 0; } From patchwork Fri Oct 14 23:57:37 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dan Williams X-Patchwork-Id: 13007521 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 30AF7C4332F for ; Fri, 14 Oct 2022 23:57:41 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id B86D26B0080; Fri, 14 Oct 2022 19:57:40 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id B369A6B0081; Fri, 14 Oct 2022 19:57:40 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 9FEAC6B0082; Fri, 14 Oct 2022 19:57:40 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0016.hostedemail.com [216.40.44.16]) by kanga.kvack.org (Postfix) with ESMTP id 8D9FA6B0080 for ; Fri, 14 Oct 2022 19:57:40 -0400 (EDT) Received: from smtpin22.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay09.hostedemail.com (Postfix) with ESMTP id 6495B803B9 for ; Fri, 14 Oct 2022 23:57:40 +0000 (UTC) X-FDA: 80021219880.22.A340D4A Received: from mga03.intel.com (mga03.intel.com [134.134.136.65]) by imf02.hostedemail.com (Postfix) with ESMTP id C470C8002D for ; Fri, 14 Oct 2022 23:57:39 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1665791859; x=1697327859; h=subject:from:to:cc:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=yd85VDxCtKyxeZdpNqoca3KmUlvyPqntXffUSxCk3NA=; b=IiE/RKn2TvEvzZmubF4U0Xcy3p6qVCAKd7hk1/Wej+Pl8u8D11NiuFPw byNzIeGbNY1cTqsHKw3ynMp0sOHpZ1fHCJGLP6si0vgC103C20Ph1COdN nJ9l93ew9uAaQWf+uRumCDliJfvBochMwN5aadsG9Se6lwwJ5Xzw9ba0q jdLPkWEwabkFTw25bIvZboRYaqSrvp1qUFp+/5icCcteq2trJPXeDLjCB nFD7AHyLnNE5bNBu2J0403r6OCU+x6oFpsa7TX3wtVwda48WdQHo4Zejp GXE6cJMNoO+VsHM64Rrl83vEaPSwA5LfF2nJBQYoxBQjA70a0UT1uxKNe g==; X-IronPort-AV: E=McAfee;i="6500,9779,10500"; a="307154575" X-IronPort-AV: E=Sophos;i="5.95,185,1661842800"; d="scan'208";a="307154575" Received: from fmsmga007.fm.intel.com ([10.253.24.52]) by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 14 Oct 2022 16:57:38 -0700 X-IronPort-AV: E=McAfee;i="6500,9779,10500"; a="630113270" X-IronPort-AV: E=Sophos;i="5.95,185,1661842800"; d="scan'208";a="630113270" Received: from uyoon-mobl.amr.corp.intel.com (HELO dwillia2-xfh.jf.intel.com) ([10.209.90.112]) by fmsmga007-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 14 Oct 2022 16:57:37 -0700 Subject: 
[PATCH v3 07/25] fsdax: Hold dax lock over mapping insertion
From: Dan Williams
To: linux-mm@kvack.org
Cc: Matthew Wilcox , Jan Kara , "Darrick J. Wong" , Jason Gunthorpe ,
 Christoph Hellwig , John Hubbard , david@fromorbit.com,
 nvdimm@lists.linux.dev, akpm@linux-foundation.org, linux-fsdevel@vger.kernel.org
Date: Fri, 14 Oct 2022 16:57:37 -0700
Message-ID: <166579185727.2236710.8711235794537270051.stgit@dwillia2-xfh.jf.intel.com>
In-Reply-To: <166579181584.2236710.17813547487183983273.stgit@dwillia2-xfh.jf.intel.com>
References: <166579181584.2236710.17813547487183983273.stgit@dwillia2-xfh.jf.intel.com>
User-Agent: StGit/0.18-3-g996c

In preparation for dax_insert_entry() to start taking page and pgmap
references, ensure that page->pgmap is valid by holding the dax_read_lock()
over both dax_direct_access() and dax_insert_entry(). I.e. the code that
wants to elevate the reference count of a pgmap page from 0 -> 1 must ensure
that the pgmap is not exiting and will not start exiting until the proper
references have been taken.

Cc: Matthew Wilcox
Cc: Jan Kara
Cc: "Darrick J.
Wong" Cc: Jason Gunthorpe Cc: Christoph Hellwig Cc: John Hubbard Signed-off-by: Dan Williams --- fs/dax.c | 12 +++++++----- 1 file changed, 7 insertions(+), 5 deletions(-) diff --git a/fs/dax.c b/fs/dax.c index 1d4f0072e58d..6990a6e7df9f 100644 --- a/fs/dax.c +++ b/fs/dax.c @@ -1107,10 +1107,9 @@ static int dax_iomap_direct_access(const struct iomap *iomap, loff_t pos, size_t size, void **kaddr, pfn_t *pfnp) { pgoff_t pgoff = dax_iomap_pgoff(iomap, pos); - int id, rc = 0; long length; + int rc = 0; - id = dax_read_lock(); length = dax_direct_access(iomap->dax_dev, pgoff, PHYS_PFN(size), DAX_ACCESS, kaddr, pfnp); if (length < 0) { @@ -1135,7 +1134,6 @@ static int dax_iomap_direct_access(const struct iomap *iomap, loff_t pos, if (!*kaddr) rc = -EFAULT; out: - dax_read_unlock(id); return rc; } @@ -1588,7 +1586,7 @@ static vm_fault_t dax_fault_iter(struct vm_fault *vmf, loff_t pos = (loff_t)xas->xa_index << PAGE_SHIFT; bool write = iter->flags & IOMAP_WRITE; unsigned long entry_flags = pmd ? DAX_PMD : 0; - int err = 0; + int err = 0, id; pfn_t pfn; void *kaddr; @@ -1608,11 +1606,15 @@ static vm_fault_t dax_fault_iter(struct vm_fault *vmf, return pmd ? VM_FAULT_FALLBACK : VM_FAULT_SIGBUS; } + id = dax_read_lock(); err = dax_iomap_direct_access(iomap, pos, size, &kaddr, &pfn); - if (err) + if (err) { + dax_read_unlock(id); return pmd ? VM_FAULT_FALLBACK : dax_fault_return(err); + } *entry = dax_insert_entry(xas, vmf, iter, *entry, pfn, entry_flags); + dax_read_unlock(id); if (write && srcmap->type != IOMAP_HOLE && srcmap->addr != iomap->addr) { From patchwork Fri Oct 14 23:57:43 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dan Williams X-Patchwork-Id: 13007522 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 5193BC4332F for ; Fri, 14 Oct 2022 23:57:51 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id D347B6B0081; Fri, 14 Oct 2022 19:57:50 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id CE5416B0082; Fri, 14 Oct 2022 19:57:50 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id BAC526B0083; Fri, 14 Oct 2022 19:57:50 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0011.hostedemail.com [216.40.44.11]) by kanga.kvack.org (Postfix) with ESMTP id A503F6B0081 for ; Fri, 14 Oct 2022 19:57:50 -0400 (EDT) Received: from smtpin22.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay10.hostedemail.com (Postfix) with ESMTP id 7EA9AC0E13 for ; Fri, 14 Oct 2022 23:57:50 +0000 (UTC) X-FDA: 80021220300.22.0B6463E Received: from mga02.intel.com (mga02.intel.com [134.134.136.20]) by imf23.hostedemail.com (Postfix) with ESMTP id 07C3E140029 for ; Fri, 14 Oct 2022 23:57:49 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1665791870; x=1697327870; h=subject:from:to:cc:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=dRBEo+va9LdTaMALBHzem55il8qhSDS6I2rVZ6WlqEM=; b=JsP6kXz7wDxS1G7veJDpAYeoY+yLw/Yw7fz1FeuC1EbNwZd7hJy67ZqQ EIlw0E3bHGJFE/YEd6R3i85j6erj4DN9J4OUOfiWxfQlOwc6yz1NlfrBc 8lB4DY49v/of1rK4qkDqywvFeM1rsT5eyZqECkg6dlO5bc/gko3M1pCUU 3Puf/OPOfNX/lw2S1zzHCA5PZTsCpN7y1BUEw/sJ63dqArr5XuZsi1sJF 
WgZhJ8CWqksB2b3M3JM+T+s+yXKRCi4XQFYU73uNMuj6pkJJwhDakZV3J 2dMxhX/JNlDpSU4sPpMl5aQPX25pH1sRIipUYK0Dk8c8PtV4iduAd9Hwj A==; X-IronPort-AV: E=McAfee;i="6500,9779,10500"; a="292861985" X-IronPort-AV: E=Sophos;i="5.95,185,1661842800"; d="scan'208";a="292861985" Received: from fmsmga007.fm.intel.com ([10.253.24.52]) by orsmga101.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 14 Oct 2022 16:57:48 -0700 X-IronPort-AV: E=McAfee;i="6500,9779,10500"; a="630113300" X-IronPort-AV: E=Sophos;i="5.95,185,1661842800"; d="scan'208";a="630113300" Received: from uyoon-mobl.amr.corp.intel.com (HELO dwillia2-xfh.jf.intel.com) ([10.209.90.112]) by fmsmga007-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 14 Oct 2022 16:57:43 -0700 Subject: [PATCH v3 08/25] fsdax: Update dax_insert_entry() calling convention to return an error From: Dan Williams To: linux-mm@kvack.org Cc: Matthew Wilcox , Jan Kara , "Darrick J. Wong" , Jason Gunthorpe , Christoph Hellwig , John Hubbard , david@fromorbit.com, nvdimm@lists.linux.dev, akpm@linux-foundation.org, linux-fsdevel@vger.kernel.org Date: Fri, 14 Oct 2022 16:57:43 -0700 Message-ID: <166579186334.2236710.388332274317019999.stgit@dwillia2-xfh.jf.intel.com> In-Reply-To: <166579181584.2236710.17813547487183983273.stgit@dwillia2-xfh.jf.intel.com> References: <166579181584.2236710.17813547487183983273.stgit@dwillia2-xfh.jf.intel.com> User-Agent: StGit/0.18-3-g996c MIME-Version: 1.0 ARC-Authentication-Results: i=1; imf23.hostedemail.com; dkim=none ("invalid DKIM record") header.d=intel.com header.s=Intel header.b=JsP6kXz7; spf=pass (imf23.hostedemail.com: domain of dan.j.williams@intel.com designates 134.134.136.20 as permitted sender) smtp.mailfrom=dan.j.williams@intel.com; dmarc=pass (policy=none) header.from=intel.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1665791870; a=rsa-sha256; cv=none; b=xZe92tGyP+I9GZ21RZr8pBpG/Tv4fMpipTXWeLVckyJ73QworpfpPz/beqGzbwiZ5I3nZd nTELnmRqswfw9uN8di4AujELWnS5mzxoOye32282Xl/EErtVro1/z9H7AQpJKXcrBAJ/m2 i5Npt5GtH8/zW1/BqK1S9CK+8BhE5fI= ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1665791870; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=FqTkZYoXrsXbDq6UxOR5urGX8nFTPi0wgrVHtoiUmsM=; b=h78tV/btAbkYe/p+QTiNVoPWXadZBgN6PyAPZLB7UohDlEdqWFBESuQ71EBIhB9+VEBNnP H1aQvuFf8rHHRJz3DzvhWUE41lmS4qtLpscsJiUGzQDuhm4MvvOcn1aTsRSnMLQ+zGdpVY JubEbjsuCbIEaHVbFyoJikKnYgaTs5o= X-Rspam-User: Authentication-Results: imf23.hostedemail.com; dkim=none ("invalid DKIM record") header.d=intel.com header.s=Intel header.b=JsP6kXz7; spf=pass (imf23.hostedemail.com: domain of dan.j.williams@intel.com designates 134.134.136.20 as permitted sender) smtp.mailfrom=dan.j.williams@intel.com; dmarc=pass (policy=none) header.from=intel.com X-Rspamd-Server: rspam02 X-Rspamd-Queue-Id: 07C3E140029 X-Stat-Signature: 8n1xdyy83u7n5r63x46qk96bka34hojy X-HE-Tag: 1665791869-194770 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: In preparation for teaching dax_insert_entry() to take live @pgmap references, enable it to return errors. 
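Concretely, a converted dax_insert_entry() call site ends up with roughly the following shape (an illustrative sketch only; 'entry', 'pfn' and 'flags' stand in for state the caller already has at hand):

	vm_fault_t ret;

	/* on success @entry is updated in place instead of being returned */
	ret = dax_insert_entry(xas, vmf, iter, &entry, pfn, flags);
	if (ret)
		return ret;	/* already a vm_fault_t code, e.g. VM_FAULT_SIGBUS */
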
Given the observation that all callers overwrite the passed in entry with the return value, just update @entry in place and convert the return code to a vm_fault_t status. Cc: Matthew Wilcox Cc: Jan Kara Cc: "Darrick J. Wong" Cc: Jason Gunthorpe Cc: Christoph Hellwig Cc: John Hubbard Signed-off-by: Dan Williams --- fs/dax.c | 28 ++++++++++++++++++++-------- 1 file changed, 20 insertions(+), 8 deletions(-) diff --git a/fs/dax.c b/fs/dax.c index 6990a6e7df9f..1f6c1abfe0c9 100644 --- a/fs/dax.c +++ b/fs/dax.c @@ -907,14 +907,15 @@ static bool dax_fault_is_cow(const struct iomap_iter *iter) * already in the tree, we will skip the insertion and just dirty the PMD as * appropriate. */ -static void *dax_insert_entry(struct xa_state *xas, struct vm_fault *vmf, - const struct iomap_iter *iter, void *entry, pfn_t pfn, - unsigned long flags) +static vm_fault_t dax_insert_entry(struct xa_state *xas, struct vm_fault *vmf, + const struct iomap_iter *iter, void **pentry, + pfn_t pfn, unsigned long flags) { struct address_space *mapping = vmf->vma->vm_file->f_mapping; void *new_entry = dax_make_entry(pfn, flags); bool dirty = !dax_fault_is_synchronous(iter, vmf->vma); bool cow = dax_fault_is_cow(iter); + void *entry = *pentry; if (dirty) __mark_inode_dirty(mapping->host, I_DIRTY_PAGES); @@ -960,7 +961,9 @@ static void *dax_insert_entry(struct xa_state *xas, struct vm_fault *vmf, xas_set_mark(xas, PAGECACHE_TAG_TOWRITE); xas_unlock_irq(xas); - return entry; + + *pentry = entry; + return 0; } static int dax_writeback_one(struct xa_state *xas, struct dax_device *dax_dev, @@ -1206,9 +1209,12 @@ static vm_fault_t dax_load_hole(struct xa_state *xas, struct vm_fault *vmf, pfn_t pfn = pfn_to_pfn_t(my_zero_pfn(vaddr)); vm_fault_t ret; - *entry = dax_insert_entry(xas, vmf, iter, *entry, pfn, DAX_ZERO_PAGE); + ret = dax_insert_entry(xas, vmf, iter, entry, pfn, DAX_ZERO_PAGE); + if (ret) + goto out; ret = vmf_insert_mixed(vmf->vma, vaddr, pfn); +out: trace_dax_load_hole(inode, vmf, ret); return ret; } @@ -1225,6 +1231,7 @@ static vm_fault_t dax_pmd_load_hole(struct xa_state *xas, struct vm_fault *vmf, struct page *zero_page; spinlock_t *ptl; pmd_t pmd_entry; + vm_fault_t ret; pfn_t pfn; zero_page = mm_get_huge_zero_page(vmf->vma->vm_mm); @@ -1233,8 +1240,10 @@ static vm_fault_t dax_pmd_load_hole(struct xa_state *xas, struct vm_fault *vmf, goto fallback; pfn = page_to_pfn_t(zero_page); - *entry = dax_insert_entry(xas, vmf, iter, *entry, pfn, - DAX_PMD | DAX_ZERO_PAGE); + ret = dax_insert_entry(xas, vmf, iter, entry, pfn, + DAX_PMD | DAX_ZERO_PAGE); + if (ret) + return ret; if (arch_needs_pgtable_deposit()) { pgtable = pte_alloc_one(vma->vm_mm); @@ -1587,6 +1596,7 @@ static vm_fault_t dax_fault_iter(struct vm_fault *vmf, bool write = iter->flags & IOMAP_WRITE; unsigned long entry_flags = pmd ? DAX_PMD : 0; int err = 0, id; + vm_fault_t ret; pfn_t pfn; void *kaddr; @@ -1613,8 +1623,10 @@ static vm_fault_t dax_fault_iter(struct vm_fault *vmf, return pmd ? 
VM_FAULT_FALLBACK : dax_fault_return(err); } - *entry = dax_insert_entry(xas, vmf, iter, *entry, pfn, entry_flags); + ret = dax_insert_entry(xas, vmf, iter, entry, pfn, entry_flags); dax_read_unlock(id); + if (ret) + return ret; if (write && srcmap->type != IOMAP_HOLE && srcmap->addr != iomap->addr) { From patchwork Fri Oct 14 23:57:49 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dan Williams X-Patchwork-Id: 13007523 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 534A3C433FE for ; Fri, 14 Oct 2022 23:57:53 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id E1E166B0082; Fri, 14 Oct 2022 19:57:52 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id DF6056B0083; Fri, 14 Oct 2022 19:57:52 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id CBDF46B0085; Fri, 14 Oct 2022 19:57:52 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0017.hostedemail.com [216.40.44.17]) by kanga.kvack.org (Postfix) with ESMTP id B8E316B0082 for ; Fri, 14 Oct 2022 19:57:52 -0400 (EDT) Received: from smtpin07.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay10.hostedemail.com (Postfix) with ESMTP id 9C0EAC0EE9 for ; Fri, 14 Oct 2022 23:57:52 +0000 (UTC) X-FDA: 80021220384.07.F9A5781 Received: from mga02.intel.com (mga02.intel.com [134.134.136.20]) by imf23.hostedemail.com (Postfix) with ESMTP id 09238140031 for ; Fri, 14 Oct 2022 23:57:51 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1665791872; x=1697327872; h=subject:from:to:cc:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=DzcvRRhSc/UKZtDBh8japCd9LbnnTSOaNgXJjR8jGE4=; b=FB6rCTAw+ON1/xs37AbCJeTp2E8AO6YToNRP8ZkcjrbURsOvlg5K/ubZ sABDdvlc5CCCrLuOVuI83ZznO6pZMKxkeCK4yaYx5oM0PDG9LLixZ5AkL rph8T+7cZvqnAFQNLlyqDNSTZ8X3TwYgdYC+/o/blKe/Xue5i6kV+HZKn zXIXUGyfB0jObKt8gdifOcbOB6MU4baJtZu+NNXHMYl+Jn9Lp4bUoY+29 VJnGQyyM80lCqTctr4j0I2H6k7BPIa3USmPJRZaM2E/R3LztodZgdChOz dQjWUlatxklfqJqf2COvroE1gYc9w22YPVK9vFIJWWqsRnhV/2g4NEwmI w==; X-IronPort-AV: E=McAfee;i="6500,9779,10500"; a="292861995" X-IronPort-AV: E=Sophos;i="5.95,185,1661842800"; d="scan'208";a="292861995" Received: from fmsmga007.fm.intel.com ([10.253.24.52]) by orsmga101.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 14 Oct 2022 16:57:50 -0700 X-IronPort-AV: E=McAfee;i="6500,9779,10500"; a="630113310" X-IronPort-AV: E=Sophos;i="5.95,185,1661842800"; d="scan'208";a="630113310" Received: from uyoon-mobl.amr.corp.intel.com (HELO dwillia2-xfh.jf.intel.com) ([10.209.90.112]) by fmsmga007-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 14 Oct 2022 16:57:49 -0700 Subject: [PATCH v3 09/25] fsdax: Rework for_each_mapped_pfn() to dax_for_each_folio() From: Dan Williams To: linux-mm@kvack.org Cc: Matthew Wilcox , Jan Kara , "Darrick J. 
Wong" , Jason Gunthorpe , Christoph Hellwig , John Hubbard , david@fromorbit.com, nvdimm@lists.linux.dev, akpm@linux-foundation.org, linux-fsdevel@vger.kernel.org Date: Fri, 14 Oct 2022 16:57:49 -0700 Message-ID: <166579186941.2236710.1345776454315696392.stgit@dwillia2-xfh.jf.intel.com> In-Reply-To: <166579181584.2236710.17813547487183983273.stgit@dwillia2-xfh.jf.intel.com> References: <166579181584.2236710.17813547487183983273.stgit@dwillia2-xfh.jf.intel.com> User-Agent: StGit/0.18-3-g996c MIME-Version: 1.0 ARC-Authentication-Results: i=1; imf23.hostedemail.com; dkim=none ("invalid DKIM record") header.d=intel.com header.s=Intel header.b=FB6rCTAw; spf=pass (imf23.hostedemail.com: domain of dan.j.williams@intel.com designates 134.134.136.20 as permitted sender) smtp.mailfrom=dan.j.williams@intel.com; dmarc=pass (policy=none) header.from=intel.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1665791872; a=rsa-sha256; cv=none; b=AmyOp53RsfxmX1sLK37rQ00VmjKIMtQzUEEzFm70HYe+V4x9BT1ed3ktJeeCfMWB9Sfi4E b+nIV1uohrN8Xc5wQ+iX5TM+Zfjd3nbOyfJE4a5qTwn93h5Ac6HTUU0AaIF0xHkJoMhl96 Q801oNDejzBbHR4xSrk7foSrjZpIlTg= ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1665791872; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=F4T/xwis5KtL1hoNjpRFXiDFxUN5+Nf/sCNDQ02JfWo=; b=MF5XK3MTpykwzoHwkJEoeAXXR8RPOkd0ohbD3Un7r6iI+fBG9LPsSiknEIgghn041XJ+h0 iMiLMv7QclzBnYECOb2b+M3n8Sb2kXhJPhtxBzrByHpKBlh5CCptGzi3W17ZbPKtn2EruX Aw/Z5RssM/VWrTIC+8LUl7Q5kxTo/qo= X-Rspam-User: Authentication-Results: imf23.hostedemail.com; dkim=none ("invalid DKIM record") header.d=intel.com header.s=Intel header.b=FB6rCTAw; spf=pass (imf23.hostedemail.com: domain of dan.j.williams@intel.com designates 134.134.136.20 as permitted sender) smtp.mailfrom=dan.j.williams@intel.com; dmarc=pass (policy=none) header.from=intel.com X-Rspamd-Server: rspam02 X-Rspamd-Queue-Id: 09238140031 X-Stat-Signature: tru6o9aump7w1kwz115arh3gawxi6m9o X-HE-Tag: 1665791871-293277 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: In preparation for requesting folios from a pgmap, rework for_each_mapped_pfn() to operate in terms of folios. Cc: Matthew Wilcox Cc: Jan Kara Cc: "Darrick J. 
Wong" Cc: Jason Gunthorpe Cc: Christoph Hellwig Cc: John Hubbard Signed-off-by: Dan Williams --- .clang-format | 1 + fs/dax.c | 102 ++++++++++++++++++++++++++++++--------------------- include/linux/dax.h | 5 +++ 3 files changed, 66 insertions(+), 42 deletions(-) diff --git a/.clang-format b/.clang-format index 1247d54f9e49..767651ddc50c 100644 --- a/.clang-format +++ b/.clang-format @@ -136,6 +136,7 @@ ForEachMacros: - 'data__for_each_file' - 'data__for_each_file_new' - 'data__for_each_file_start' + - 'dax_for_each_folio' - 'device_for_each_child_node' - 'displayid_iter_for_each' - 'dma_fence_array_for_each' diff --git a/fs/dax.c b/fs/dax.c index 1f6c1abfe0c9..d03c7a952d02 100644 --- a/fs/dax.c +++ b/fs/dax.c @@ -327,18 +327,41 @@ static unsigned long dax_entry_size(void *entry) return PAGE_SIZE; } -static unsigned long dax_end_pfn(void *entry) +/* + * Until fsdax constructs compound folios it needs to be prepared to + * support multiple folios per entry where each folio is a single page + */ +static struct folio *dax_entry_to_folio(void *entry, int idx) { - return dax_to_pfn(entry) + dax_entry_size(entry) / PAGE_SIZE; + unsigned long pfn, size = dax_entry_size(entry); + struct page *page; + struct folio *folio; + + if (!size) + return NULL; + + pfn = dax_to_pfn(entry); + page = pfn_to_page(pfn); + folio = page_folio(page); + + /* + * Are there multiple folios per entry, and has the iterator + * passed the end of that set? + */ + if (idx >= size / folio_size(folio)) + return NULL; + + VM_WARN_ON_ONCE(!IS_ALIGNED(size, folio_size(folio))); + + return page_folio(page + idx); } /* - * Iterate through all mapped pfns represented by an entry, i.e. skip - * 'empty' and 'zero' entries. + * Iterate through all folios associated with a given entry */ -#define for_each_mapped_pfn(entry, pfn) \ - for (pfn = dax_to_pfn(entry); \ - pfn < dax_end_pfn(entry); pfn++) +#define dax_for_each_folio(entry, folio, i) \ + for (i = 0, folio = dax_entry_to_folio(entry, i); folio; \ + folio = dax_entry_to_folio(entry, ++i)) static inline bool dax_mapping_is_cow(struct address_space *mapping) { @@ -348,18 +371,18 @@ static inline bool dax_mapping_is_cow(struct address_space *mapping) /* * Set the page->mapping with FS_DAX_MAPPING_COW flag, increase the refcount. */ -static inline void dax_mapping_set_cow(struct page *page) +static inline void dax_mapping_set_cow(struct folio *folio) { - if ((uintptr_t)page->mapping != PAGE_MAPPING_DAX_COW) { + if ((uintptr_t)folio->mapping != PAGE_MAPPING_DAX_COW) { /* - * Reset the index if the page was already mapped + * Reset the index if the folio was already mapped * regularly before. 
*/ - if (page->mapping) - page->index = 1; - page->mapping = (void *)PAGE_MAPPING_DAX_COW; + if (folio->mapping) + folio->index = 1; + folio->mapping = (void *)PAGE_MAPPING_DAX_COW; } - page->index++; + folio->index++; } /* @@ -370,48 +393,45 @@ static inline void dax_mapping_set_cow(struct page *page) static void dax_associate_entry(void *entry, struct address_space *mapping, struct vm_area_struct *vma, unsigned long address, bool cow) { - unsigned long size = dax_entry_size(entry), pfn, index; - int i = 0; + unsigned long size = dax_entry_size(entry), index; + struct folio *folio; + int i; if (IS_ENABLED(CONFIG_FS_DAX_LIMITED)) return; index = linear_page_index(vma, address & ~(size - 1)); - for_each_mapped_pfn(entry, pfn) { - struct page *page = pfn_to_page(pfn); - + dax_for_each_folio(entry, folio, i) if (cow) { - dax_mapping_set_cow(page); + dax_mapping_set_cow(folio); } else { - WARN_ON_ONCE(page->mapping); - page->mapping = mapping; - page->index = index + i++; + WARN_ON_ONCE(folio->mapping); + folio->mapping = mapping; + folio->index = index + i; } - } } static void dax_disassociate_entry(void *entry, struct address_space *mapping, bool trunc) { - unsigned long pfn; + struct folio *folio; + int i; if (IS_ENABLED(CONFIG_FS_DAX_LIMITED)) return; - for_each_mapped_pfn(entry, pfn) { - struct page *page = pfn_to_page(pfn); - - if (dax_mapping_is_cow(page->mapping)) { - /* keep the CoW flag if this page is still shared */ - if (page->index-- > 0) + dax_for_each_folio(entry, folio, i) { + if (dax_mapping_is_cow(folio->mapping)) { + /* keep the CoW flag if this folio is still shared */ + if (folio->index-- > 0) continue; } else { WARN_ON_ONCE(trunc && !dax_is_zapped(entry)); - WARN_ON_ONCE(trunc && !dax_page_idle(page)); - WARN_ON_ONCE(page->mapping && page->mapping != mapping); + WARN_ON_ONCE(trunc && !dax_folio_idle(folio)); + WARN_ON_ONCE(folio->mapping && folio->mapping != mapping); } - page->mapping = NULL; - page->index = 0; + folio->mapping = NULL; + folio->index = 0; } } @@ -673,20 +693,18 @@ static void *dax_zap_entry(struct xa_state *xas, void *entry) static struct page *dax_zap_pages(struct xa_state *xas, void *entry) { struct page *ret = NULL; - unsigned long pfn; + struct folio *folio; bool zap; + int i; if (!dax_entry_size(entry)) return NULL; zap = !dax_is_zapped(entry); - for_each_mapped_pfn(entry, pfn) { - struct page *page = pfn_to_page(pfn); - - if (!ret && !dax_page_idle(page)) - ret = page; - } + dax_for_each_folio(entry, folio, i) + if (!ret && !dax_folio_idle(folio)) + ret = folio_page(folio, 0); if (zap) dax_zap_entry(xas, entry); diff --git a/include/linux/dax.h b/include/linux/dax.h index f6acb4ed73cb..12e15ca11bff 100644 --- a/include/linux/dax.h +++ b/include/linux/dax.h @@ -222,6 +222,11 @@ static inline bool dax_page_idle(struct page *page) return page_ref_count(page) == 1; } +static inline bool dax_folio_idle(struct folio *folio) +{ + return dax_page_idle(folio_page(folio, 0)); +} + #if IS_ENABLED(CONFIG_DAX) int dax_read_lock(void); void dax_read_unlock(int id); From patchwork Fri Oct 14 23:57:55 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dan Williams X-Patchwork-Id: 13007524 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 70569C4332F for ; Fri, 14 Oct 2022 23:57:59 +0000 (UTC) Received: by 
kanga.kvack.org (Postfix) id 089C56B0083; Fri, 14 Oct 2022 19:57:59 -0400 (EDT) Subject: [PATCH v3 10/25] fsdax: Introduce pgmap_request_folios() From: Dan Williams To: linux-mm@kvack.org Cc: Matthew Wilcox , Jan Kara , "Darrick J. 
Wong" , Christoph Hellwig , John Hubbard , Alistair Popple , Jason Gunthorpe , david@fromorbit.com, nvdimm@lists.linux.dev, akpm@linux-foundation.org, linux-fsdevel@vger.kernel.org Date: Fri, 14 Oct 2022 16:57:55 -0700 Message-ID: <166579187573.2236710.10151157417629496558.stgit@dwillia2-xfh.jf.intel.com> In-Reply-To: <166579181584.2236710.17813547487183983273.stgit@dwillia2-xfh.jf.intel.com> References: <166579181584.2236710.17813547487183983273.stgit@dwillia2-xfh.jf.intel.com> User-Agent: StGit/0.18-3-g996c MIME-Version: 1.0 ARC-Authentication-Results: i=1; imf21.hostedemail.com; dkim=none ("invalid DKIM record") header.d=intel.com header.s=Intel header.b=EBQ0isQN; spf=pass (imf21.hostedemail.com: domain of dan.j.williams@intel.com designates 192.55.52.120 as permitted sender) smtp.mailfrom=dan.j.williams@intel.com; dmarc=pass (policy=none) header.from=intel.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1665791878; a=rsa-sha256; cv=none; b=wOZ8N3ultcboVzeuwE9psS+mJsLpuy3VkuYPpa2efWHg5a+5DvaePJGRk4Of76q4PhsYNZ 3+Kcby2plohr7NeKSbc0/kDim6rhUDaAbj33bcBEMl7ZTJVJxSrnh1yTSS4Q3W8gdc6ldi rkrX1tlmHNh36WAWVbcNYm2iHdVDMqo= ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1665791878; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=LlbnPOH2liVqDFfmWCtb46mP1+fobWb5Ecsw0Oa6Y7Y=; b=BaY1n/HYaSKJSVm4Ku7PNRqWB2msbcnzC9Hr45RgzCHpAc7Icx3LtyO3VscKhwkinbmRN2 dmc4DJxo6tukSq8cg6Evf6DZuaj/IzP5StndQw8tIikuHmXLL1W/aRtzgulCy8g1PuDVht qNAXFgpiTJCby8C27Y58YyCuCFbngN0= X-Rspamd-Server: rspam05 X-Rspam-User: Authentication-Results: imf21.hostedemail.com; dkim=none ("invalid DKIM record") header.d=intel.com header.s=Intel header.b=EBQ0isQN; spf=pass (imf21.hostedemail.com: domain of dan.j.williams@intel.com designates 192.55.52.120 as permitted sender) smtp.mailfrom=dan.j.williams@intel.com; dmarc=pass (policy=none) header.from=intel.com X-Stat-Signature: uo1ueba6hffz9o8qhpdrbcsnwbf46zjz X-Rspamd-Queue-Id: EC2931C0020 X-HE-Tag: 1665791877-762793 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: The next step in sanitizing DAX page and pgmap lifetime is to take page references when a pgmap user maps a page or otherwise puts it into use. Unlike the page allocator where the it picks the page/folio, ZONE_DEVICE users know in advance which folio they want to access. Additionally, ZONE_DEVICE implementations know when the pgmap is alive. Introduce pgmap_request_folios() that pins @nr_folios folios at a time provided they are contiguous and of the same folio_order(). Some WARN assertions are added to document expectations and catch bugs in future kernel work, like a potential conversion of fsdax to use multi-page folios, but they otherwise are not expected to fire. Note that the paired pgmap_release_folios() implementation temporarily, in this path, takes an @pgmap argument to drop pgmap references. A follow-on patch arranges for free_zone_device_page() to drop pgmap references in all cases. In other words, the intent is that only put_folio() (on each folio requested pgmap_request_folio()) is needed to to undo pgmap_request_folios(). 
The intent is that this also replaces zone_device_page_init(), but that too requires some more preparatory reworks to unify the various MEMORY_DEVICE_* types. Cc: Matthew Wilcox Cc: Jan Kara Cc: "Darrick J. Wong" Cc: Christoph Hellwig Cc: John Hubbard Cc: Alistair Popple Suggested-by: Jason Gunthorpe Signed-off-by: Dan Williams --- fs/dax.c | 32 ++++++++++++++++----- include/linux/memremap.h | 17 +++++++++++ mm/memremap.c | 70 ++++++++++++++++++++++++++++++++++++++++++++++ 3 files changed, 111 insertions(+), 8 deletions(-) diff --git a/fs/dax.c b/fs/dax.c index d03c7a952d02..095c9d7b4a1d 100644 --- a/fs/dax.c +++ b/fs/dax.c @@ -385,20 +385,27 @@ static inline void dax_mapping_set_cow(struct folio *folio) folio->index++; } +static struct dev_pagemap *folio_pgmap(struct folio *folio) +{ + return folio_page(folio, 0)->pgmap; +} + /* * When it is called in dax_insert_entry(), the cow flag will indicate that * whether this entry is shared by multiple files. If so, set the page->mapping * FS_DAX_MAPPING_COW, and use page->index as refcount. */ -static void dax_associate_entry(void *entry, struct address_space *mapping, - struct vm_area_struct *vma, unsigned long address, bool cow) +static vm_fault_t dax_associate_entry(void *entry, + struct address_space *mapping, + struct vm_area_struct *vma, + unsigned long address, bool cow) { unsigned long size = dax_entry_size(entry), index; struct folio *folio; int i; if (IS_ENABLED(CONFIG_FS_DAX_LIMITED)) - return; + return 0; index = linear_page_index(vma, address & ~(size - 1)); dax_for_each_folio(entry, folio, i) @@ -406,9 +413,13 @@ static void dax_associate_entry(void *entry, struct address_space *mapping, dax_mapping_set_cow(folio); } else { WARN_ON_ONCE(folio->mapping); + if (!pgmap_request_folios(folio_pgmap(folio), folio, 1)) + return VM_FAULT_SIGBUS; folio->mapping = mapping; folio->index = index + i; } + + return 0; } static void dax_disassociate_entry(void *entry, struct address_space *mapping, @@ -702,9 +713,12 @@ static struct page *dax_zap_pages(struct xa_state *xas, void *entry) zap = !dax_is_zapped(entry); - dax_for_each_folio(entry, folio, i) + dax_for_each_folio(entry, folio, i) { + if (zap) + pgmap_release_folios(folio_pgmap(folio), folio, 1); if (!ret && !dax_folio_idle(folio)) ret = folio_page(folio, 0); + } if (zap) dax_zap_entry(xas, entry); @@ -934,6 +948,7 @@ static vm_fault_t dax_insert_entry(struct xa_state *xas, struct vm_fault *vmf, bool dirty = !dax_fault_is_synchronous(iter, vmf->vma); bool cow = dax_fault_is_cow(iter); void *entry = *pentry; + vm_fault_t ret = 0; if (dirty) __mark_inode_dirty(mapping->host, I_DIRTY_PAGES); @@ -954,8 +969,10 @@ static vm_fault_t dax_insert_entry(struct xa_state *xas, struct vm_fault *vmf, void *old; dax_disassociate_entry(entry, mapping, false); - dax_associate_entry(new_entry, mapping, vmf->vma, vmf->address, + ret = dax_associate_entry(new_entry, mapping, vmf->vma, vmf->address, cow); + if (ret) + goto out; /* * Only swap our new entry into the page cache if the current * entry is a zero page or an empty entry. 
If a normal PTE or @@ -978,10 +995,11 @@ static vm_fault_t dax_insert_entry(struct xa_state *xas, struct vm_fault *vmf, if (cow) xas_set_mark(xas, PAGECACHE_TAG_TOWRITE); + *pentry = entry; +out: xas_unlock_irq(xas); - *pentry = entry; - return 0; + return ret; } static int dax_writeback_one(struct xa_state *xas, struct dax_device *dax_dev, diff --git a/include/linux/memremap.h b/include/linux/memremap.h index 7fcaf3180a5b..b87c16577af1 100644 --- a/include/linux/memremap.h +++ b/include/linux/memremap.h @@ -193,7 +193,11 @@ void memunmap_pages(struct dev_pagemap *pgmap); void *devm_memremap_pages(struct device *dev, struct dev_pagemap *pgmap); void devm_memunmap_pages(struct device *dev, struct dev_pagemap *pgmap); struct dev_pagemap *get_dev_pagemap(unsigned long pfn, - struct dev_pagemap *pgmap); + struct dev_pagemap *pgmap); +bool pgmap_request_folios(struct dev_pagemap *pgmap, struct folio *folio, + int nr_folios); +void pgmap_release_folios(struct dev_pagemap *pgmap, struct folio *folio, + int nr_folios); bool pgmap_pfn_valid(struct dev_pagemap *pgmap, unsigned long pfn); unsigned long vmem_altmap_offset(struct vmem_altmap *altmap); @@ -223,6 +227,17 @@ static inline struct dev_pagemap *get_dev_pagemap(unsigned long pfn, return NULL; } +static inline bool pgmap_request_folios(struct dev_pagemap *pgmap, + struct folio *folio, int nr_folios) +{ + return false; +} + +static inline void pgmap_release_folios(struct dev_pagemap *pgmap, + struct folio *folio, int nr_folios) +{ +} + static inline bool pgmap_pfn_valid(struct dev_pagemap *pgmap, unsigned long pfn) { return false; diff --git a/mm/memremap.c b/mm/memremap.c index f9287babb3ce..87a649ecdc54 100644 --- a/mm/memremap.c +++ b/mm/memremap.c @@ -530,6 +530,76 @@ void zone_device_page_init(struct page *page) } EXPORT_SYMBOL_GPL(zone_device_page_init); +static bool folio_span_valid(struct dev_pagemap *pgmap, struct folio *folio, + int nr_folios) +{ + unsigned long pfn_start, pfn_end; + + pfn_start = page_to_pfn(folio_page(folio, 0)); + pfn_end = pfn_start + (1 << folio_order(folio)) * nr_folios - 1; + + if (pgmap != xa_load(&pgmap_array, pfn_start)) + return false; + + if (pfn_end > pfn_start && pgmap != xa_load(&pgmap_array, pfn_end)) + return false; + + return true; +} + +/** + * pgmap_request_folios - activate an contiguous span of folios in @pgmap + * @pgmap: host page map for the folio array + * @folio: start of the folio list, all subsequent folios have same folio_size() + * + * Caller is responsible for @pgmap remaining live for the duration of + * this call. Caller is also responsible for not racing requests for the + * same folios. + */ +bool pgmap_request_folios(struct dev_pagemap *pgmap, struct folio *folio, + int nr_folios) +{ + struct folio *iter; + int i; + + /* + * All of the WARNs below are for catching bugs in future + * development that changes the assumptions of: + * 1/ uniform folios in @pgmap + * 2/ @pgmap death does not race this routine. 
+ */ + VM_WARN_ON_ONCE(!folio_span_valid(pgmap, folio, nr_folios)); + + if (WARN_ON_ONCE(percpu_ref_is_dying(&pgmap->ref))) + return false; + + for (iter = folio_next(folio), i = 1; i < nr_folios; + iter = folio_next(iter), i++) + if (WARN_ON_ONCE(folio_order(iter) != folio_order(folio))) + return false; + + for (iter = folio, i = 0; i < nr_folios; iter = folio_next(iter), i++) { + folio_ref_inc(iter); + if (folio_ref_count(iter) == 1) + percpu_ref_tryget(&pgmap->ref); + } + + return true; +} + +void pgmap_release_folios(struct dev_pagemap *pgmap, struct folio *folio, int nr_folios) +{ + struct folio *iter; + int i; + + for (iter = folio, i = 0; i < nr_folios; iter = folio_next(iter), i++) { + if (!put_devmap_managed_page(&iter->page)) + folio_put(iter); + if (!folio_ref_count(iter)) + put_dev_pagemap(pgmap); + } +} + #ifdef CONFIG_FS_DAX bool __put_devmap_managed_page_refs(struct page *page, int refs) {

From patchwork Fri Oct 14 23:58:01 2022 X-Patchwork-Submitter: Dan Williams X-Patchwork-Id: 13007525 Subject: [PATCH v3 11/25] fsdax: Rework dax_insert_entry() calling convention From: Dan Williams To: linux-mm@kvack.org Cc: Matthew Wilcox , Jan Kara , "Darrick J. Wong" , Jason Gunthorpe , Christoph Hellwig , John Hubbard , david@fromorbit.com, nvdimm@lists.linux.dev, akpm@linux-foundation.org, linux-fsdevel@vger.kernel.org Date: Fri, 14 Oct 2022 16:58:01 -0700 Message-ID: <166579188180.2236710.6959794790871279054.stgit@dwillia2-xfh.jf.intel.com> In-Reply-To: <166579181584.2236710.17813547487183983273.stgit@dwillia2-xfh.jf.intel.com> References: <166579181584.2236710.17813547487183983273.stgit@dwillia2-xfh.jf.intel.com> User-Agent: StGit/0.18-3-g996c MIME-Version: 1.0

Move the determination of @dirty and @cow in dax_insert_entry() to flags (DAX_DIRTY and DAX_COW) that are passed in. This allows dax_insert_entry() to not require a 'struct iomap', which is a prerequisite for reusing dax_insert_entry() for device-dax.

Cc: Matthew Wilcox Cc: Jan Kara Cc: "Darrick J. Wong" Cc: Jason Gunthorpe Cc: Christoph Hellwig Cc: John Hubbard Signed-off-by: Dan Williams --- fs/dax.c | 44 +++++++++++++++++++++++++++++++++++--------- 1 file changed, 35 insertions(+), 9 deletions(-) diff --git a/fs/dax.c b/fs/dax.c index 095c9d7b4a1d..73e510ca5a70 100644 --- a/fs/dax.c +++ b/fs/dax.c @@ -75,12 +75,20 @@ fs_initcall(init_dax_wait_table); * block allocation. 
*/ #define DAX_SHIFT (5) +#define DAX_MASK ((1UL << DAX_SHIFT) - 1) #define DAX_LOCKED (1UL << 0) #define DAX_PMD (1UL << 1) #define DAX_ZERO_PAGE (1UL << 2) #define DAX_EMPTY (1UL << 3) #define DAX_ZAP (1UL << 4) +/* + * These flags are not conveyed in Xarray value entries, they are just + * modifiers to dax_insert_entry(). + */ +#define DAX_DIRTY (1UL << (DAX_SHIFT + 0)) +#define DAX_COW (1UL << (DAX_SHIFT + 1)) + static unsigned long dax_to_pfn(void *entry) { return xa_to_value(entry) >> DAX_SHIFT; @@ -88,7 +96,8 @@ static unsigned long dax_to_pfn(void *entry) static void *dax_make_entry(pfn_t pfn, unsigned long flags) { - return xa_mk_value(flags | (pfn_t_to_pfn(pfn) << DAX_SHIFT)); + return xa_mk_value((flags & DAX_MASK) | + (pfn_t_to_pfn(pfn) << DAX_SHIFT)); } static bool dax_is_locked(void *entry) @@ -932,6 +941,20 @@ static bool dax_fault_is_cow(const struct iomap_iter *iter) (iter->iomap.flags & IOMAP_F_SHARED); } +static unsigned long dax_iter_flags(const struct iomap_iter *iter, + struct vm_fault *vmf) +{ + unsigned long flags = 0; + + if (!dax_fault_is_synchronous(iter, vmf->vma)) + flags |= DAX_DIRTY; + + if (dax_fault_is_cow(iter)) + flags |= DAX_COW; + + return flags; +} + /* * By this point grab_mapping_entry() has ensured that we have a locked entry * of the appropriate size so we don't have to worry about downgrading PMDs to @@ -940,13 +963,13 @@ static bool dax_fault_is_cow(const struct iomap_iter *iter) * appropriate. */ static vm_fault_t dax_insert_entry(struct xa_state *xas, struct vm_fault *vmf, - const struct iomap_iter *iter, void **pentry, - pfn_t pfn, unsigned long flags) + void **pentry, pfn_t pfn, + unsigned long flags) { struct address_space *mapping = vmf->vma->vm_file->f_mapping; void *new_entry = dax_make_entry(pfn, flags); - bool dirty = !dax_fault_is_synchronous(iter, vmf->vma); - bool cow = dax_fault_is_cow(iter); + bool dirty = flags & DAX_DIRTY; + bool cow = flags & DAX_COW; void *entry = *pentry; vm_fault_t ret = 0; @@ -1245,7 +1268,8 @@ static vm_fault_t dax_load_hole(struct xa_state *xas, struct vm_fault *vmf, pfn_t pfn = pfn_to_pfn_t(my_zero_pfn(vaddr)); vm_fault_t ret; - ret = dax_insert_entry(xas, vmf, iter, entry, pfn, DAX_ZERO_PAGE); + ret = dax_insert_entry(xas, vmf, entry, pfn, + DAX_ZERO_PAGE | dax_iter_flags(iter, vmf)); if (ret) goto out; @@ -1276,8 +1300,9 @@ static vm_fault_t dax_pmd_load_hole(struct xa_state *xas, struct vm_fault *vmf, goto fallback; pfn = page_to_pfn_t(zero_page); - ret = dax_insert_entry(xas, vmf, iter, entry, pfn, - DAX_PMD | DAX_ZERO_PAGE); + ret = dax_insert_entry(xas, vmf, entry, pfn, + DAX_PMD | DAX_ZERO_PAGE | + dax_iter_flags(iter, vmf)); if (ret) return ret; @@ -1659,7 +1684,8 @@ static vm_fault_t dax_fault_iter(struct vm_fault *vmf, return pmd ? 
VM_FAULT_FALLBACK : dax_fault_return(err); } - ret = dax_insert_entry(xas, vmf, iter, entry, pfn, entry_flags); + ret = dax_insert_entry(xas, vmf, entry, pfn, + entry_flags | dax_iter_flags(iter, vmf)); dax_read_unlock(id); if (ret) return ret; From patchwork Fri Oct 14 23:58:07 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dan Williams X-Patchwork-Id: 13007526 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 294C7C433FE for ; Fri, 14 Oct 2022 23:58:12 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id B12066B0072; Fri, 14 Oct 2022 19:58:11 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id A739B6B0075; Fri, 14 Oct 2022 19:58:11 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 8EC3F6B007B; Fri, 14 Oct 2022 19:58:11 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0014.hostedemail.com [216.40.44.14]) by kanga.kvack.org (Postfix) with ESMTP id 7CF566B0072 for ; Fri, 14 Oct 2022 19:58:11 -0400 (EDT) Received: from smtpin12.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay02.hostedemail.com (Postfix) with ESMTP id 534C21204EC for ; Fri, 14 Oct 2022 23:58:11 +0000 (UTC) X-FDA: 80021221182.12.D81A8C2 Received: from mga02.intel.com (mga02.intel.com [134.134.136.20]) by imf16.hostedemail.com (Postfix) with ESMTP id B4FAB18001E for ; Fri, 14 Oct 2022 23:58:09 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1665791889; x=1697327889; h=subject:from:to:cc:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=b6C7Qe8Q1GJpUot5lHXs4yvkjjX4IgtewPJJUoytmac=; b=ZkpXAICss5hHzkbPvITbhzrZBDmhf1DmTWw/5kf6Sl3mKymXhDWiLAFr aIKPwOHk1LoePp6AW1D1QOihyEu4LsOZRUU0/udXhNLGCMReKANdWcF06 dCcWJ5N2aOST86I5Q3FR2z4tVwIZogIwlUerR7pFgEqWqd3F8Nh5NcJX3 hmbKsjDiGTFIkKZfp7B0+rp4Cwh4WFQw9+eq6ou0BjnmuoXikZak+ZlA3 kDwwVYtmQNVFitlZOAYqOdPHfJiw0KzkkmyHViRW3tKis7vTXrSzHNtNn VRS03lTKCoZU9/CPqKu9z6FH9yolH09O6nnWwAatLRds843Axc3bsdder Q==; X-IronPort-AV: E=McAfee;i="6500,9779,10500"; a="292862047" X-IronPort-AV: E=Sophos;i="5.95,185,1661842800"; d="scan'208";a="292862047" Received: from fmsmga007.fm.intel.com ([10.253.24.52]) by orsmga101.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 14 Oct 2022 16:58:09 -0700 X-IronPort-AV: E=McAfee;i="6500,9779,10500"; a="630113423" X-IronPort-AV: E=Sophos;i="5.95,185,1661842800"; d="scan'208";a="630113423" Received: from uyoon-mobl.amr.corp.intel.com (HELO dwillia2-xfh.jf.intel.com) ([10.209.90.112]) by fmsmga007-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 14 Oct 2022 16:58:08 -0700 Subject: [PATCH v3 12/25] fsdax: Cleanup dax_associate_entry() From: Dan Williams To: linux-mm@kvack.org Cc: Matthew Wilcox , Jan Kara , "Darrick J. 
Wong" , Jason Gunthorpe , Christoph Hellwig , John Hubbard , david@fromorbit.com, nvdimm@lists.linux.dev, akpm@linux-foundation.org, linux-fsdevel@vger.kernel.org Date: Fri, 14 Oct 2022 16:58:07 -0700 Message-ID: <166579188791.2236710.2590200540568819339.stgit@dwillia2-xfh.jf.intel.com> In-Reply-To: <166579181584.2236710.17813547487183983273.stgit@dwillia2-xfh.jf.intel.com> References: <166579181584.2236710.17813547487183983273.stgit@dwillia2-xfh.jf.intel.com> User-Agent: StGit/0.18-3-g996c MIME-Version: 1.0 ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1665791889; a=rsa-sha256; cv=none; b=p6WmIFa5w6eYxM/ImgIMWgbopYGKWnEQQ/xwgHi73kCafay8vIt7X+WCztdeUbZ1IiN4oC MnbX3SsK3I2KqIkYGM2S6byIr6zkuxSnRpnReCRYa5UrqhTT7dUt5mhiqHjzQ9mbOwJxqx DEMlz3TnJEBlRCt8pfSP6uki1fTbv5Q= ARC-Authentication-Results: i=1; imf16.hostedemail.com; dkim=none ("invalid DKIM record") header.d=intel.com header.s=Intel header.b=ZkpXAICs; spf=pass (imf16.hostedemail.com: domain of dan.j.williams@intel.com designates 134.134.136.20 as permitted sender) smtp.mailfrom=dan.j.williams@intel.com; dmarc=pass (policy=none) header.from=intel.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1665791889; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=btXBtxcbjCcTlZk4yf7Gaea1dGkef3rJyQBZS8dmASA=; b=pTNZXw5CZxI+w5KYo/2x0E9OHjLuDzLAHCpCLJJGWuDgpmPGcQBekYFH+AKpPRtiwWVqTA MNaAb2WyPhMb3Ezv7u5wDph19JiNeu5nZMDpYqtawmjbFsLDAbNcd2W08VCi/7SjXnPv9c bmS7z1VEI0Kz0DYwpeG8x3rvzOYDC+o= X-Stat-Signature: xc6zote89h9u7mjhpxfpyhitbganei3s X-Rspamd-Queue-Id: B4FAB18001E X-Rspam-User: Authentication-Results: imf16.hostedemail.com; dkim=none ("invalid DKIM record") header.d=intel.com header.s=Intel header.b=ZkpXAICs; spf=pass (imf16.hostedemail.com: domain of dan.j.williams@intel.com designates 134.134.136.20 as permitted sender) smtp.mailfrom=dan.j.williams@intel.com; dmarc=pass (policy=none) header.from=intel.com X-Rspamd-Server: rspam06 X-HE-Tag: 1665791889-292694 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Pass @vmf to drop the separate @vma and @address arguments to dax_associate_entry(), use the existing DAX flags to convey the @cow argument, and replace the open-coded ALIGN(). Cc: Matthew Wilcox Cc: Jan Kara Cc: "Darrick J. 
Wong" Cc: Jason Gunthorpe Cc: Christoph Hellwig Cc: John Hubbard Signed-off-by: Dan Williams --- fs/dax.c | 10 ++++------ 1 file changed, 4 insertions(+), 6 deletions(-) diff --git a/fs/dax.c b/fs/dax.c index 73e510ca5a70..48bc43c0c03c 100644 --- a/fs/dax.c +++ b/fs/dax.c @@ -406,8 +406,7 @@ static struct dev_pagemap *folio_pgmap(struct folio *folio) */ static vm_fault_t dax_associate_entry(void *entry, struct address_space *mapping, - struct vm_area_struct *vma, - unsigned long address, bool cow) + struct vm_fault *vmf, unsigned long flags) { unsigned long size = dax_entry_size(entry), index; struct folio *folio; @@ -416,9 +415,9 @@ static vm_fault_t dax_associate_entry(void *entry, if (IS_ENABLED(CONFIG_FS_DAX_LIMITED)) return 0; - index = linear_page_index(vma, address & ~(size - 1)); + index = linear_page_index(vmf->vma, ALIGN(vmf->address, size)); dax_for_each_folio(entry, folio, i) - if (cow) { + if (flags & DAX_COW) { dax_mapping_set_cow(folio); } else { WARN_ON_ONCE(folio->mapping); @@ -992,8 +991,7 @@ static vm_fault_t dax_insert_entry(struct xa_state *xas, struct vm_fault *vmf, void *old; dax_disassociate_entry(entry, mapping, false); - ret = dax_associate_entry(new_entry, mapping, vmf->vma, vmf->address, - cow); + ret = dax_associate_entry(new_entry, mapping, vmf, flags); if (ret) goto out; /* From patchwork Fri Oct 14 23:58:14 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dan Williams X-Patchwork-Id: 13007527 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 4B51BC4332F for ; Fri, 14 Oct 2022 23:58:18 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id D5BF76B0075; Fri, 14 Oct 2022 19:58:17 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id D0BBA6B007B; Fri, 14 Oct 2022 19:58:17 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id BD32B6B0080; Fri, 14 Oct 2022 19:58:17 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0016.hostedemail.com [216.40.44.16]) by kanga.kvack.org (Postfix) with ESMTP id AB07F6B0075 for ; Fri, 14 Oct 2022 19:58:17 -0400 (EDT) Received: from smtpin03.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay10.hostedemail.com (Postfix) with ESMTP id 87C93C0EE9 for ; Fri, 14 Oct 2022 23:58:17 +0000 (UTC) X-FDA: 80021221434.03.968224F Received: from mga14.intel.com (mga14.intel.com [192.55.52.115]) by imf28.hostedemail.com (Postfix) with ESMTP id BE2C8C0025 for ; Fri, 14 Oct 2022 23:58:16 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1665791896; x=1697327896; h=subject:from:to:cc:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=OlpG3VRQCEZ3qIcp9NWr5Pa3mRwDZ2m4hWtsSsuZz64=; b=lio0Y68O1C9r86Q1iN3iGRzKMR4lLkF0crYiAxJycbNXzBQX4V8zHAF6 izmYKPEqQwWaFlsRo4hNiwQT3cAobfn3KMyM27uypvnLgJKja8lGjrEPf WNhPzba4Pz7bAR9i8HSjEJZV9H3SV6lve4i8hARvcL42gFMsLCLzsSmI8 OSwvzZ6XkXbBfeJxW2++cbs/+6fJBQhXu9btF6i9l0JpvlO8OxcR7h3C/ k2WvnYOfrk9Ztl+v8sY19tv6RTKFZUByH1CYAI6HcP2sCPaAQM8cU+e50 jjZSCXyl5WKPgfSNdEe9/Jz0hG8gwFqjqTR4CemuLSeytKUTsVStcuORg Q==; X-IronPort-AV: E=McAfee;i="6500,9779,10500"; a="305485739" X-IronPort-AV: E=Sophos;i="5.95,185,1661842800"; d="scan'208";a="305485739" Received: from 
fmsmga007.fm.intel.com ([10.253.24.52]) by fmsmga103.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 14 Oct 2022 16:58:15 -0700 X-IronPort-AV: E=McAfee;i="6500,9779,10500"; a="630113460" X-IronPort-AV: E=Sophos;i="5.95,185,1661842800"; d="scan'208";a="630113460" Received: from uyoon-mobl.amr.corp.intel.com (HELO dwillia2-xfh.jf.intel.com) ([10.209.90.112]) by fmsmga007-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 14 Oct 2022 16:58:14 -0700 Subject: [PATCH v3 13/25] devdax: Minor warning fixups From: Dan Williams To: linux-mm@kvack.org Cc: david@fromorbit.com, hch@lst.de, nvdimm@lists.linux.dev, akpm@linux-foundation.org, linux-fsdevel@vger.kernel.org Date: Fri, 14 Oct 2022 16:58:14 -0700 Message-ID: <166579189421.2236710.9800883307049200257.stgit@dwillia2-xfh.jf.intel.com> In-Reply-To: <166579181584.2236710.17813547487183983273.stgit@dwillia2-xfh.jf.intel.com> References: <166579181584.2236710.17813547487183983273.stgit@dwillia2-xfh.jf.intel.com> User-Agent: StGit/0.18-3-g996c MIME-Version: 1.0 ARC-Authentication-Results: i=1; imf28.hostedemail.com; dkim=none ("invalid DKIM record") header.d=intel.com header.s=Intel header.b=lio0Y68O; spf=pass (imf28.hostedemail.com: domain of dan.j.williams@intel.com designates 192.55.52.115 as permitted sender) smtp.mailfrom=dan.j.williams@intel.com; dmarc=pass (policy=none) header.from=intel.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1665791897; a=rsa-sha256; cv=none; b=aAeTGtFidg1dETDlm629R6QGJ0hM2OxfH/xIxLqrzNXnKQouzBiCaqm/lgJCdozWnLda5M UHh32tWmt3eDT+szvEiPjJJfxuMFY0se4UnHy0SldC4CvTGP8GuaRQQEZYgiw3muC7DywN ZGpBoNDGPM8nA4rQw1UOUOsFtV/ZAzU= ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1665791897; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=v05IosCNqJNFc5WZNd75kkfaDrA32p/5jG35re+aPa8=; b=AbfHUA5qBU9TxSqYmjmYNoNfPItj30uyT1j9NpNYjbCq7OY75+akAjS6oZVnRMQs+6ZqAB tBc+7RFC7Hto/jg8V+nr5MEozqD89J8oO1m4hR4PuZwJhdeyv7wbvUUXnnRElzLKfrzlqx dX/1SfRqkdqxZ8sjjj9ofRlE1XQfpFc= X-Rspam-User: X-Stat-Signature: xrkq5g9qhi3w153iwzj19soz3tdynku7 X-Rspamd-Queue-Id: BE2C8C0025 Authentication-Results: imf28.hostedemail.com; dkim=none ("invalid DKIM record") header.d=intel.com header.s=Intel header.b=lio0Y68O; spf=pass (imf28.hostedemail.com: domain of dan.j.williams@intel.com designates 192.55.52.115 as permitted sender) smtp.mailfrom=dan.j.williams@intel.com; dmarc=pass (policy=none) header.from=intel.com X-Rspamd-Server: rspam07 X-HE-Tag: 1665791896-747461 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Fix a missing prototype warning for dev_dax_probe(), and fix dax_holder() comment block format. These are holdover fixes that are now being addressed as some fsdax infrastructure is being moved into the dax core. 
Signed-off-by: Dan Williams --- drivers/dax/dax-private.h | 1 + drivers/dax/super.c | 2 +- 2 files changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/dax/dax-private.h b/drivers/dax/dax-private.h index 1c974b7caae6..202cafd836e8 100644 --- a/drivers/dax/dax-private.h +++ b/drivers/dax/dax-private.h @@ -87,6 +87,7 @@ static inline struct dax_mapping *to_dax_mapping(struct device *dev) } phys_addr_t dax_pgoff_to_phys(struct dev_dax *dev_dax, pgoff_t pgoff, unsigned long size); +int dev_dax_probe(struct dev_dax *dev_dax); #ifdef CONFIG_TRANSPARENT_HUGEPAGE static inline bool dax_align_valid(unsigned long align) diff --git a/drivers/dax/super.c b/drivers/dax/super.c index 9b5e2a5eb0ae..4909ad945a49 100644 --- a/drivers/dax/super.c +++ b/drivers/dax/super.c @@ -475,7 +475,7 @@ EXPORT_SYMBOL_GPL(put_dax); /** * dax_holder() - obtain the holder of a dax device * @dax_dev: a dax_device instance - + * * Return: the holder's data which represents the holder if registered, * otherwize NULL. */ From patchwork Fri Oct 14 23:58:20 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dan Williams X-Patchwork-Id: 13007528 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 4E8B9C433FE for ; Fri, 14 Oct 2022 23:58:24 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id E1DE16B007B; Fri, 14 Oct 2022 19:58:23 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id DCE076B007D; Fri, 14 Oct 2022 19:58:23 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id C97066B0080; Fri, 14 Oct 2022 19:58:23 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0011.hostedemail.com [216.40.44.11]) by kanga.kvack.org (Postfix) with ESMTP id B8C7E6B007B for ; Fri, 14 Oct 2022 19:58:23 -0400 (EDT) Received: from smtpin04.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay02.hostedemail.com (Postfix) with ESMTP id 8CB201202A4 for ; Fri, 14 Oct 2022 23:58:23 +0000 (UTC) X-FDA: 80021221686.04.A7B8C82 Received: from mga05.intel.com (mga05.intel.com [192.55.52.43]) by imf18.hostedemail.com (Postfix) with ESMTP id EFB3A1C002E for ; Fri, 14 Oct 2022 23:58:22 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1665791903; x=1697327903; h=subject:from:to:cc:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=K1aq+/q+FuOiZcIjOjX1lseJ32olR7LP6wPKxMz7epE=; b=UrrbnQnCjfG9Mv9/tRt/avZnmhqzQDYB9JRO7kr7NetBrpnHGhcbtvLa CbOZ59i5nNzrBMLJ7NZa20wYW6GX2a99y7l75AEK0u6Nx9SqzebGt/zdO ibDnAHXyVG9vAauqWHHdSYRH0336feUCsbHKPfZHjhl4M3rskYV8YZftK OHo+lbY+/wmZc/QBlSBorELjEUmKGs9SHcnl7XT/VyGIOSKTRBexZ6AFd XUZp2wGuyHaXGRUowDV+uVbmaO8zOLMyiK/2f6HA2DJI8DWAZTWsfH9fB gra/v6+G5nuaA4a1UCvq0sHcdzBTtNzvBMAwHu/96JyVzVZd2hljIAnsm g==; X-IronPort-AV: E=McAfee;i="6500,9779,10500"; a="391802471" X-IronPort-AV: E=Sophos;i="5.95,185,1661842800"; d="scan'208";a="391802471" Received: from fmsmga007.fm.intel.com ([10.253.24.52]) by fmsmga105.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 14 Oct 2022 16:58:21 -0700 X-IronPort-AV: E=McAfee;i="6500,9779,10500"; a="630113492" X-IronPort-AV: E=Sophos;i="5.95,185,1661842800"; d="scan'208";a="630113492" Received: from uyoon-mobl.amr.corp.intel.com (HELO 
dwillia2-xfh.jf.intel.com) ([10.209.90.112]) by fmsmga007-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 14 Oct 2022 16:58:20 -0700 Subject: [PATCH v3 14/25] devdax: Fix sparse lock imbalance warning From: Dan Williams To: linux-mm@kvack.org Cc: kernel test robot , david@fromorbit.com, hch@lst.de, nvdimm@lists.linux.dev, akpm@linux-foundation.org, linux-fsdevel@vger.kernel.org Date: Fri, 14 Oct 2022 16:58:20 -0700 Message-ID: <166579190012.2236710.846739337067413538.stgit@dwillia2-xfh.jf.intel.com> In-Reply-To: <166579181584.2236710.17813547487183983273.stgit@dwillia2-xfh.jf.intel.com> References: <166579181584.2236710.17813547487183983273.stgit@dwillia2-xfh.jf.intel.com> User-Agent: StGit/0.18-3-g996c MIME-Version: 1.0 ARC-Authentication-Results: i=1; imf18.hostedemail.com; dkim=none ("invalid DKIM record") header.d=intel.com header.s=Intel header.b=UrrbnQnC; spf=pass (imf18.hostedemail.com: domain of dan.j.williams@intel.com designates 192.55.52.43 as permitted sender) smtp.mailfrom=dan.j.williams@intel.com; dmarc=pass (policy=none) header.from=intel.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1665791903; a=rsa-sha256; cv=none; b=xZl6bT2fJ13O659ZML36hlUmludGo+EAeOW+dZn7v34lD/9wcyJ6OklsYt4JmfbTEivPiq FBlrfsNP9v0/wqtWAQud0z0N92s8dJfohUYYDjAWCxjh+Zl9fnW/ruu2k/vJ4ucq6dXsX+ pIImPtpwRfxO75+wQh6cgxAc1ptYuho= ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1665791903; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=UnjebZAuCbgnprXpdgXqU6XmMa/hB1k8NyCRvHAm6xw=; b=LEfqN5VHkQsy0+AxLq7e3E9dI5upDgauqR0zbK+KMJY6prcE4By3r4QpZaEt+pTtU33Cqj 0emIEEBypkU/1xI2nuJKwEH+UuXTYe/G4fXnhmKfIBYE60Qx8E0nylo/cVA4vPMg/1AOHL OsUDvu/LN/KQKpq0X5LNuTM5OKEp00U= X-Rspamd-Server: rspam05 X-Rspam-User: Authentication-Results: imf18.hostedemail.com; dkim=none ("invalid DKIM record") header.d=intel.com header.s=Intel header.b=UrrbnQnC; spf=pass (imf18.hostedemail.com: domain of dan.j.williams@intel.com designates 192.55.52.43 as permitted sender) smtp.mailfrom=dan.j.williams@intel.com; dmarc=pass (policy=none) header.from=intel.com X-Stat-Signature: 34gt8dfx6eqhsrctqunrbhqjp1pyudtd X-Rspamd-Queue-Id: EFB3A1C002E X-HE-Tag: 1665791902-795430 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Annotate dax_read_{lock,unlock} with their locking expectations to fix this sparse report: drivers/dax/super.c:45:5: sparse: warning: context imbalance in 'dax_read_lock' - wrong count at exit drivers/dax/super.c: note: in included file (through include/linux/notifier.h, include/linux/memory_hotplug.h, include/linux/mmzone.h, include/linux/gfp.h, include/linux/mm.h, include/linux/pagemap.h): ./include/linux/srcu.h:189:9: sparse: warning: context imbalance in 'dax_read_unlock' - unexpected unlock Reported-by: kernel test robot Link: http://lore.kernel.org/r/202210091141.cHaQEuCs-lkp@intel.com Signed-off-by: Dan Williams --- drivers/dax/super.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/dax/super.c b/drivers/dax/super.c index 4909ad945a49..41342e47662d 100644 --- a/drivers/dax/super.c +++ b/drivers/dax/super.c @@ -42,13 +42,13 @@ static DEFINE_IDA(dax_minor_ida); static struct kmem_cache *dax_cache __read_mostly; 
static struct super_block *dax_superblock __read_mostly; -int dax_read_lock(void) +int dax_read_lock(void) __acquires(&dax_srcu) { return srcu_read_lock(&dax_srcu); } EXPORT_SYMBOL_GPL(dax_read_lock); -void dax_read_unlock(int id) +void dax_read_unlock(int id) __releases(&dax_srcu) { srcu_read_unlock(&dax_srcu, id); } From patchwork Fri Oct 14 23:58:26 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dan Williams X-Patchwork-Id: 13007529 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 922D1C4332F for ; Fri, 14 Oct 2022 23:58:29 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 2FADF6B0075; Fri, 14 Oct 2022 19:58:29 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 2AA896B007D; Fri, 14 Oct 2022 19:58:29 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 172196B007E; Fri, 14 Oct 2022 19:58:29 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0012.hostedemail.com [216.40.44.12]) by kanga.kvack.org (Postfix) with ESMTP id 04DCD6B0075 for ; Fri, 14 Oct 2022 19:58:29 -0400 (EDT) Received: from smtpin17.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay02.hostedemail.com (Postfix) with ESMTP id CA0411206F8 for ; Fri, 14 Oct 2022 23:58:28 +0000 (UTC) X-FDA: 80021221896.17.47E233A Received: from mga06.intel.com (mga06b.intel.com [134.134.136.31]) by imf23.hostedemail.com (Postfix) with ESMTP id 57BC0140032 for ; Fri, 14 Oct 2022 23:58:28 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1665791908; x=1697327908; h=subject:from:to:cc:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=C/qZLnXcKAFdw6v16/JSbSW5MEboRqnBqKQbL8OhQjQ=; b=Pcok9azHciWzoDwk5sKElNCiUZYxFoHLS58m6ztL6WJuX3tRiDQU6V7W BZIEl9z/kmYzMBN2XyAEkD2kTRN5LJpM1y4gtkiuZ+/nQFdPfRaw2za/Y 5nCC4FJRaeNWm82p3f2TX1JrYWUWoskZqRZXOOhW0ld0Im/wwVTWtfq2c Fyqy77W/clyE8/ZPfLJ2UBmEIfzAsSSnEWG1PRWTCEzT5n8a1kNoaX3Xh ZWNqjMdwCHYhTK7fz54NiYEiYTKcMBkW0EZ+XCGoKbzVI01PlXmaoK/h8 WrvZl/kACqG7BfBJcofNzrdII0RColqb4GoeARDSWmuLRfDYh5Zs4z2zN w==; X-IronPort-AV: E=McAfee;i="6500,9779,10500"; a="367523236" X-IronPort-AV: E=Sophos;i="5.95,185,1661842800"; d="scan'208";a="367523236" Received: from fmsmga007.fm.intel.com ([10.253.24.52]) by orsmga104.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 14 Oct 2022 16:58:27 -0700 X-IronPort-AV: E=McAfee;i="6500,9779,10500"; a="630113510" X-IronPort-AV: E=Sophos;i="5.95,185,1661842800"; d="scan'208";a="630113510" Received: from uyoon-mobl.amr.corp.intel.com (HELO dwillia2-xfh.jf.intel.com) ([10.209.90.112]) by fmsmga007-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 14 Oct 2022 16:58:26 -0700 Subject: [PATCH v3 15/25] libnvdimm/pmem: Support pmem block devices without dax From: Dan Williams To: linux-mm@kvack.org Cc: david@fromorbit.com, hch@lst.de, nvdimm@lists.linux.dev, akpm@linux-foundation.org, linux-fsdevel@vger.kernel.org Date: Fri, 14 Oct 2022 16:58:26 -0700 Message-ID: <166579190607.2236710.1230996282258115812.stgit@dwillia2-xfh.jf.intel.com> In-Reply-To: <166579181584.2236710.17813547487183983273.stgit@dwillia2-xfh.jf.intel.com> References: 
<166579181584.2236710.17813547487183983273.stgit@dwillia2-xfh.jf.intel.com> User-Agent: StGit/0.18-3-g996c MIME-Version: 1.0 ARC-Authentication-Results: i=1; imf23.hostedemail.com; dkim=none ("invalid DKIM record") header.d=intel.com header.s=Intel header.b=Pcok9azH; spf=pass (imf23.hostedemail.com: domain of dan.j.williams@intel.com designates 134.134.136.31 as permitted sender) smtp.mailfrom=dan.j.williams@intel.com; dmarc=pass (policy=none) header.from=intel.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1665791908; a=rsa-sha256; cv=none; b=cuZ9NduH/iW34+NLmteylpsZfaBaL1bIsknveAiqYKOHnf5h4Sgio+PvkC8NgKoX/QiIoD o56VSNubReB4zBvRXtDk3E5VVDuXmQ6x08qiYfD3vUhPh15XwUFh/ucyiseT+EsMjYs+Tj 2Sso1hZEb2SOFVAkEey/eijTPsUtIs8= ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1665791908; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=a9trwZBc+KeRM7FLcSeop9grDbZqq5tr+AizBEGe0mI=; b=y+q5DO2wVKHioyETRd0BTYDeY6OPVF5ytpYFUOmaZ+678Yco+WeZPipxql3qXetaHj8K7U Uu+Gyvgu79S1SI9by8NwhdUBJFfGBO6JFPlPoLByxCSri8r2uPiGF+Qm00OuOf+lIlRYNI 67xdqRbpVW4CLAffn2PeIodIhkSrKZU= X-Rspamd-Server: rspam05 X-Rspam-User: Authentication-Results: imf23.hostedemail.com; dkim=none ("invalid DKIM record") header.d=intel.com header.s=Intel header.b=Pcok9azH; spf=pass (imf23.hostedemail.com: domain of dan.j.williams@intel.com designates 134.134.136.31 as permitted sender) smtp.mailfrom=dan.j.williams@intel.com; dmarc=pass (policy=none) header.from=intel.com X-Stat-Signature: uqb6t189waxtgn8fxi5q6mw9x11ps4yd X-Rspamd-Queue-Id: 57BC0140032 X-HE-Tag: 1665791908-62937 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: In preparation for CONFIG_DAX growing a CONFIG_MMU dependency, add support for pmem to skip dax-device registration in the CONFIG_DAX=n case. alloc_dax() returns NULL in the CONFIG_DAX=n case, ERR_PTR() in the failure case, and a dax-device in the success case. dax_remove_host(), kill_dax() and put_dax() are safe to call when setup_dax() returns 0, whether that is because it succeeded or because CONFIG_DAX=n.
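As a rough sketch of that calling convention (the authoritative version is the setup_dax() helper added in the diff below), a caller is expected to distinguish the three alloc_dax() outcomes like so:

	struct dax_device *dax_dev = alloc_dax(pmem, &pmem_dax_ops);

	if (IS_ERR(dax_dev))
		return PTR_ERR(dax_dev);	/* allocation failed: propagate the error */
	if (!dax_dev)
		return 0;			/* CONFIG_DAX=n: no dax device, not an error */
	/* success: continue with dax_add_host(), dax_write_cache(), etc. */
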
Signed-off-by: Dan Williams --- drivers/nvdimm/Kconfig | 2 +- drivers/nvdimm/pmem.c | 47 ++++++++++++++++++++++++++++++----------------- 2 files changed, 31 insertions(+), 18 deletions(-) diff --git a/drivers/nvdimm/Kconfig b/drivers/nvdimm/Kconfig index 5a29046e3319..027acca1bac4 100644 --- a/drivers/nvdimm/Kconfig +++ b/drivers/nvdimm/Kconfig @@ -19,7 +19,7 @@ if LIBNVDIMM config BLK_DEV_PMEM tristate "PMEM: Persistent memory block device support" default LIBNVDIMM - select DAX + select DAX if MMU select ND_BTT if BTT select ND_PFN if NVDIMM_PFN help diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c index 7e88cd242380..068183ee9bf6 100644 --- a/drivers/nvdimm/pmem.c +++ b/drivers/nvdimm/pmem.c @@ -468,6 +468,32 @@ static const struct dev_pagemap_ops fsdax_pagemap_ops = { .memory_failure = pmem_pagemap_memory_failure, }; +static int setup_dax(struct pmem_device *pmem, struct gendisk *disk, + struct nd_region *nd_region) +{ + struct dax_device *dax_dev; + int rc; + + dax_dev = alloc_dax(pmem, &pmem_dax_ops); + if (IS_ERR(dax_dev)) + return PTR_ERR(dax_dev); + if (!dax_dev) + return 0; + set_dax_nocache(dax_dev); + set_dax_nomc(dax_dev); + if (is_nvdimm_sync(nd_region)) + set_dax_synchronous(dax_dev); + rc = dax_add_host(dax_dev, disk); + if (rc) { + kill_dax(dax_dev); + put_dax(dax_dev); + return rc; + } + dax_write_cache(dax_dev, nvdimm_has_cache(nd_region)); + pmem->dax_dev = dax_dev; + return 0; +} + static int pmem_attach_disk(struct device *dev, struct nd_namespace_common *ndns) { @@ -477,7 +503,6 @@ static int pmem_attach_disk(struct device *dev, struct resource *res = &nsio->res; struct range bb_range; struct nd_pfn *nd_pfn = NULL; - struct dax_device *dax_dev; struct nd_pfn_sb *pfn_sb; struct pmem_device *pmem; struct request_queue *q; @@ -578,24 +603,13 @@ static int pmem_attach_disk(struct device *dev, nvdimm_badblocks_populate(nd_region, &pmem->bb, &bb_range); disk->bb = &pmem->bb; - dax_dev = alloc_dax(pmem, &pmem_dax_ops); - if (IS_ERR(dax_dev)) { - rc = PTR_ERR(dax_dev); - goto out; - } - set_dax_nocache(dax_dev); - set_dax_nomc(dax_dev); - if (is_nvdimm_sync(nd_region)) - set_dax_synchronous(dax_dev); - rc = dax_add_host(dax_dev, disk); + rc = setup_dax(pmem, disk, nd_region); if (rc) - goto out_cleanup_dax; - dax_write_cache(dax_dev, nvdimm_has_cache(nd_region)); - pmem->dax_dev = dax_dev; + goto out; rc = device_add_disk(dev, disk, pmem_attribute_groups); if (rc) - goto out_remove_host; + goto out_dax; if (devm_add_action_or_reset(dev, pmem_release_disk, pmem)) return -ENOMEM; @@ -607,9 +621,8 @@ static int pmem_attach_disk(struct device *dev, dev_warn(dev, "'badblocks' notification disabled\n"); return 0; -out_remove_host: +out_dax: dax_remove_host(pmem->disk); -out_cleanup_dax: kill_dax(pmem->dax_dev); put_dax(pmem->dax_dev); out: From patchwork Fri Oct 14 23:58:32 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dan Williams X-Patchwork-Id: 13007530 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 38A43C433FE for ; Fri, 14 Oct 2022 23:58:36 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id BFBE28E0001; Fri, 14 Oct 2022 19:58:35 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id BDD576B0080; Fri, 14 Oct 2022 19:58:35 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by 
kanga.kvack.org (Postfix, from userid 63042) id 9B1488E0001; Fri, 14 Oct 2022 19:58:35 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0013.hostedemail.com [216.40.44.13]) by kanga.kvack.org (Postfix) with ESMTP id 80A8C6B007E for ; Fri, 14 Oct 2022 19:58:35 -0400 (EDT) Received: from smtpin28.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay01.hostedemail.com (Postfix) with ESMTP id 5B0AA1C6569 for ; Fri, 14 Oct 2022 23:58:35 +0000 (UTC) X-FDA: 80021222190.28.AC043F9 Received: from mga14.intel.com (mga14.intel.com [192.55.52.115]) by imf22.hostedemail.com (Postfix) with ESMTP id A9C06C0032 for ; Fri, 14 Oct 2022 23:58:34 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1665791914; x=1697327914; h=subject:from:to:cc:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=gtb1OH6xP0NcknDSp/QI1HrVmy2AMK82UNC/lm6mdzg=; b=GBVeI/g3dmt6h1k0RgGpdPM2aWX5hG3VRfHbgaj84MIA0a3mv/q9bf2E YVIhPsN84sIjD4RBUZ6HUUqW9VYwt1mJbZ6dfjM86i/kEx3a/1OLdkapx XojT6WPmNErYJdw5sLIO39l7XSjB78Bl2W74oSf369BQp4KWneUcjnV88 PW8epQv/cnXsK9rSb5iC+UTYN+Obo5sYP62xF90eEwBHYAZB1ndEsFz3f k3m+EEpk5QXvsswVblFYLf6uOPXnO6xrKzND0K9yohgbVQNSqLTeko+Hv s1IniV0mP+dvg6XJIpNvwb4bPnebBojaIbChjpJ67zmlkDIWxuekpPcUe Q==; X-IronPort-AV: E=McAfee;i="6500,9779,10500"; a="305485788" X-IronPort-AV: E=Sophos;i="5.95,185,1661842800"; d="scan'208";a="305485788" Received: from orsmga005.jf.intel.com ([10.7.209.41]) by fmsmga103.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 14 Oct 2022 16:58:33 -0700 X-IronPort-AV: E=McAfee;i="6500,9779,10500"; a="802798905" X-IronPort-AV: E=Sophos;i="5.95,185,1661842800"; d="scan'208";a="802798905" Received: from uyoon-mobl.amr.corp.intel.com (HELO dwillia2-xfh.jf.intel.com) ([10.209.90.112]) by orsmga005-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 14 Oct 2022 16:58:32 -0700 Subject: [PATCH v3 16/25] devdax: Move address_space helpers to the DAX core From: Dan Williams To: linux-mm@kvack.org Cc: Matthew Wilcox , Jan Kara , "Darrick J. 
Wong" , Jason Gunthorpe , Christoph Hellwig , John Hubbard , david@fromorbit.com, nvdimm@lists.linux.dev, akpm@linux-foundation.org, linux-fsdevel@vger.kernel.org Date: Fri, 14 Oct 2022 16:58:32 -0700 Message-ID: <166579191217.2236710.8295835680465098304.stgit@dwillia2-xfh.jf.intel.com> In-Reply-To: <166579181584.2236710.17813547487183983273.stgit@dwillia2-xfh.jf.intel.com> References: <166579181584.2236710.17813547487183983273.stgit@dwillia2-xfh.jf.intel.com> User-Agent: StGit/0.18-3-g996c MIME-Version: 1.0 ARC-Authentication-Results: i=1; imf22.hostedemail.com; dkim=none ("invalid DKIM record") header.d=intel.com header.s=Intel header.b="GBVeI/g3"; spf=pass (imf22.hostedemail.com: domain of dan.j.williams@intel.com designates 192.55.52.115 as permitted sender) smtp.mailfrom=dan.j.williams@intel.com; dmarc=pass (policy=none) header.from=intel.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1665791915; a=rsa-sha256; cv=none; b=Wa1dQdtMGvTOAr3YXfb2d1n7a4L1PpehGRsprUCNxhYdovgJFDrDYKRxRhFzSvEFqi2Pzp UcOTmBcfcpbjJSfIMhxYipXETSr8mJWcSAieQO1FhBDmgZ0hKHQeUgCxqLCLiaQyvWUHO0 bw38RsQ78lrW9fXwii19S79RsSe9Cz8= ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1665791915; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=ILC6DqWSJVxv1ZwfbXeLgYwryktMeZRFuh7QbBIaabo=; b=EMXT1dKTgW4LL48MpX0ARD0jM12Xg8BJ6NnG2g3RYsncub/NlHGP2L3vRlcc6+SkpQCuKJ pRSVMLZ6fACx2pLT/HiJWRg8h9lXgEbDVs08HQXyMzbLQrsXw3gxIxhqJ6XkgjfRq1bc9n wpEOO671EwTnEdA7BD9TmvM07IZKq/s= X-Rspamd-Queue-Id: A9C06C0032 X-Rspam-User: Authentication-Results: imf22.hostedemail.com; dkim=none ("invalid DKIM record") header.d=intel.com header.s=Intel header.b="GBVeI/g3"; spf=pass (imf22.hostedemail.com: domain of dan.j.williams@intel.com designates 192.55.52.115 as permitted sender) smtp.mailfrom=dan.j.williams@intel.com; dmarc=pass (policy=none) header.from=intel.com X-Rspamd-Server: rspam04 X-Stat-Signature: a4rqy3t6mni7yner8s7e9dio4h9q4pw1 X-HE-Tag: 1665791914-612311 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: In preparation for decamping get_dev_pagemap() and put_devmap_managed_page() from code paths outside of DAX, device-dax needs to track mapping references similar to the tracking done for fsdax. Reuse the same infrastructure as fsdax (dax_insert_entry() and dax_delete_mapping_entry()). For now, just move that infrastructure into a common location with no other code changes. The move involves splitting iomap and supporting helpers into fs/dax.c and all 'struct address_space' and DAX-entry manipulation into drivers/dax/mapping.c. grab_mapping_entry() is renamed dax_grab_mapping_entry(), and some common definitions and declarations are moved to include/linux/dax.h. No functional change is intended, just code movement. The interactions between drivers/dax/mapping.o and mm/memory-failure.o result in drivers/dax/mapping.o and the rest of the dax core losing its option to be compiled as a module. That can be addressed later given the fact the CONFIG_FS_DAX has always been forcing the dax core to be built-in. I.e. this is only a potential vmlinux size regression for CONFIG_FS_DAX=n and CONFIG_DEV_DAX=m builds which are not common. Cc: Matthew Wilcox Cc: Jan Kara Cc: "Darrick J. 
Wong" Cc: Jason Gunthorpe Cc: Christoph Hellwig Cc: John Hubbard Signed-off-by: Dan Williams --- drivers/Makefile | 2 drivers/dax/Kconfig | 4 drivers/dax/Makefile | 1 drivers/dax/dax-private.h | 1 drivers/dax/mapping.c | 1048 +++++++++++++++++++++++++++++++++++++++++ drivers/dax/super.c | 4 drivers/nvdimm/Kconfig | 1 fs/dax.c | 1143 +-------------------------------------------- include/linux/dax.h | 113 +++- include/linux/memremap.h | 6 10 files changed, 1183 insertions(+), 1140 deletions(-) create mode 100644 drivers/dax/mapping.c diff --git a/drivers/Makefile b/drivers/Makefile index 057857258bfd..ec6c4146b966 100644 --- a/drivers/Makefile +++ b/drivers/Makefile @@ -71,7 +71,7 @@ obj-$(CONFIG_FB_INTEL) += video/fbdev/intelfb/ obj-$(CONFIG_PARPORT) += parport/ obj-y += base/ block/ misc/ mfd/ nfc/ obj-$(CONFIG_LIBNVDIMM) += nvdimm/ -obj-$(CONFIG_DAX) += dax/ +obj-y += dax/ obj-$(CONFIG_DMA_SHARED_BUFFER) += dma-buf/ obj-$(CONFIG_NUBUS) += nubus/ obj-y += cxl/ diff --git a/drivers/dax/Kconfig b/drivers/dax/Kconfig index 5fdf269a822e..205e9dda8928 100644 --- a/drivers/dax/Kconfig +++ b/drivers/dax/Kconfig @@ -1,8 +1,8 @@ # SPDX-License-Identifier: GPL-2.0-only menuconfig DAX - tristate "DAX: direct access to differentiated memory" + bool "DAX: direct access to differentiated memory" + depends on MMU select SRCU - default m if NVDIMM_DAX if DAX diff --git a/drivers/dax/Makefile b/drivers/dax/Makefile index 90a56ca3b345..3546bca7adbf 100644 --- a/drivers/dax/Makefile +++ b/drivers/dax/Makefile @@ -6,6 +6,7 @@ obj-$(CONFIG_DEV_DAX_PMEM) += dax_pmem.o dax-y := super.o dax-y += bus.o +dax-y += mapping.o device_dax-y := device.o dax_pmem-y := pmem.o diff --git a/drivers/dax/dax-private.h b/drivers/dax/dax-private.h index 202cafd836e8..19076f9d5c51 100644 --- a/drivers/dax/dax-private.h +++ b/drivers/dax/dax-private.h @@ -15,6 +15,7 @@ struct dax_device *inode_dax(struct inode *inode); struct inode *dax_inode(struct dax_device *dax_dev); int dax_bus_init(void); void dax_bus_exit(void); +void dax_mapping_init(void); /** * struct dax_region - mapping infrastructure for dax devices diff --git a/drivers/dax/mapping.c b/drivers/dax/mapping.c new file mode 100644 index 000000000000..19121b7421fb --- /dev/null +++ b/drivers/dax/mapping.c @@ -0,0 +1,1048 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* + * Direct Access mapping infrastructure split from fs/dax.c + * Copyright (c) 2013-2014 Intel Corporation + * Author: Matthew Wilcox + * Author: Ross Zwisler + */ + +#include +#include +#include +#include +#include +#include +#include + +#include "dax-private.h" + +#define CREATE_TRACE_POINTS +#include + +/* We choose 4096 entries - same as per-zone page wait tables */ +#define DAX_WAIT_TABLE_BITS 12 +#define DAX_WAIT_TABLE_ENTRIES (1 << DAX_WAIT_TABLE_BITS) + +static wait_queue_head_t wait_table[DAX_WAIT_TABLE_ENTRIES]; + +void __init dax_mapping_init(void) +{ + int i; + + for (i = 0; i < DAX_WAIT_TABLE_ENTRIES; i++) + init_waitqueue_head(wait_table + i); +} + +static unsigned long dax_to_pfn(void *entry) +{ + return xa_to_value(entry) >> DAX_SHIFT; +} + +static void *dax_make_entry(pfn_t pfn, unsigned long flags) +{ + return xa_mk_value((flags & DAX_MASK) | + (pfn_t_to_pfn(pfn) << DAX_SHIFT)); +} + +static bool dax_is_locked(void *entry) +{ + return xa_to_value(entry) & DAX_LOCKED; +} + +static bool dax_is_zapped(void *entry) +{ + return xa_to_value(entry) & DAX_ZAP; +} + +static unsigned int dax_entry_order(void *entry) +{ + if (xa_to_value(entry) & DAX_PMD) + return PMD_ORDER; + return 0; +} + +static 
unsigned long dax_is_pmd_entry(void *entry) +{ + return xa_to_value(entry) & DAX_PMD; +} + +static bool dax_is_pte_entry(void *entry) +{ + return !(xa_to_value(entry) & DAX_PMD); +} + +static int dax_is_zero_entry(void *entry) +{ + return xa_to_value(entry) & DAX_ZERO_PAGE; +} + +static int dax_is_empty_entry(void *entry) +{ + return xa_to_value(entry) & DAX_EMPTY; +} + +/* + * true if the entry that was found is of a smaller order than the entry + * we were looking for + */ +static bool dax_is_conflict(void *entry) +{ + return entry == XA_RETRY_ENTRY; +} + +/* + * DAX page cache entry locking + */ +struct exceptional_entry_key { + struct xarray *xa; + pgoff_t entry_start; +}; + +struct wait_exceptional_entry_queue { + wait_queue_entry_t wait; + struct exceptional_entry_key key; +}; + +/** + * enum dax_wake_mode: waitqueue wakeup behaviour + * @WAKE_ALL: wake all waiters in the waitqueue + * @WAKE_NEXT: wake only the first waiter in the waitqueue + */ +enum dax_wake_mode { + WAKE_ALL, + WAKE_NEXT, +}; + +static wait_queue_head_t *dax_entry_waitqueue(struct xa_state *xas, void *entry, + struct exceptional_entry_key *key) +{ + unsigned long hash; + unsigned long index = xas->xa_index; + + /* + * If 'entry' is a PMD, align the 'index' that we use for the wait + * queue to the start of that PMD. This ensures that all offsets in + * the range covered by the PMD map to the same bit lock. + */ + if (dax_is_pmd_entry(entry)) + index &= ~PG_PMD_COLOUR; + key->xa = xas->xa; + key->entry_start = index; + + hash = hash_long((unsigned long)xas->xa ^ index, DAX_WAIT_TABLE_BITS); + return wait_table + hash; +} + +static int wake_exceptional_entry_func(wait_queue_entry_t *wait, + unsigned int mode, int sync, void *keyp) +{ + struct exceptional_entry_key *key = keyp; + struct wait_exceptional_entry_queue *ewait = + container_of(wait, struct wait_exceptional_entry_queue, wait); + + if (key->xa != ewait->key.xa || + key->entry_start != ewait->key.entry_start) + return 0; + return autoremove_wake_function(wait, mode, sync, NULL); +} + +/* + * @entry may no longer be the entry at the index in the mapping. + * The important information it's conveying is whether the entry at + * this index used to be a PMD entry. + */ +static void dax_wake_entry(struct xa_state *xas, void *entry, + enum dax_wake_mode mode) +{ + struct exceptional_entry_key key; + wait_queue_head_t *wq; + + wq = dax_entry_waitqueue(xas, entry, &key); + + /* + * Checking for locked entry and prepare_to_wait_exclusive() happens + * under the i_pages lock, ditto for entry handling in our callers. + * So at this point all tasks that could have seen our entry locked + * must be in the waitqueue and the following check will see them. + */ + if (waitqueue_active(wq)) + __wake_up(wq, TASK_NORMAL, mode == WAKE_ALL ? 0 : 1, &key); +} + +/* + * Look up entry in page cache, wait for it to become unlocked if it + * is a DAX entry and return it. The caller must subsequently call + * put_unlocked_entry() if it did not lock the entry or dax_unlock_entry() + * if it did. The entry returned may have a larger order than @order. + * If @order is larger than the order of the entry found in i_pages, this + * function returns a dax_is_conflict entry. + * + * Must be called with the i_pages lock held. 
+ */ +static void *get_unlocked_entry(struct xa_state *xas, unsigned int order) +{ + void *entry; + struct wait_exceptional_entry_queue ewait; + wait_queue_head_t *wq; + + init_wait(&ewait.wait); + ewait.wait.func = wake_exceptional_entry_func; + + for (;;) { + entry = xas_find_conflict(xas); + if (!entry || WARN_ON_ONCE(!xa_is_value(entry))) + return entry; + if (dax_entry_order(entry) < order) + return XA_RETRY_ENTRY; + if (!dax_is_locked(entry)) + return entry; + + wq = dax_entry_waitqueue(xas, entry, &ewait.key); + prepare_to_wait_exclusive(wq, &ewait.wait, + TASK_UNINTERRUPTIBLE); + xas_unlock_irq(xas); + xas_reset(xas); + schedule(); + finish_wait(wq, &ewait.wait); + xas_lock_irq(xas); + } +} + +/* + * The only thing keeping the address space around is the i_pages lock + * (it's cycled in clear_inode() after removing the entries from i_pages) + * After we call xas_unlock_irq(), we cannot touch xas->xa. + */ +static void wait_entry_unlocked(struct xa_state *xas, void *entry) +{ + struct wait_exceptional_entry_queue ewait; + wait_queue_head_t *wq; + + init_wait(&ewait.wait); + ewait.wait.func = wake_exceptional_entry_func; + + wq = dax_entry_waitqueue(xas, entry, &ewait.key); + /* + * Unlike get_unlocked_entry() there is no guarantee that this + * path ever successfully retrieves an unlocked entry before an + * inode dies. Perform a non-exclusive wait in case this path + * never successfully performs its own wake up. + */ + prepare_to_wait(wq, &ewait.wait, TASK_UNINTERRUPTIBLE); + xas_unlock_irq(xas); + schedule(); + finish_wait(wq, &ewait.wait); +} + +static void put_unlocked_entry(struct xa_state *xas, void *entry, + enum dax_wake_mode mode) +{ + if (entry && !dax_is_conflict(entry)) + dax_wake_entry(xas, entry, mode); +} + +/* + * We used the xa_state to get the entry, but then we locked the entry and + * dropped the xa_lock, so we know the xa_state is stale and must be reset + * before use. + */ +void dax_unlock_entry(struct xa_state *xas, void *entry) +{ + void *old; + + WARN_ON(dax_is_locked(entry)); + xas_reset(xas); + xas_lock_irq(xas); + old = xas_store(xas, entry); + xas_unlock_irq(xas); + WARN_ON(!dax_is_locked(old)); + dax_wake_entry(xas, entry, WAKE_NEXT); +} + +/* + * Return: The entry stored at this location before it was locked. + */ +static void *dax_lock_entry(struct xa_state *xas, void *entry) +{ + unsigned long v = xa_to_value(entry); + + return xas_store(xas, xa_mk_value(v | DAX_LOCKED)); +} + +static unsigned long dax_entry_size(void *entry) +{ + if (dax_is_zero_entry(entry)) + return 0; + else if (dax_is_empty_entry(entry)) + return 0; + else if (dax_is_pmd_entry(entry)) + return PMD_SIZE; + else + return PAGE_SIZE; +} + +/* + * Until fsdax constructs compound folios it needs to be prepared to + * support multiple folios per entry where each folio is a single page + */ +static struct folio *dax_entry_to_folio(void *entry, int idx) +{ + unsigned long pfn, size = dax_entry_size(entry); + struct page *page; + struct folio *folio; + + if (!size) + return NULL; + + pfn = dax_to_pfn(entry); + page = pfn_to_page(pfn); + folio = page_folio(page); + + /* + * Are there multiple folios per entry, and has the iterator + * passed the end of that set? 
+ */ + if (idx >= size / folio_size(folio)) + return NULL; + + VM_WARN_ON_ONCE(!IS_ALIGNED(size, folio_size(folio))); + + return page_folio(page + idx); +} + +/* + * Iterate through all folios associated with a given entry + */ +#define dax_for_each_folio(entry, folio, i) \ + for (i = 0, folio = dax_entry_to_folio(entry, i); folio; \ + folio = dax_entry_to_folio(entry, ++i)) + +static bool dax_mapping_is_cow(struct address_space *mapping) +{ + return (unsigned long)mapping == PAGE_MAPPING_DAX_COW; +} + +/* + * Set the page->mapping with FS_DAX_MAPPING_COW flag, increase the refcount. + */ +static void dax_mapping_set_cow(struct folio *folio) +{ + if ((uintptr_t)folio->mapping != PAGE_MAPPING_DAX_COW) { + /* + * Reset the index if the folio was already mapped + * regularly before. + */ + if (folio->mapping) + folio->index = 1; + folio->mapping = (void *)PAGE_MAPPING_DAX_COW; + } + folio->index++; +} + +static struct dev_pagemap *folio_pgmap(struct folio *folio) +{ + return folio_page(folio, 0)->pgmap; +} + +/* + * When it is called in dax_insert_entry(), the cow flag will indicate that + * whether this entry is shared by multiple files. If so, set the page->mapping + * FS_DAX_MAPPING_COW, and use page->index as refcount. + */ +static vm_fault_t dax_associate_entry(void *entry, + struct address_space *mapping, + struct vm_fault *vmf, unsigned long flags) +{ + unsigned long size = dax_entry_size(entry), index; + struct folio *folio; + int i; + + if (IS_ENABLED(CONFIG_FS_DAX_LIMITED)) + return 0; + + index = linear_page_index(vmf->vma, ALIGN(vmf->address, size)); + dax_for_each_folio(entry, folio, i) + if (flags & DAX_COW) { + dax_mapping_set_cow(folio); + } else { + WARN_ON_ONCE(folio->mapping); + if (!pgmap_request_folios(folio_pgmap(folio), folio, 1)) + return VM_FAULT_SIGBUS; + folio->mapping = mapping; + folio->index = index + i; + } + + return 0; +} + +static void dax_disassociate_entry(void *entry, struct address_space *mapping, + bool trunc) +{ + struct folio *folio; + int i; + + if (IS_ENABLED(CONFIG_FS_DAX_LIMITED)) + return; + + dax_for_each_folio(entry, folio, i) { + if (dax_mapping_is_cow(folio->mapping)) { + /* keep the CoW flag if this folio is still shared */ + if (folio->index-- > 0) + continue; + } else { + WARN_ON_ONCE(trunc && !dax_is_zapped(entry)); + WARN_ON_ONCE(trunc && !dax_folio_idle(folio)); + WARN_ON_ONCE(folio->mapping && folio->mapping != mapping); + } + folio->mapping = NULL; + folio->index = 0; + } +} + +/* + * dax_lock_page - Lock the DAX entry corresponding to a page + * @page: The page whose entry we want to lock + * + * Context: Process context. + * Return: A cookie to pass to dax_unlock_page() or 0 if the entry could + * not be locked. + */ +dax_entry_t dax_lock_page(struct page *page) +{ + XA_STATE(xas, NULL, 0); + void *entry; + + /* Ensure page->mapping isn't freed while we look at it */ + rcu_read_lock(); + for (;;) { + struct address_space *mapping = READ_ONCE(page->mapping); + + entry = NULL; + if (!mapping || !dax_mapping(mapping)) + break; + + /* + * In the device-dax case there's no need to lock, a + * struct dev_pagemap pin is sufficient to keep the + * inode alive, and we assume we have dev_pagemap pin + * otherwise we would not have a valid pfn_to_page() + * translation. 
+ */ + entry = (void *)~0UL; + if (S_ISCHR(mapping->host->i_mode)) + break; + + xas.xa = &mapping->i_pages; + xas_lock_irq(&xas); + if (mapping != page->mapping) { + xas_unlock_irq(&xas); + continue; + } + xas_set(&xas, page->index); + entry = xas_load(&xas); + if (dax_is_locked(entry)) { + rcu_read_unlock(); + wait_entry_unlocked(&xas, entry); + rcu_read_lock(); + continue; + } + dax_lock_entry(&xas, entry); + xas_unlock_irq(&xas); + break; + } + rcu_read_unlock(); + return (dax_entry_t)entry; +} + +void dax_unlock_page(struct page *page, dax_entry_t cookie) +{ + struct address_space *mapping = page->mapping; + XA_STATE(xas, &mapping->i_pages, page->index); + + if (S_ISCHR(mapping->host->i_mode)) + return; + + dax_unlock_entry(&xas, (void *)cookie); +} + +/* + * dax_lock_mapping_entry - Lock the DAX entry corresponding to a mapping + * @mapping: the file's mapping whose entry we want to lock + * @index: the offset within this file + * @page: output the dax page corresponding to this dax entry + * + * Return: A cookie to pass to dax_unlock_mapping_entry() or 0 if the entry + * could not be locked. + */ +dax_entry_t dax_lock_mapping_entry(struct address_space *mapping, pgoff_t index, + struct page **page) +{ + XA_STATE(xas, NULL, 0); + void *entry; + + rcu_read_lock(); + for (;;) { + entry = NULL; + if (!dax_mapping(mapping)) + break; + + xas.xa = &mapping->i_pages; + xas_lock_irq(&xas); + xas_set(&xas, index); + entry = xas_load(&xas); + if (dax_is_locked(entry)) { + rcu_read_unlock(); + wait_entry_unlocked(&xas, entry); + rcu_read_lock(); + continue; + } + if (!entry || dax_is_zero_entry(entry) || + dax_is_empty_entry(entry)) { + /* + * Because we are looking for entry from file's mapping + * and index, so the entry may not be inserted for now, + * or even a zero/empty entry. We don't think this is + * an error case. So, return a special value and do + * not output @page. + */ + entry = (void *)~0UL; + } else { + *page = pfn_to_page(dax_to_pfn(entry)); + dax_lock_entry(&xas, entry); + } + xas_unlock_irq(&xas); + break; + } + rcu_read_unlock(); + return (dax_entry_t)entry; +} + +void dax_unlock_mapping_entry(struct address_space *mapping, pgoff_t index, + dax_entry_t cookie) +{ + XA_STATE(xas, &mapping->i_pages, index); + + if (cookie == ~0UL) + return; + + dax_unlock_entry(&xas, (void *)cookie); +} + +/* + * Find page cache entry at given index. If it is a DAX entry, return it + * with the entry locked. If the page cache doesn't contain an entry at + * that index, add a locked empty entry. + * + * When requesting an entry with size DAX_PMD, dax_grab_mapping_entry() will + * either return that locked entry or will return VM_FAULT_FALLBACK. + * This will happen if there are any PTE entries within the PMD range + * that we are requesting. + * + * We always favor PTE entries over PMD entries. There isn't a flow where we + * evict PTE entries in order to 'upgrade' them to a PMD entry. A PMD + * insertion will fail if it finds any PTE entries already in the tree, and a + * PTE insertion will cause an existing PMD entry to be unmapped and + * downgraded to PTE entries. This happens for both PMD zero pages as + * well as PMD empty entries. + * + * The exception to this downgrade path is for PMD entries that have + * real storage backing them. We will leave these real PMD entries in + * the tree, and PTE writes will simply dirty the entire PMD entry. + * + * Note: Unlike filemap_fault() we don't honor FAULT_FLAG_RETRY flags. For + * persistent memory the benefit is doubtful. 
We can add that later if we can + * show it helps. + * + * On error, this function does not return an ERR_PTR. Instead it returns + * a VM_FAULT code, encoded as an xarray internal entry. The ERR_PTR values + * overlap with xarray value entries. + */ +void *dax_grab_mapping_entry(struct xa_state *xas, + struct address_space *mapping, unsigned int order) +{ + unsigned long index = xas->xa_index; + bool pmd_downgrade; /* splitting PMD entry into PTE entries? */ + void *entry; + +retry: + pmd_downgrade = false; + xas_lock_irq(xas); + entry = get_unlocked_entry(xas, order); + + if (entry) { + if (dax_is_conflict(entry)) + goto fallback; + if (!xa_is_value(entry)) { + xas_set_err(xas, -EIO); + goto out_unlock; + } + + if (order == 0) { + if (dax_is_pmd_entry(entry) && + (dax_is_zero_entry(entry) || + dax_is_empty_entry(entry))) { + pmd_downgrade = true; + } + } + } + + if (pmd_downgrade) { + /* + * Make sure 'entry' remains valid while we drop + * the i_pages lock. + */ + dax_lock_entry(xas, entry); + + /* + * Besides huge zero pages the only other thing that gets + * downgraded are empty entries which don't need to be + * unmapped. + */ + if (dax_is_zero_entry(entry)) { + xas_unlock_irq(xas); + unmap_mapping_pages(mapping, + xas->xa_index & ~PG_PMD_COLOUR, + PG_PMD_NR, false); + xas_reset(xas); + xas_lock_irq(xas); + } + + dax_disassociate_entry(entry, mapping, false); + xas_store(xas, NULL); /* undo the PMD join */ + dax_wake_entry(xas, entry, WAKE_ALL); + mapping->nrpages -= PG_PMD_NR; + entry = NULL; + xas_set(xas, index); + } + + if (entry) { + dax_lock_entry(xas, entry); + } else { + unsigned long flags = DAX_EMPTY; + + if (order > 0) + flags |= DAX_PMD; + entry = dax_make_entry(pfn_to_pfn_t(0), flags); + dax_lock_entry(xas, entry); + if (xas_error(xas)) + goto out_unlock; + mapping->nrpages += 1UL << order; + } + +out_unlock: + xas_unlock_irq(xas); + if (xas_nomem(xas, mapping_gfp_mask(mapping) & ~__GFP_HIGHMEM)) + goto retry; + if (xas->xa_node == XA_ERROR(-ENOMEM)) + return xa_mk_internal(VM_FAULT_OOM); + if (xas_error(xas)) + return xa_mk_internal(VM_FAULT_SIGBUS); + return entry; +fallback: + xas_unlock_irq(xas); + return xa_mk_internal(VM_FAULT_FALLBACK); +} + +static void *dax_zap_entry(struct xa_state *xas, void *entry) +{ + unsigned long v = xa_to_value(entry); + + return xas_store(xas, xa_mk_value(v | DAX_ZAP)); +} + +/* + * Return NULL if the entry is zapped and all pages in the entry are + * idle, otherwise return the non-idle page in the entry + */ +static struct page *dax_zap_pages(struct xa_state *xas, void *entry) +{ + struct page *ret = NULL; + struct folio *folio; + bool zap; + int i; + + if (!dax_entry_size(entry)) + return NULL; + + zap = !dax_is_zapped(entry); + + dax_for_each_folio(entry, folio, i) { + if (zap) + pgmap_release_folios(folio_pgmap(folio), folio, 1); + if (!ret && !dax_folio_idle(folio)) + ret = folio_page(folio, 0); + } + + if (zap) + dax_zap_entry(xas, entry); + + return ret; +} + +/** + * dax_zap_mappings_range - find first pinned page in @mapping + * @mapping: address space to scan for a page with ref count > 1 + * @start: Starting offset. Page containing 'start' is included. + * @end: End offset. Page containing 'end' is included. If 'end' is LLONG_MAX, + * pages from 'start' till the end of file are included. + * + * DAX requires ZONE_DEVICE mapped pages. These pages are never + * 'onlined' to the page allocator so they are considered idle when + * page->count == 1. 
A filesystem uses this interface to determine if + * any page in the mapping is busy, i.e. for DMA, or other + * get_user_pages() usages. + * + * It is expected that the filesystem is holding locks to block the + * establishment of new mappings in this address_space. I.e. it expects + * to be able to run unmap_mapping_range() and subsequently not race + * mapping_mapped() becoming true. + */ +struct page *dax_zap_mappings_range(struct address_space *mapping, loff_t start, + loff_t end) +{ + void *entry; + unsigned int scanned = 0; + struct page *page = NULL; + pgoff_t start_idx = start >> PAGE_SHIFT; + pgoff_t end_idx; + XA_STATE(xas, &mapping->i_pages, start_idx); + + /* + * In the 'limited' case get_user_pages() for dax is disabled. + */ + if (IS_ENABLED(CONFIG_FS_DAX_LIMITED)) + return NULL; + + if (!dax_mapping(mapping)) + return NULL; + + /* If end == LLONG_MAX, all pages from start to till end of file */ + if (end == LLONG_MAX) + end_idx = ULONG_MAX; + else + end_idx = end >> PAGE_SHIFT; + /* + * If we race get_user_pages_fast() here either we'll see the + * elevated page count in the iteration and wait, or + * get_user_pages_fast() will see that the page it took a reference + * against is no longer mapped in the page tables and bail to the + * get_user_pages() slow path. The slow path is protected by + * pte_lock() and pmd_lock(). New references are not taken without + * holding those locks, and unmap_mapping_pages() will not zero the + * pte or pmd without holding the respective lock, so we are + * guaranteed to either see new references or prevent new + * references from being established. + */ + unmap_mapping_pages(mapping, start_idx, end_idx - start_idx + 1, 0); + + xas_lock_irq(&xas); + xas_for_each(&xas, entry, end_idx) { + if (WARN_ON_ONCE(!xa_is_value(entry))) + continue; + if (unlikely(dax_is_locked(entry))) + entry = get_unlocked_entry(&xas, 0); + if (entry) + page = dax_zap_pages(&xas, entry); + put_unlocked_entry(&xas, entry, WAKE_NEXT); + if (page) + break; + if (++scanned % XA_CHECK_SCHED) + continue; + + xas_pause(&xas); + xas_unlock_irq(&xas); + cond_resched(); + xas_lock_irq(&xas); + } + xas_unlock_irq(&xas); + return page; +} +EXPORT_SYMBOL_GPL(dax_zap_mappings_range); + +struct page *dax_zap_mappings(struct address_space *mapping) +{ + return dax_zap_mappings_range(mapping, 0, LLONG_MAX); +} +EXPORT_SYMBOL_GPL(dax_zap_mappings); + +static int __dax_invalidate_entry(struct address_space *mapping, + pgoff_t index, bool trunc) +{ + XA_STATE(xas, &mapping->i_pages, index); + int ret = 0; + void *entry; + + xas_lock_irq(&xas); + entry = get_unlocked_entry(&xas, 0); + if (!entry || WARN_ON_ONCE(!xa_is_value(entry))) + goto out; + if (!trunc && (xas_get_mark(&xas, PAGECACHE_TAG_DIRTY) || + xas_get_mark(&xas, PAGECACHE_TAG_TOWRITE))) + goto out; + dax_disassociate_entry(entry, mapping, trunc); + xas_store(&xas, NULL); + mapping->nrpages -= 1UL << dax_entry_order(entry); + ret = 1; +out: + put_unlocked_entry(&xas, entry, WAKE_ALL); + xas_unlock_irq(&xas); + return ret; +} + +/* + * wait indefinitely for all pins to drop, the alternative to waiting is + * a potential use-after-free scenario + */ +static void dax_break_layout(struct address_space *mapping, pgoff_t index) +{ + /* To do this without locks, the inode needs to be unreferenced */ + WARN_ON(atomic_read(&mapping->host->i_count)); + do { + struct page *page; + + page = dax_zap_mappings_range(mapping, index << PAGE_SHIFT, + (index + 1) << PAGE_SHIFT); + if (!page) + return; + wait_var_event(page, 
dax_page_idle(page)); + } while (true); +} + +/* + * Delete DAX entry at @index from @mapping. Wait for it + * to be unlocked before deleting it. + */ +int dax_delete_mapping_entry(struct address_space *mapping, pgoff_t index) +{ + int ret; + + if (mapping_exiting(mapping)) + dax_break_layout(mapping, index); + + ret = __dax_invalidate_entry(mapping, index, true); + + /* + * This gets called from truncate / punch_hole path. As such, the caller + * must hold locks protecting against concurrent modifications of the + * page cache (usually fs-private i_mmap_sem for writing). Since the + * caller has seen a DAX entry for this index, we better find it + * at that index as well... + */ + WARN_ON_ONCE(!ret); + return ret; +} + +/* + * Invalidate DAX entry if it is clean. + */ +int dax_invalidate_mapping_entry_sync(struct address_space *mapping, + pgoff_t index) +{ + return __dax_invalidate_entry(mapping, index, false); +} + +/* + * By this point grab_mapping_entry() has ensured that we have a locked entry + * of the appropriate size so we don't have to worry about downgrading PMDs to + * PTEs. If we happen to be trying to insert a PTE and there is a PMD + * already in the tree, we will skip the insertion and just dirty the PMD as + * appropriate. + */ +vm_fault_t dax_insert_entry(struct xa_state *xas, struct vm_fault *vmf, + void **pentry, pfn_t pfn, unsigned long flags) +{ + struct address_space *mapping = vmf->vma->vm_file->f_mapping; + void *new_entry = dax_make_entry(pfn, flags); + bool dirty = flags & DAX_DIRTY; + bool cow = flags & DAX_COW; + void *entry = *pentry; + vm_fault_t ret = 0; + + if (dirty) + __mark_inode_dirty(mapping->host, I_DIRTY_PAGES); + + if (cow || (dax_is_zero_entry(entry) && !(flags & DAX_ZERO_PAGE))) { + unsigned long index = xas->xa_index; + /* we are replacing a zero page with block mapping */ + if (dax_is_pmd_entry(entry)) + unmap_mapping_pages(mapping, index & ~PG_PMD_COLOUR, + PG_PMD_NR, false); + else /* pte entry */ + unmap_mapping_pages(mapping, index, 1, false); + } + + xas_reset(xas); + xas_lock_irq(xas); + if (cow || dax_is_zero_entry(entry) || dax_is_empty_entry(entry)) { + void *old; + + dax_disassociate_entry(entry, mapping, false); + ret = dax_associate_entry(new_entry, mapping, vmf, flags); + if (ret) + goto out; + /* + * Only swap our new entry into the page cache if the current + * entry is a zero page or an empty entry. If a normal PTE or + * PMD entry is already in the cache, we leave it alone. This + * means that if we are trying to insert a PTE and the + * existing entry is a PMD, we will just leave the PMD in the + * tree and dirty it if necessary. + */ + old = dax_lock_entry(xas, new_entry); + WARN_ON_ONCE(old != xa_mk_value(xa_to_value(entry) | + DAX_LOCKED)); + entry = new_entry; + } else { + xas_load(xas); /* Walk the xa_state */ + } + + if (dirty) + xas_set_mark(xas, PAGECACHE_TAG_DIRTY); + + if (cow) + xas_set_mark(xas, PAGECACHE_TAG_TOWRITE); + + *pentry = entry; +out: + xas_unlock_irq(xas); + + return ret; +} + +int dax_writeback_one(struct xa_state *xas, struct dax_device *dax_dev, + struct address_space *mapping, void *entry) +{ + unsigned long pfn, index, count, end; + long ret = 0; + struct vm_area_struct *vma; + + /* + * A page got tagged dirty in DAX mapping? Something is seriously + * wrong. + */ + if (WARN_ON(!xa_is_value(entry))) + return -EIO; + + if (unlikely(dax_is_locked(entry))) { + void *old_entry = entry; + + entry = get_unlocked_entry(xas, 0); + + /* Entry got punched out / reallocated? 
*/ + if (!entry || WARN_ON_ONCE(!xa_is_value(entry))) + goto put_unlocked; + /* + * Entry got reallocated elsewhere? No need to writeback. + * We have to compare pfns as we must not bail out due to + * difference in lockbit or entry type. + */ + if (dax_to_pfn(old_entry) != dax_to_pfn(entry)) + goto put_unlocked; + if (WARN_ON_ONCE(dax_is_empty_entry(entry) || + dax_is_zero_entry(entry))) { + ret = -EIO; + goto put_unlocked; + } + + /* Another fsync thread may have already done this entry */ + if (!xas_get_mark(xas, PAGECACHE_TAG_TOWRITE)) + goto put_unlocked; + } + + /* Lock the entry to serialize with page faults */ + dax_lock_entry(xas, entry); + + /* + * We can clear the tag now but we have to be careful so that concurrent + * dax_writeback_one() calls for the same index cannot finish before we + * actually flush the caches. This is achieved as the calls will look + * at the entry only under the i_pages lock and once they do that + * they will see the entry locked and wait for it to unlock. + */ + xas_clear_mark(xas, PAGECACHE_TAG_TOWRITE); + xas_unlock_irq(xas); + + /* + * If dax_writeback_mapping_range() was given a wbc->range_start + * in the middle of a PMD, the 'index' we use needs to be + * aligned to the start of the PMD. + * This allows us to flush for PMD_SIZE and not have to worry about + * partial PMD writebacks. + */ + pfn = dax_to_pfn(entry); + count = 1UL << dax_entry_order(entry); + index = xas->xa_index & ~(count - 1); + end = index + count - 1; + + /* Walk all mappings of a given index of a file and writeprotect them */ + i_mmap_lock_read(mapping); + vma_interval_tree_foreach(vma, &mapping->i_mmap, index, end) { + pfn_mkclean_range(pfn, count, index, vma); + cond_resched(); + } + i_mmap_unlock_read(mapping); + + dax_flush(dax_dev, page_address(pfn_to_page(pfn)), count * PAGE_SIZE); + /* + * After we have flushed the cache, we can clear the dirty tag. There + * cannot be new dirty data in the pfn after the flush has completed as + * the pfn mappings are writeprotected and fault waits for mapping + * entry lock. + */ + xas_reset(xas); + xas_lock_irq(xas); + xas_store(xas, entry); + xas_clear_mark(xas, PAGECACHE_TAG_DIRTY); + dax_wake_entry(xas, entry, WAKE_NEXT); + + trace_dax_writeback_one(mapping->host, index, count); + return ret; + + put_unlocked: + put_unlocked_entry(xas, entry, WAKE_NEXT); + return ret; +} + +/* + * dax_insert_pfn_mkwrite - insert PTE or PMD entry into page tables + * @vmf: The description of the fault + * @pfn: PFN to insert + * @order: Order of entry to insert. + * + * This function inserts a writeable PTE or PMD entry into the page tables + * for an mmaped DAX file. It also marks the page cache entry as dirty. + */ +vm_fault_t dax_insert_pfn_mkwrite(struct vm_fault *vmf, pfn_t pfn, + unsigned int order) +{ + struct address_space *mapping = vmf->vma->vm_file->f_mapping; + XA_STATE_ORDER(xas, &mapping->i_pages, vmf->pgoff, order); + void *entry; + vm_fault_t ret; + + xas_lock_irq(&xas); + entry = get_unlocked_entry(&xas, order); + /* Did we race with someone splitting entry or so? 
*/ + if (!entry || dax_is_conflict(entry) || + (order == 0 && !dax_is_pte_entry(entry))) { + put_unlocked_entry(&xas, entry, WAKE_NEXT); + xas_unlock_irq(&xas); + trace_dax_insert_pfn_mkwrite_no_entry(mapping->host, vmf, + VM_FAULT_NOPAGE); + return VM_FAULT_NOPAGE; + } + xas_set_mark(&xas, PAGECACHE_TAG_DIRTY); + dax_lock_entry(&xas, entry); + xas_unlock_irq(&xas); + if (order == 0) + ret = vmf_insert_mixed_mkwrite(vmf->vma, vmf->address, pfn); +#ifdef CONFIG_FS_DAX_PMD + else if (order == PMD_ORDER) + ret = vmf_insert_pfn_pmd(vmf, pfn, FAULT_FLAG_WRITE); +#endif + else + ret = VM_FAULT_FALLBACK; + dax_unlock_entry(&xas, entry); + trace_dax_insert_pfn_mkwrite(mapping->host, vmf, ret); + return ret; +} diff --git a/drivers/dax/super.c b/drivers/dax/super.c index 41342e47662d..423543358fd2 100644 --- a/drivers/dax/super.c +++ b/drivers/dax/super.c @@ -564,6 +564,8 @@ static int __init dax_core_init(void) if (rc) return rc; + dax_mapping_init(); + rc = alloc_chrdev_region(&dax_devt, 0, MINORMASK+1, "dax"); if (rc) goto err_chrdev; @@ -590,5 +592,5 @@ static void __exit dax_core_exit(void) MODULE_AUTHOR("Intel Corporation"); MODULE_LICENSE("GPL v2"); -subsys_initcall(dax_core_init); +fs_initcall(dax_core_init); module_exit(dax_core_exit); diff --git a/drivers/nvdimm/Kconfig b/drivers/nvdimm/Kconfig index 027acca1bac4..24bdc87a4b99 100644 --- a/drivers/nvdimm/Kconfig +++ b/drivers/nvdimm/Kconfig @@ -78,6 +78,7 @@ config NVDIMM_DAX bool "NVDIMM DAX: Raw access to persistent memory" default LIBNVDIMM depends on NVDIMM_PFN + depends on DAX help Support raw device dax access to a persistent memory namespace. For environments that want to hard partition diff --git a/fs/dax.c b/fs/dax.c index 48bc43c0c03c..de79dd132e22 100644 --- a/fs/dax.c +++ b/fs/dax.c @@ -27,876 +27,8 @@ #include #include -#define CREATE_TRACE_POINTS #include -static inline unsigned int pe_order(enum page_entry_size pe_size) -{ - if (pe_size == PE_SIZE_PTE) - return PAGE_SHIFT - PAGE_SHIFT; - if (pe_size == PE_SIZE_PMD) - return PMD_SHIFT - PAGE_SHIFT; - if (pe_size == PE_SIZE_PUD) - return PUD_SHIFT - PAGE_SHIFT; - return ~0; -} - -/* We choose 4096 entries - same as per-zone page wait tables */ -#define DAX_WAIT_TABLE_BITS 12 -#define DAX_WAIT_TABLE_ENTRIES (1 << DAX_WAIT_TABLE_BITS) - -/* The 'colour' (ie low bits) within a PMD of a page offset. */ -#define PG_PMD_COLOUR ((PMD_SIZE >> PAGE_SHIFT) - 1) -#define PG_PMD_NR (PMD_SIZE >> PAGE_SHIFT) - -/* The order of a PMD entry */ -#define PMD_ORDER (PMD_SHIFT - PAGE_SHIFT) - -static wait_queue_head_t wait_table[DAX_WAIT_TABLE_ENTRIES]; - -static int __init init_dax_wait_table(void) -{ - int i; - - for (i = 0; i < DAX_WAIT_TABLE_ENTRIES; i++) - init_waitqueue_head(wait_table + i); - return 0; -} -fs_initcall(init_dax_wait_table); - -/* - * DAX pagecache entries use XArray value entries so they can't be mistaken - * for pages. We use one bit for locking, one bit for the entry size (PMD) - * and two more to tell us if the entry is a zero page or an empty entry that - * is just used for locking. In total four special bits. - * - * If the PMD bit isn't set the entry has size PAGE_SIZE, and if the ZERO_PAGE - * and EMPTY bits aren't set the entry is a normal DAX entry with a filesystem - * block allocation. 
- */ -#define DAX_SHIFT (5) -#define DAX_MASK ((1UL << DAX_SHIFT) - 1) -#define DAX_LOCKED (1UL << 0) -#define DAX_PMD (1UL << 1) -#define DAX_ZERO_PAGE (1UL << 2) -#define DAX_EMPTY (1UL << 3) -#define DAX_ZAP (1UL << 4) - -/* - * These flags are not conveyed in Xarray value entries, they are just - * modifiers to dax_insert_entry(). - */ -#define DAX_DIRTY (1UL << (DAX_SHIFT + 0)) -#define DAX_COW (1UL << (DAX_SHIFT + 1)) - -static unsigned long dax_to_pfn(void *entry) -{ - return xa_to_value(entry) >> DAX_SHIFT; -} - -static void *dax_make_entry(pfn_t pfn, unsigned long flags) -{ - return xa_mk_value((flags & DAX_MASK) | - (pfn_t_to_pfn(pfn) << DAX_SHIFT)); -} - -static bool dax_is_locked(void *entry) -{ - return xa_to_value(entry) & DAX_LOCKED; -} - -static bool dax_is_zapped(void *entry) -{ - return xa_to_value(entry) & DAX_ZAP; -} - -static unsigned int dax_entry_order(void *entry) -{ - if (xa_to_value(entry) & DAX_PMD) - return PMD_ORDER; - return 0; -} - -static unsigned long dax_is_pmd_entry(void *entry) -{ - return xa_to_value(entry) & DAX_PMD; -} - -static bool dax_is_pte_entry(void *entry) -{ - return !(xa_to_value(entry) & DAX_PMD); -} - -static int dax_is_zero_entry(void *entry) -{ - return xa_to_value(entry) & DAX_ZERO_PAGE; -} - -static int dax_is_empty_entry(void *entry) -{ - return xa_to_value(entry) & DAX_EMPTY; -} - -/* - * true if the entry that was found is of a smaller order than the entry - * we were looking for - */ -static bool dax_is_conflict(void *entry) -{ - return entry == XA_RETRY_ENTRY; -} - -/* - * DAX page cache entry locking - */ -struct exceptional_entry_key { - struct xarray *xa; - pgoff_t entry_start; -}; - -struct wait_exceptional_entry_queue { - wait_queue_entry_t wait; - struct exceptional_entry_key key; -}; - -/** - * enum dax_wake_mode: waitqueue wakeup behaviour - * @WAKE_ALL: wake all waiters in the waitqueue - * @WAKE_NEXT: wake only the first waiter in the waitqueue - */ -enum dax_wake_mode { - WAKE_ALL, - WAKE_NEXT, -}; - -static wait_queue_head_t *dax_entry_waitqueue(struct xa_state *xas, - void *entry, struct exceptional_entry_key *key) -{ - unsigned long hash; - unsigned long index = xas->xa_index; - - /* - * If 'entry' is a PMD, align the 'index' that we use for the wait - * queue to the start of that PMD. This ensures that all offsets in - * the range covered by the PMD map to the same bit lock. - */ - if (dax_is_pmd_entry(entry)) - index &= ~PG_PMD_COLOUR; - key->xa = xas->xa; - key->entry_start = index; - - hash = hash_long((unsigned long)xas->xa ^ index, DAX_WAIT_TABLE_BITS); - return wait_table + hash; -} - -static int wake_exceptional_entry_func(wait_queue_entry_t *wait, - unsigned int mode, int sync, void *keyp) -{ - struct exceptional_entry_key *key = keyp; - struct wait_exceptional_entry_queue *ewait = - container_of(wait, struct wait_exceptional_entry_queue, wait); - - if (key->xa != ewait->key.xa || - key->entry_start != ewait->key.entry_start) - return 0; - return autoremove_wake_function(wait, mode, sync, NULL); -} - -/* - * @entry may no longer be the entry at the index in the mapping. - * The important information it's conveying is whether the entry at - * this index used to be a PMD entry. 
- */ -static void dax_wake_entry(struct xa_state *xas, void *entry, - enum dax_wake_mode mode) -{ - struct exceptional_entry_key key; - wait_queue_head_t *wq; - - wq = dax_entry_waitqueue(xas, entry, &key); - - /* - * Checking for locked entry and prepare_to_wait_exclusive() happens - * under the i_pages lock, ditto for entry handling in our callers. - * So at this point all tasks that could have seen our entry locked - * must be in the waitqueue and the following check will see them. - */ - if (waitqueue_active(wq)) - __wake_up(wq, TASK_NORMAL, mode == WAKE_ALL ? 0 : 1, &key); -} - -/* - * Look up entry in page cache, wait for it to become unlocked if it - * is a DAX entry and return it. The caller must subsequently call - * put_unlocked_entry() if it did not lock the entry or dax_unlock_entry() - * if it did. The entry returned may have a larger order than @order. - * If @order is larger than the order of the entry found in i_pages, this - * function returns a dax_is_conflict entry. - * - * Must be called with the i_pages lock held. - */ -static void *get_unlocked_entry(struct xa_state *xas, unsigned int order) -{ - void *entry; - struct wait_exceptional_entry_queue ewait; - wait_queue_head_t *wq; - - init_wait(&ewait.wait); - ewait.wait.func = wake_exceptional_entry_func; - - for (;;) { - entry = xas_find_conflict(xas); - if (!entry || WARN_ON_ONCE(!xa_is_value(entry))) - return entry; - if (dax_entry_order(entry) < order) - return XA_RETRY_ENTRY; - if (!dax_is_locked(entry)) - return entry; - - wq = dax_entry_waitqueue(xas, entry, &ewait.key); - prepare_to_wait_exclusive(wq, &ewait.wait, - TASK_UNINTERRUPTIBLE); - xas_unlock_irq(xas); - xas_reset(xas); - schedule(); - finish_wait(wq, &ewait.wait); - xas_lock_irq(xas); - } -} - -/* - * The only thing keeping the address space around is the i_pages lock - * (it's cycled in clear_inode() after removing the entries from i_pages) - * After we call xas_unlock_irq(), we cannot touch xas->xa. - */ -static void wait_entry_unlocked(struct xa_state *xas, void *entry) -{ - struct wait_exceptional_entry_queue ewait; - wait_queue_head_t *wq; - - init_wait(&ewait.wait); - ewait.wait.func = wake_exceptional_entry_func; - - wq = dax_entry_waitqueue(xas, entry, &ewait.key); - /* - * Unlike get_unlocked_entry() there is no guarantee that this - * path ever successfully retrieves an unlocked entry before an - * inode dies. Perform a non-exclusive wait in case this path - * never successfully performs its own wake up. - */ - prepare_to_wait(wq, &ewait.wait, TASK_UNINTERRUPTIBLE); - xas_unlock_irq(xas); - schedule(); - finish_wait(wq, &ewait.wait); -} - -static void put_unlocked_entry(struct xa_state *xas, void *entry, - enum dax_wake_mode mode) -{ - if (entry && !dax_is_conflict(entry)) - dax_wake_entry(xas, entry, mode); -} - -/* - * We used the xa_state to get the entry, but then we locked the entry and - * dropped the xa_lock, so we know the xa_state is stale and must be reset - * before use. - */ -static void dax_unlock_entry(struct xa_state *xas, void *entry) -{ - void *old; - - BUG_ON(dax_is_locked(entry)); - xas_reset(xas); - xas_lock_irq(xas); - old = xas_store(xas, entry); - xas_unlock_irq(xas); - BUG_ON(!dax_is_locked(old)); - dax_wake_entry(xas, entry, WAKE_NEXT); -} - -/* - * Return: The entry stored at this location before it was locked. 
- */ -static void *dax_lock_entry(struct xa_state *xas, void *entry) -{ - unsigned long v = xa_to_value(entry); - return xas_store(xas, xa_mk_value(v | DAX_LOCKED)); -} - -static unsigned long dax_entry_size(void *entry) -{ - if (dax_is_zero_entry(entry)) - return 0; - else if (dax_is_empty_entry(entry)) - return 0; - else if (dax_is_pmd_entry(entry)) - return PMD_SIZE; - else - return PAGE_SIZE; -} - -/* - * Until fsdax constructs compound folios it needs to be prepared to - * support multiple folios per entry where each folio is a single page - */ -static struct folio *dax_entry_to_folio(void *entry, int idx) -{ - unsigned long pfn, size = dax_entry_size(entry); - struct page *page; - struct folio *folio; - - if (!size) - return NULL; - - pfn = dax_to_pfn(entry); - page = pfn_to_page(pfn); - folio = page_folio(page); - - /* - * Are there multiple folios per entry, and has the iterator - * passed the end of that set? - */ - if (idx >= size / folio_size(folio)) - return NULL; - - VM_WARN_ON_ONCE(!IS_ALIGNED(size, folio_size(folio))); - - return page_folio(page + idx); -} - -/* - * Iterate through all folios associated with a given entry - */ -#define dax_for_each_folio(entry, folio, i) \ - for (i = 0, folio = dax_entry_to_folio(entry, i); folio; \ - folio = dax_entry_to_folio(entry, ++i)) - -static inline bool dax_mapping_is_cow(struct address_space *mapping) -{ - return (unsigned long)mapping == PAGE_MAPPING_DAX_COW; -} - -/* - * Set the page->mapping with FS_DAX_MAPPING_COW flag, increase the refcount. - */ -static inline void dax_mapping_set_cow(struct folio *folio) -{ - if ((uintptr_t)folio->mapping != PAGE_MAPPING_DAX_COW) { - /* - * Reset the index if the folio was already mapped - * regularly before. - */ - if (folio->mapping) - folio->index = 1; - folio->mapping = (void *)PAGE_MAPPING_DAX_COW; - } - folio->index++; -} - -static struct dev_pagemap *folio_pgmap(struct folio *folio) -{ - return folio_page(folio, 0)->pgmap; -} - -/* - * When it is called in dax_insert_entry(), the cow flag will indicate that - * whether this entry is shared by multiple files. If so, set the page->mapping - * FS_DAX_MAPPING_COW, and use page->index as refcount. 
- */ -static vm_fault_t dax_associate_entry(void *entry, - struct address_space *mapping, - struct vm_fault *vmf, unsigned long flags) -{ - unsigned long size = dax_entry_size(entry), index; - struct folio *folio; - int i; - - if (IS_ENABLED(CONFIG_FS_DAX_LIMITED)) - return 0; - - index = linear_page_index(vmf->vma, ALIGN(vmf->address, size)); - dax_for_each_folio(entry, folio, i) - if (flags & DAX_COW) { - dax_mapping_set_cow(folio); - } else { - WARN_ON_ONCE(folio->mapping); - if (!pgmap_request_folios(folio_pgmap(folio), folio, 1)) - return VM_FAULT_SIGBUS; - folio->mapping = mapping; - folio->index = index + i; - } - - return 0; -} - -static void dax_disassociate_entry(void *entry, struct address_space *mapping, - bool trunc) -{ - struct folio *folio; - int i; - - if (IS_ENABLED(CONFIG_FS_DAX_LIMITED)) - return; - - dax_for_each_folio(entry, folio, i) { - if (dax_mapping_is_cow(folio->mapping)) { - /* keep the CoW flag if this folio is still shared */ - if (folio->index-- > 0) - continue; - } else { - WARN_ON_ONCE(trunc && !dax_is_zapped(entry)); - WARN_ON_ONCE(trunc && !dax_folio_idle(folio)); - WARN_ON_ONCE(folio->mapping && folio->mapping != mapping); - } - folio->mapping = NULL; - folio->index = 0; - } -} - -/* - * dax_lock_page - Lock the DAX entry corresponding to a page - * @page: The page whose entry we want to lock - * - * Context: Process context. - * Return: A cookie to pass to dax_unlock_page() or 0 if the entry could - * not be locked. - */ -dax_entry_t dax_lock_page(struct page *page) -{ - XA_STATE(xas, NULL, 0); - void *entry; - - /* Ensure page->mapping isn't freed while we look at it */ - rcu_read_lock(); - for (;;) { - struct address_space *mapping = READ_ONCE(page->mapping); - - entry = NULL; - if (!mapping || !dax_mapping(mapping)) - break; - - /* - * In the device-dax case there's no need to lock, a - * struct dev_pagemap pin is sufficient to keep the - * inode alive, and we assume we have dev_pagemap pin - * otherwise we would not have a valid pfn_to_page() - * translation. - */ - entry = (void *)~0UL; - if (S_ISCHR(mapping->host->i_mode)) - break; - - xas.xa = &mapping->i_pages; - xas_lock_irq(&xas); - if (mapping != page->mapping) { - xas_unlock_irq(&xas); - continue; - } - xas_set(&xas, page->index); - entry = xas_load(&xas); - if (dax_is_locked(entry)) { - rcu_read_unlock(); - wait_entry_unlocked(&xas, entry); - rcu_read_lock(); - continue; - } - dax_lock_entry(&xas, entry); - xas_unlock_irq(&xas); - break; - } - rcu_read_unlock(); - return (dax_entry_t)entry; -} - -void dax_unlock_page(struct page *page, dax_entry_t cookie) -{ - struct address_space *mapping = page->mapping; - XA_STATE(xas, &mapping->i_pages, page->index); - - if (S_ISCHR(mapping->host->i_mode)) - return; - - dax_unlock_entry(&xas, (void *)cookie); -} - -/* - * dax_lock_mapping_entry - Lock the DAX entry corresponding to a mapping - * @mapping: the file's mapping whose entry we want to lock - * @index: the offset within this file - * @page: output the dax page corresponding to this dax entry - * - * Return: A cookie to pass to dax_unlock_mapping_entry() or 0 if the entry - * could not be locked. 
- */ -dax_entry_t dax_lock_mapping_entry(struct address_space *mapping, pgoff_t index, - struct page **page) -{ - XA_STATE(xas, NULL, 0); - void *entry; - - rcu_read_lock(); - for (;;) { - entry = NULL; - if (!dax_mapping(mapping)) - break; - - xas.xa = &mapping->i_pages; - xas_lock_irq(&xas); - xas_set(&xas, index); - entry = xas_load(&xas); - if (dax_is_locked(entry)) { - rcu_read_unlock(); - wait_entry_unlocked(&xas, entry); - rcu_read_lock(); - continue; - } - if (!entry || - dax_is_zero_entry(entry) || dax_is_empty_entry(entry)) { - /* - * Because we are looking for entry from file's mapping - * and index, so the entry may not be inserted for now, - * or even a zero/empty entry. We don't think this is - * an error case. So, return a special value and do - * not output @page. - */ - entry = (void *)~0UL; - } else { - *page = pfn_to_page(dax_to_pfn(entry)); - dax_lock_entry(&xas, entry); - } - xas_unlock_irq(&xas); - break; - } - rcu_read_unlock(); - return (dax_entry_t)entry; -} - -void dax_unlock_mapping_entry(struct address_space *mapping, pgoff_t index, - dax_entry_t cookie) -{ - XA_STATE(xas, &mapping->i_pages, index); - - if (cookie == ~0UL) - return; - - dax_unlock_entry(&xas, (void *)cookie); -} - -/* - * Find page cache entry at given index. If it is a DAX entry, return it - * with the entry locked. If the page cache doesn't contain an entry at - * that index, add a locked empty entry. - * - * When requesting an entry with size DAX_PMD, grab_mapping_entry() will - * either return that locked entry or will return VM_FAULT_FALLBACK. - * This will happen if there are any PTE entries within the PMD range - * that we are requesting. - * - * We always favor PTE entries over PMD entries. There isn't a flow where we - * evict PTE entries in order to 'upgrade' them to a PMD entry. A PMD - * insertion will fail if it finds any PTE entries already in the tree, and a - * PTE insertion will cause an existing PMD entry to be unmapped and - * downgraded to PTE entries. This happens for both PMD zero pages as - * well as PMD empty entries. - * - * The exception to this downgrade path is for PMD entries that have - * real storage backing them. We will leave these real PMD entries in - * the tree, and PTE writes will simply dirty the entire PMD entry. - * - * Note: Unlike filemap_fault() we don't honor FAULT_FLAG_RETRY flags. For - * persistent memory the benefit is doubtful. We can add that later if we can - * show it helps. - * - * On error, this function does not return an ERR_PTR. Instead it returns - * a VM_FAULT code, encoded as an xarray internal entry. The ERR_PTR values - * overlap with xarray value entries. - */ -static void *grab_mapping_entry(struct xa_state *xas, - struct address_space *mapping, unsigned int order) -{ - unsigned long index = xas->xa_index; - bool pmd_downgrade; /* splitting PMD entry into PTE entries? */ - void *entry; - -retry: - pmd_downgrade = false; - xas_lock_irq(xas); - entry = get_unlocked_entry(xas, order); - - if (entry) { - if (dax_is_conflict(entry)) - goto fallback; - if (!xa_is_value(entry)) { - xas_set_err(xas, -EIO); - goto out_unlock; - } - - if (order == 0) { - if (dax_is_pmd_entry(entry) && - (dax_is_zero_entry(entry) || - dax_is_empty_entry(entry))) { - pmd_downgrade = true; - } - } - } - - if (pmd_downgrade) { - /* - * Make sure 'entry' remains valid while we drop - * the i_pages lock. 
- */ - dax_lock_entry(xas, entry); - - /* - * Besides huge zero pages the only other thing that gets - * downgraded are empty entries which don't need to be - * unmapped. - */ - if (dax_is_zero_entry(entry)) { - xas_unlock_irq(xas); - unmap_mapping_pages(mapping, - xas->xa_index & ~PG_PMD_COLOUR, - PG_PMD_NR, false); - xas_reset(xas); - xas_lock_irq(xas); - } - - dax_disassociate_entry(entry, mapping, false); - xas_store(xas, NULL); /* undo the PMD join */ - dax_wake_entry(xas, entry, WAKE_ALL); - mapping->nrpages -= PG_PMD_NR; - entry = NULL; - xas_set(xas, index); - } - - if (entry) { - dax_lock_entry(xas, entry); - } else { - unsigned long flags = DAX_EMPTY; - - if (order > 0) - flags |= DAX_PMD; - entry = dax_make_entry(pfn_to_pfn_t(0), flags); - dax_lock_entry(xas, entry); - if (xas_error(xas)) - goto out_unlock; - mapping->nrpages += 1UL << order; - } - -out_unlock: - xas_unlock_irq(xas); - if (xas_nomem(xas, mapping_gfp_mask(mapping) & ~__GFP_HIGHMEM)) - goto retry; - if (xas->xa_node == XA_ERROR(-ENOMEM)) - return xa_mk_internal(VM_FAULT_OOM); - if (xas_error(xas)) - return xa_mk_internal(VM_FAULT_SIGBUS); - return entry; -fallback: - xas_unlock_irq(xas); - return xa_mk_internal(VM_FAULT_FALLBACK); -} - -static void *dax_zap_entry(struct xa_state *xas, void *entry) -{ - unsigned long v = xa_to_value(entry); - - return xas_store(xas, xa_mk_value(v | DAX_ZAP)); -} - -/** - * Return NULL if the entry is zapped and all pages in the entry are - * idle, otherwise return the non-idle page in the entry - */ -static struct page *dax_zap_pages(struct xa_state *xas, void *entry) -{ - struct page *ret = NULL; - struct folio *folio; - bool zap; - int i; - - if (!dax_entry_size(entry)) - return NULL; - - zap = !dax_is_zapped(entry); - - dax_for_each_folio(entry, folio, i) { - if (zap) - pgmap_release_folios(folio_pgmap(folio), folio, 1); - if (!ret && !dax_folio_idle(folio)) - ret = folio_page(folio, 0); - } - - if (zap) - dax_zap_entry(xas, entry); - - return ret; -} - -/** - * dax_zap_mappings_range - find first pinned page in @mapping - * @mapping: address space to scan for a page with ref count > 1 - * @start: Starting offset. Page containing 'start' is included. - * @end: End offset. Page containing 'end' is included. If 'end' is LLONG_MAX, - * pages from 'start' till the end of file are included. - * - * DAX requires ZONE_DEVICE mapped pages. These pages are never - * 'onlined' to the page allocator so they are considered idle when - * page->count == 1. A filesystem uses this interface to determine if - * any page in the mapping is busy, i.e. for DMA, or other - * get_user_pages() usages. - * - * It is expected that the filesystem is holding locks to block the - * establishment of new mappings in this address_space. I.e. it expects - * to be able to run unmap_mapping_range() and subsequently not race - * mapping_mapped() becoming true. - */ -struct page *dax_zap_mappings_range(struct address_space *mapping, loff_t start, - loff_t end) -{ - void *entry; - unsigned int scanned = 0; - struct page *page = NULL; - pgoff_t start_idx = start >> PAGE_SHIFT; - pgoff_t end_idx; - XA_STATE(xas, &mapping->i_pages, start_idx); - - /* - * In the 'limited' case get_user_pages() for dax is disabled. 
- */ - if (IS_ENABLED(CONFIG_FS_DAX_LIMITED)) - return NULL; - - if (!dax_mapping(mapping)) - return NULL; - - /* If end == LLONG_MAX, all pages from start to till end of file */ - if (end == LLONG_MAX) - end_idx = ULONG_MAX; - else - end_idx = end >> PAGE_SHIFT; - /* - * If we race get_user_pages_fast() here either we'll see the - * elevated page count in the iteration and wait, or - * get_user_pages_fast() will see that the page it took a reference - * against is no longer mapped in the page tables and bail to the - * get_user_pages() slow path. The slow path is protected by - * pte_lock() and pmd_lock(). New references are not taken without - * holding those locks, and unmap_mapping_pages() will not zero the - * pte or pmd without holding the respective lock, so we are - * guaranteed to either see new references or prevent new - * references from being established. - */ - unmap_mapping_pages(mapping, start_idx, end_idx - start_idx + 1, 0); - - xas_lock_irq(&xas); - xas_for_each(&xas, entry, end_idx) { - if (WARN_ON_ONCE(!xa_is_value(entry))) - continue; - if (unlikely(dax_is_locked(entry))) - entry = get_unlocked_entry(&xas, 0); - if (entry) - page = dax_zap_pages(&xas, entry); - put_unlocked_entry(&xas, entry, WAKE_NEXT); - if (page) - break; - if (++scanned % XA_CHECK_SCHED) - continue; - - xas_pause(&xas); - xas_unlock_irq(&xas); - cond_resched(); - xas_lock_irq(&xas); - } - xas_unlock_irq(&xas); - return page; -} -EXPORT_SYMBOL_GPL(dax_zap_mappings_range); - -struct page *dax_zap_mappings(struct address_space *mapping) -{ - return dax_zap_mappings_range(mapping, 0, LLONG_MAX); -} -EXPORT_SYMBOL_GPL(dax_zap_mappings); - -static int __dax_invalidate_entry(struct address_space *mapping, - pgoff_t index, bool trunc) -{ - XA_STATE(xas, &mapping->i_pages, index); - int ret = 0; - void *entry; - - xas_lock_irq(&xas); - entry = get_unlocked_entry(&xas, 0); - if (!entry || WARN_ON_ONCE(!xa_is_value(entry))) - goto out; - if (!trunc && - (xas_get_mark(&xas, PAGECACHE_TAG_DIRTY) || - xas_get_mark(&xas, PAGECACHE_TAG_TOWRITE))) - goto out; - dax_disassociate_entry(entry, mapping, trunc); - xas_store(&xas, NULL); - mapping->nrpages -= 1UL << dax_entry_order(entry); - ret = 1; -out: - put_unlocked_entry(&xas, entry, WAKE_ALL); - xas_unlock_irq(&xas); - return ret; -} - -/* - * wait indefinitely for all pins to drop, the alternative to waiting is - * a potential use-after-free scenario - */ -static void dax_break_layout(struct address_space *mapping, pgoff_t index) -{ - /* To do this without locks, the inode needs to be unreferenced */ - WARN_ON(atomic_read(&mapping->host->i_count)); - do { - struct page *page; - - page = dax_zap_mappings_range(mapping, index << PAGE_SHIFT, - (index + 1) << PAGE_SHIFT); - if (!page) - return; - wait_var_event(page, dax_page_idle(page)); - } while (true); -} - -/* - * Delete DAX entry at @index from @mapping. Wait for it - * to be unlocked before deleting it. - */ -int dax_delete_mapping_entry(struct address_space *mapping, pgoff_t index) -{ - int ret; - - if (mapping_exiting(mapping)) - dax_break_layout(mapping, index); - - ret = __dax_invalidate_entry(mapping, index, true); - - /* - * This gets called from truncate / punch_hole path. As such, the caller - * must hold locks protecting against concurrent modifications of the - * page cache (usually fs-private i_mmap_sem for writing). Since the - * caller has seen a DAX entry for this index, we better find it - * at that index as well... 
- */ - WARN_ON_ONCE(!ret); - return ret; -} - -/* - * Invalidate DAX entry if it is clean. - */ -int dax_invalidate_mapping_entry_sync(struct address_space *mapping, - pgoff_t index) -{ - return __dax_invalidate_entry(mapping, index, false); -} - static pgoff_t dax_iomap_pgoff(const struct iomap *iomap, loff_t pos) { return PHYS_PFN(iomap->addr + (pos & PAGE_MASK) - iomap->offset); @@ -923,200 +55,6 @@ static int copy_cow_page_dax(struct vm_fault *vmf, const struct iomap_iter *iter return 0; } -/* - * MAP_SYNC on a dax mapping guarantees dirty metadata is - * flushed on write-faults (non-cow), but not read-faults. - */ -static bool dax_fault_is_synchronous(const struct iomap_iter *iter, - struct vm_area_struct *vma) -{ - return (iter->flags & IOMAP_WRITE) && (vma->vm_flags & VM_SYNC) && - (iter->iomap.flags & IOMAP_F_DIRTY); -} - -static bool dax_fault_is_cow(const struct iomap_iter *iter) -{ - return (iter->flags & IOMAP_WRITE) && - (iter->iomap.flags & IOMAP_F_SHARED); -} - -static unsigned long dax_iter_flags(const struct iomap_iter *iter, - struct vm_fault *vmf) -{ - unsigned long flags = 0; - - if (!dax_fault_is_synchronous(iter, vmf->vma)) - flags |= DAX_DIRTY; - - if (dax_fault_is_cow(iter)) - flags |= DAX_COW; - - return flags; -} - -/* - * By this point grab_mapping_entry() has ensured that we have a locked entry - * of the appropriate size so we don't have to worry about downgrading PMDs to - * PTEs. If we happen to be trying to insert a PTE and there is a PMD - * already in the tree, we will skip the insertion and just dirty the PMD as - * appropriate. - */ -static vm_fault_t dax_insert_entry(struct xa_state *xas, struct vm_fault *vmf, - void **pentry, pfn_t pfn, - unsigned long flags) -{ - struct address_space *mapping = vmf->vma->vm_file->f_mapping; - void *new_entry = dax_make_entry(pfn, flags); - bool dirty = flags & DAX_DIRTY; - bool cow = flags & DAX_COW; - void *entry = *pentry; - vm_fault_t ret = 0; - - if (dirty) - __mark_inode_dirty(mapping->host, I_DIRTY_PAGES); - - if (cow || (dax_is_zero_entry(entry) && !(flags & DAX_ZERO_PAGE))) { - unsigned long index = xas->xa_index; - /* we are replacing a zero page with block mapping */ - if (dax_is_pmd_entry(entry)) - unmap_mapping_pages(mapping, index & ~PG_PMD_COLOUR, - PG_PMD_NR, false); - else /* pte entry */ - unmap_mapping_pages(mapping, index, 1, false); - } - - xas_reset(xas); - xas_lock_irq(xas); - if (cow || dax_is_zero_entry(entry) || dax_is_empty_entry(entry)) { - void *old; - - dax_disassociate_entry(entry, mapping, false); - ret = dax_associate_entry(new_entry, mapping, vmf, flags); - if (ret) - goto out; - /* - * Only swap our new entry into the page cache if the current - * entry is a zero page or an empty entry. If a normal PTE or - * PMD entry is already in the cache, we leave it alone. This - * means that if we are trying to insert a PTE and the - * existing entry is a PMD, we will just leave the PMD in the - * tree and dirty it if necessary. 
- */ - old = dax_lock_entry(xas, new_entry); - WARN_ON_ONCE(old != xa_mk_value(xa_to_value(entry) | - DAX_LOCKED)); - entry = new_entry; - } else { - xas_load(xas); /* Walk the xa_state */ - } - - if (dirty) - xas_set_mark(xas, PAGECACHE_TAG_DIRTY); - - if (cow) - xas_set_mark(xas, PAGECACHE_TAG_TOWRITE); - - *pentry = entry; -out: - xas_unlock_irq(xas); - - return ret; -} - -static int dax_writeback_one(struct xa_state *xas, struct dax_device *dax_dev, - struct address_space *mapping, void *entry) -{ - unsigned long pfn, index, count, end; - long ret = 0; - struct vm_area_struct *vma; - - /* - * A page got tagged dirty in DAX mapping? Something is seriously - * wrong. - */ - if (WARN_ON(!xa_is_value(entry))) - return -EIO; - - if (unlikely(dax_is_locked(entry))) { - void *old_entry = entry; - - entry = get_unlocked_entry(xas, 0); - - /* Entry got punched out / reallocated? */ - if (!entry || WARN_ON_ONCE(!xa_is_value(entry))) - goto put_unlocked; - /* - * Entry got reallocated elsewhere? No need to writeback. - * We have to compare pfns as we must not bail out due to - * difference in lockbit or entry type. - */ - if (dax_to_pfn(old_entry) != dax_to_pfn(entry)) - goto put_unlocked; - if (WARN_ON_ONCE(dax_is_empty_entry(entry) || - dax_is_zero_entry(entry))) { - ret = -EIO; - goto put_unlocked; - } - - /* Another fsync thread may have already done this entry */ - if (!xas_get_mark(xas, PAGECACHE_TAG_TOWRITE)) - goto put_unlocked; - } - - /* Lock the entry to serialize with page faults */ - dax_lock_entry(xas, entry); - - /* - * We can clear the tag now but we have to be careful so that concurrent - * dax_writeback_one() calls for the same index cannot finish before we - * actually flush the caches. This is achieved as the calls will look - * at the entry only under the i_pages lock and once they do that - * they will see the entry locked and wait for it to unlock. - */ - xas_clear_mark(xas, PAGECACHE_TAG_TOWRITE); - xas_unlock_irq(xas); - - /* - * If dax_writeback_mapping_range() was given a wbc->range_start - * in the middle of a PMD, the 'index' we use needs to be - * aligned to the start of the PMD. - * This allows us to flush for PMD_SIZE and not have to worry about - * partial PMD writebacks. - */ - pfn = dax_to_pfn(entry); - count = 1UL << dax_entry_order(entry); - index = xas->xa_index & ~(count - 1); - end = index + count - 1; - - /* Walk all mappings of a given index of a file and writeprotect them */ - i_mmap_lock_read(mapping); - vma_interval_tree_foreach(vma, &mapping->i_mmap, index, end) { - pfn_mkclean_range(pfn, count, index, vma); - cond_resched(); - } - i_mmap_unlock_read(mapping); - - dax_flush(dax_dev, page_address(pfn_to_page(pfn)), count * PAGE_SIZE); - /* - * After we have flushed the cache, we can clear the dirty tag. There - * cannot be new dirty data in the pfn after the flush has completed as - * the pfn mappings are writeprotected and fault waits for mapping - * entry lock. - */ - xas_reset(xas); - xas_lock_irq(xas); - xas_store(xas, entry); - xas_clear_mark(xas, PAGECACHE_TAG_DIRTY); - dax_wake_entry(xas, entry, WAKE_NEXT); - - trace_dax_writeback_one(mapping->host, index, count); - return ret; - - put_unlocked: - put_unlocked_entry(xas, entry, WAKE_NEXT); - return ret; -} - /* * Flush the mapping to the persistent domain within the byte range of [start, * end]. 
This is required by data integrity operations to ensure file data is @@ -1251,6 +189,37 @@ static int dax_iomap_cow_copy(loff_t pos, uint64_t length, size_t align_size, return 0; } +/* + * MAP_SYNC on a dax mapping guarantees dirty metadata is + * flushed on write-faults (non-cow), but not read-faults. + */ +static bool dax_fault_is_synchronous(const struct iomap_iter *iter, + struct vm_area_struct *vma) +{ + return (iter->flags & IOMAP_WRITE) && (vma->vm_flags & VM_SYNC) && + (iter->iomap.flags & IOMAP_F_DIRTY); +} + +static bool dax_fault_is_cow(const struct iomap_iter *iter) +{ + return (iter->flags & IOMAP_WRITE) && + (iter->iomap.flags & IOMAP_F_SHARED); +} + +static unsigned long dax_iter_flags(const struct iomap_iter *iter, + struct vm_fault *vmf) +{ + unsigned long flags = 0; + + if (!dax_fault_is_synchronous(iter, vmf->vma)) + flags |= DAX_DIRTY; + + if (dax_fault_is_cow(iter)) + flags |= DAX_COW; + + return flags; +} + /* * The user has performed a load from a hole in the file. Allocating a new * page in the file would cause excessive storage usage for workloads with @@ -1737,7 +706,7 @@ static vm_fault_t dax_iomap_pte_fault(struct vm_fault *vmf, pfn_t *pfnp, if ((vmf->flags & FAULT_FLAG_WRITE) && !vmf->cow_page) iter.flags |= IOMAP_WRITE; - entry = grab_mapping_entry(&xas, mapping, 0); + entry = dax_grab_mapping_entry(&xas, mapping, 0); if (xa_is_internal(entry)) { ret = xa_to_internal(entry); goto out; @@ -1854,12 +823,12 @@ static vm_fault_t dax_iomap_pmd_fault(struct vm_fault *vmf, pfn_t *pfnp, goto fallback; /* - * grab_mapping_entry() will make sure we get an empty PMD entry, + * dax_grab_mapping_entry() will make sure we get an empty PMD entry, * a zero PMD entry or a DAX PMD. If it can't (because a PTE * entry is already in the array, for instance), it will return * VM_FAULT_FALLBACK. */ - entry = grab_mapping_entry(&xas, mapping, PMD_ORDER); + entry = dax_grab_mapping_entry(&xas, mapping, PMD_ORDER); if (xa_is_internal(entry)) { ret = xa_to_internal(entry); goto fallback; @@ -1933,50 +902,6 @@ vm_fault_t dax_iomap_fault(struct vm_fault *vmf, enum page_entry_size pe_size, } EXPORT_SYMBOL_GPL(dax_iomap_fault); -/* - * dax_insert_pfn_mkwrite - insert PTE or PMD entry into page tables - * @vmf: The description of the fault - * @pfn: PFN to insert - * @order: Order of entry to insert. - * - * This function inserts a writeable PTE or PMD entry into the page tables - * for an mmaped DAX file. It also marks the page cache entry as dirty. - */ -static vm_fault_t -dax_insert_pfn_mkwrite(struct vm_fault *vmf, pfn_t pfn, unsigned int order) -{ - struct address_space *mapping = vmf->vma->vm_file->f_mapping; - XA_STATE_ORDER(xas, &mapping->i_pages, vmf->pgoff, order); - void *entry; - vm_fault_t ret; - - xas_lock_irq(&xas); - entry = get_unlocked_entry(&xas, order); - /* Did we race with someone splitting entry or so? 
*/ - if (!entry || dax_is_conflict(entry) || - (order == 0 && !dax_is_pte_entry(entry))) { - put_unlocked_entry(&xas, entry, WAKE_NEXT); - xas_unlock_irq(&xas); - trace_dax_insert_pfn_mkwrite_no_entry(mapping->host, vmf, - VM_FAULT_NOPAGE); - return VM_FAULT_NOPAGE; - } - xas_set_mark(&xas, PAGECACHE_TAG_DIRTY); - dax_lock_entry(&xas, entry); - xas_unlock_irq(&xas); - if (order == 0) - ret = vmf_insert_mixed_mkwrite(vmf->vma, vmf->address, pfn); -#ifdef CONFIG_FS_DAX_PMD - else if (order == PMD_ORDER) - ret = vmf_insert_pfn_pmd(vmf, pfn, FAULT_FLAG_WRITE); -#endif - else - ret = VM_FAULT_FALLBACK; - dax_unlock_entry(&xas, entry); - trace_dax_insert_pfn_mkwrite(mapping->host, vmf, ret); - return ret; -} - /** * dax_finish_sync_fault - finish synchronous page fault * @vmf: The description of the fault diff --git a/include/linux/dax.h b/include/linux/dax.h index 12e15ca11bff..1fc3d79b6aec 100644 --- a/include/linux/dax.h +++ b/include/linux/dax.h @@ -157,15 +157,34 @@ static inline void fs_put_dax(struct dax_device *dax_dev, void *holder) int dax_writeback_mapping_range(struct address_space *mapping, struct dax_device *dax_dev, struct writeback_control *wbc); -struct page *dax_zap_mappings(struct address_space *mapping); -struct page *dax_zap_mappings_range(struct address_space *mapping, loff_t start, - loff_t end); +#else +static inline int dax_writeback_mapping_range(struct address_space *mapping, + struct dax_device *dax_dev, struct writeback_control *wbc) +{ + return -EOPNOTSUPP; +} + +#endif + +int dax_zero_range(struct inode *inode, loff_t pos, loff_t len, bool *did_zero, + const struct iomap_ops *ops); +int dax_truncate_page(struct inode *inode, loff_t pos, bool *did_zero, + const struct iomap_ops *ops); + +#if IS_ENABLED(CONFIG_DAX) +int dax_read_lock(void); +void dax_read_unlock(int id); dax_entry_t dax_lock_page(struct page *page); void dax_unlock_page(struct page *page, dax_entry_t cookie); +void run_dax(struct dax_device *dax_dev); dax_entry_t dax_lock_mapping_entry(struct address_space *mapping, unsigned long index, struct page **page); void dax_unlock_mapping_entry(struct address_space *mapping, unsigned long index, dax_entry_t cookie); +void dax_break_layouts(struct inode *inode); +struct page *dax_zap_mappings(struct address_space *mapping); +struct page *dax_zap_mappings_range(struct address_space *mapping, loff_t start, + loff_t end); #else static inline struct page *dax_zap_mappings(struct address_space *mapping) { @@ -179,12 +198,6 @@ static inline struct page *dax_zap_mappings_range(struct address_space *mapping, return NULL; } -static inline int dax_writeback_mapping_range(struct address_space *mapping, - struct dax_device *dax_dev, struct writeback_control *wbc) -{ - return -EOPNOTSUPP; -} - static inline dax_entry_t dax_lock_page(struct page *page) { if (IS_DAX(page->mapping->host)) @@ -196,6 +209,15 @@ static inline void dax_unlock_page(struct page *page, dax_entry_t cookie) { } +static inline int dax_read_lock(void) +{ + return 0; +} + +static inline void dax_read_unlock(int id) +{ +} + static inline dax_entry_t dax_lock_mapping_entry(struct address_space *mapping, unsigned long index, struct page **page) { @@ -208,11 +230,6 @@ static inline void dax_unlock_mapping_entry(struct address_space *mapping, } #endif -int dax_zero_range(struct inode *inode, loff_t pos, loff_t len, bool *did_zero, - const struct iomap_ops *ops); -int dax_truncate_page(struct inode *inode, loff_t pos, bool *did_zero, - const struct iomap_ops *ops); - /* * Document all the code locations that 
want know when a dax page is * unreferenced. @@ -227,19 +244,6 @@ static inline bool dax_folio_idle(struct folio *folio) return dax_page_idle(folio_page(folio, 0)); } -#if IS_ENABLED(CONFIG_DAX) -int dax_read_lock(void); -void dax_read_unlock(int id); -#else -static inline int dax_read_lock(void) -{ - return 0; -} - -static inline void dax_read_unlock(int id) -{ -} -#endif /* CONFIG_DAX */ bool dax_alive(struct dax_device *dax_dev); void *dax_get_private(struct dax_device *dax_dev); long dax_direct_access(struct dax_device *dax_dev, pgoff_t pgoff, long nr_pages, @@ -260,6 +264,9 @@ vm_fault_t dax_iomap_fault(struct vm_fault *vmf, enum page_entry_size pe_size, pfn_t *pfnp, int *errp, const struct iomap_ops *ops); vm_fault_t dax_finish_sync_fault(struct vm_fault *vmf, enum page_entry_size pe_size, pfn_t pfn); +void *dax_grab_mapping_entry(struct xa_state *xas, + struct address_space *mapping, unsigned int order); +void dax_unlock_entry(struct xa_state *xas, void *entry); int dax_delete_mapping_entry(struct address_space *mapping, pgoff_t index); int dax_invalidate_mapping_entry_sync(struct address_space *mapping, pgoff_t index); @@ -276,6 +283,57 @@ static inline bool dax_mapping(struct address_space *mapping) return mapping->host && IS_DAX(mapping->host); } +/* + * DAX pagecache entries use XArray value entries so they can't be mistaken + * for pages. We use one bit for locking, one bit for the entry size (PMD) + * and two more to tell us if the entry is a zero page or an empty entry that + * is just used for locking. In total four special bits. + * + * If the PMD bit isn't set the entry has size PAGE_SIZE, and if the ZERO_PAGE + * and EMPTY bits aren't set the entry is a normal DAX entry with a filesystem + * block allocation. + */ +#define DAX_SHIFT (5) +#define DAX_MASK ((1UL << DAX_SHIFT) - 1) +#define DAX_LOCKED (1UL << 0) +#define DAX_PMD (1UL << 1) +#define DAX_ZERO_PAGE (1UL << 2) +#define DAX_EMPTY (1UL << 3) +#define DAX_ZAP (1UL << 4) + +/* + * These flags are not conveyed in Xarray value entries, they are just + * modifiers to dax_insert_entry(). + */ +#define DAX_DIRTY (1UL << (DAX_SHIFT + 0)) +#define DAX_COW (1UL << (DAX_SHIFT + 1)) + +vm_fault_t dax_insert_entry(struct xa_state *xas, struct vm_fault *vmf, + void **pentry, pfn_t pfn, unsigned long flags); +vm_fault_t dax_insert_pfn_mkwrite(struct vm_fault *vmf, pfn_t pfn, + unsigned int order); +int dax_writeback_one(struct xa_state *xas, struct dax_device *dax_dev, + struct address_space *mapping, void *entry); + +#ifdef CONFIG_MMU +/* The 'colour' (ie low bits) within a PMD of a page offset. 
*/ +#define PG_PMD_COLOUR ((PMD_SIZE >> PAGE_SHIFT) - 1) +#define PG_PMD_NR (PMD_SIZE >> PAGE_SHIFT) + +/* The order of a PMD entry */ +#define PMD_ORDER (PMD_SHIFT - PAGE_SHIFT) + +static inline unsigned int pe_order(enum page_entry_size pe_size) +{ + if (pe_size == PE_SIZE_PTE) + return PAGE_SHIFT - PAGE_SHIFT; + if (pe_size == PE_SIZE_PMD) + return PMD_SHIFT - PAGE_SHIFT; + if (pe_size == PE_SIZE_PUD) + return PUD_SHIFT - PAGE_SHIFT; + return ~0; +} + #ifdef CONFIG_DEV_DAX_HMEM_DEVICES void hmem_register_device(int target_nid, struct resource *r); #else @@ -283,5 +341,6 @@ static inline void hmem_register_device(int target_nid, struct resource *r) { } #endif +#endif /* CONFIG_MMU */ #endif diff --git a/include/linux/memremap.h b/include/linux/memremap.h index b87c16577af1..98196b8d3172 100644 --- a/include/linux/memremap.h +++ b/include/linux/memremap.h @@ -221,6 +221,12 @@ static inline void devm_memunmap_pages(struct device *dev, { } +static inline struct dev_pagemap * +get_dev_pagemap_many(unsigned long pfn, struct dev_pagemap *pgmap, int refs) +{ + return NULL; +} + static inline struct dev_pagemap *get_dev_pagemap(unsigned long pfn, struct dev_pagemap *pgmap) { From patchwork Fri Oct 14 23:58:38 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dan Williams X-Patchwork-Id: 13007535 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 486B9C4332F for ; Fri, 14 Oct 2022 23:59:07 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id D0DB06B008C; Fri, 14 Oct 2022 19:59:06 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id CBE356B0092; Fri, 14 Oct 2022 19:59:06 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id B83DF6B0093; Fri, 14 Oct 2022 19:59:06 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0016.hostedemail.com [216.40.44.16]) by kanga.kvack.org (Postfix) with ESMTP id A68586B008C for ; Fri, 14 Oct 2022 19:59:06 -0400 (EDT) Received: from smtpin08.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay02.hostedemail.com (Postfix) with ESMTP id 7B1B51202A4 for ; Fri, 14 Oct 2022 23:59:06 +0000 (UTC) X-FDA: 80021223492.08.B049BCF Received: from mga18.intel.com (mga18.intel.com [134.134.136.126]) by imf15.hostedemail.com (Postfix) with ESMTP id C6C8CA0027 for ; Fri, 14 Oct 2022 23:59:05 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1665791945; x=1697327945; h=subject:from:to:cc:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=utXjDwHtVMO/W0wdDgPICi14CmGCX+3qR5RCZ+LxUbs=; b=ehmvPVGGIxKjst1QZp7Bm6K7HgniN8rnXun/radGnEzv2/2qMoq95QLi 6fa4c9xjAOJUkB5EQvh5Ph4F0XZpOQ+B07QqlHxBZEeYBesHwGito6MLE JQUwpLoDLK3J0kEh4dX1NNakK1FOmT+PWUa72rjXOK87YDW+Vrw7b7Dok qGS9viJ0crEMiq5a/VCYycTzyiAqbp8K8sddQnVoN69Ctme6jy0wck6kh UVfG4+sbOUCHSDSKMnQromCOUlgl6GB2TwL/7iye2IzgAhGGnINgQ5aq6 pjPVsVgt8kxVCyUGHor0wTXqlQkLeV4gI1IkoRZOTgMZP62hx1OIyMkqm w==; X-IronPort-AV: E=McAfee;i="6500,9779,10500"; a="288791438" X-IronPort-AV: E=Sophos;i="5.95,185,1661842800"; d="scan'208";a="288791438" Received: from orsmga005.jf.intel.com ([10.7.209.41]) by orsmga106.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 14 Oct 2022 16:58:38 -0700 
X-IronPort-AV: E=McAfee;i="6500,9779,10500"; a="802798941" X-IronPort-AV: E=Sophos;i="5.95,185,1661842800"; d="scan'208";a="802798941" Received: from uyoon-mobl.amr.corp.intel.com (HELO dwillia2-xfh.jf.intel.com) ([10.209.90.112]) by orsmga005-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 14 Oct 2022 16:58:38 -0700 Subject: [PATCH v3 17/25] devdax: Sparse fixes for xarray locking From: Dan Williams To: linux-mm@kvack.org Cc: kernel test robot , david@fromorbit.com, hch@lst.de, nvdimm@lists.linux.dev, akpm@linux-foundation.org, linux-fsdevel@vger.kernel.org Date: Fri, 14 Oct 2022 16:58:38 -0700 Message-ID: <166579191803.2236710.11651241811946564050.stgit@dwillia2-xfh.jf.intel.com> In-Reply-To: <166579181584.2236710.17813547487183983273.stgit@dwillia2-xfh.jf.intel.com> References: <166579181584.2236710.17813547487183983273.stgit@dwillia2-xfh.jf.intel.com> User-Agent: StGit/0.18-3-g996c MIME-Version: 1.0 ARC-Authentication-Results: i=1; imf15.hostedemail.com; dkim=none ("invalid DKIM record") header.d=intel.com header.s=Intel header.b=ehmvPVGG; spf=pass (imf15.hostedemail.com: domain of dan.j.williams@intel.com designates 134.134.136.126 as permitted sender) smtp.mailfrom=dan.j.williams@intel.com; dmarc=pass (policy=none) header.from=intel.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1665791946; a=rsa-sha256; cv=none; b=PUrhIs0BRmlQivrOKZESuI6rXTL9mGTMhYsxn7TphyIGqTcJH3bHyWjgD7wMIa/37aiNw5 gaKUp2MsWF+85XgEWseyFHXhoQVyqq6REFFRMxRGFUEnRRucvsOwTtB0m7En3SDoPumPYM +LOKQLkm5Wo9ZDCbCNmVNhRchX5MYAc= ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1665791946; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=WKHxpYWw6Kh9khzrA0oHqDikhQlvjbtqNhVNH2uPI5s=; b=nZ6JYs2DJihQ/qpahkxaNJ5ibKOX0Ndu3j7pZ7q/4uAp/gJWko0vCja5K04i11Sb63DM3k gvhC028PPEhNR7ii5t5RnCXcySWNKAG0EPgkW8QOMLKdOkcbTN/R9pAKyJ42uHH1SS3A6T ZFSajMmnAJwhztvofeoRrPe6wjt+Jtc= X-Rspam-User: Authentication-Results: imf15.hostedemail.com; dkim=none ("invalid DKIM record") header.d=intel.com header.s=Intel header.b=ehmvPVGG; spf=pass (imf15.hostedemail.com: domain of dan.j.williams@intel.com designates 134.134.136.126 as permitted sender) smtp.mailfrom=dan.j.williams@intel.com; dmarc=pass (policy=none) header.from=intel.com X-Rspamd-Server: rspam02 X-Rspamd-Queue-Id: C6C8CA0027 X-Stat-Signature: epcxutp3ru8j3e8ocqumzcc3wdptdcen X-HE-Tag: 1665791945-641937 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Now that the dax-mapping-entry code has moved to a common location take the opportunity to fixup some long standing sparse warnings. 
In this case annotate the manipulations of the Xarray lock: Fixes: drivers/dax/mapping.c:216:13: sparse: warning: context imbalance in 'wait_entry_unlocked' - unexpected unlock drivers/dax/mapping.c:953:9: sparse: warning: context imbalance in 'dax_writeback_one' - unexpected unlock Reported-by: Reported-by: kernel test robot Link: http://lore.kernel.org/r/202210091141.cHaQEuCs-lkp@intel.com Signed-off-by: Dan Williams --- drivers/dax/mapping.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/dax/mapping.c b/drivers/dax/mapping.c index 19121b7421fb..803ae64c13d4 100644 --- a/drivers/dax/mapping.c +++ b/drivers/dax/mapping.c @@ -213,7 +213,7 @@ static void *get_unlocked_entry(struct xa_state *xas, unsigned int order) * (it's cycled in clear_inode() after removing the entries from i_pages) * After we call xas_unlock_irq(), we cannot touch xas->xa. */ -static void wait_entry_unlocked(struct xa_state *xas, void *entry) +static void wait_entry_unlocked(struct xa_state *xas, void *entry) __releases(xas) { struct wait_exceptional_entry_queue ewait; wait_queue_head_t *wq; @@ -910,7 +910,7 @@ vm_fault_t dax_insert_entry(struct xa_state *xas, struct vm_fault *vmf, } int dax_writeback_one(struct xa_state *xas, struct dax_device *dax_dev, - struct address_space *mapping, void *entry) + struct address_space *mapping, void *entry) __must_hold(xas) { unsigned long pfn, index, count, end; long ret = 0; From patchwork Fri Oct 14 23:58:43 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dan Williams X-Patchwork-Id: 13007531 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 6E75BC433FE for ; Fri, 14 Oct 2022 23:58:46 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 062DF6B0080; Fri, 14 Oct 2022 19:58:46 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 012436B0082; Fri, 14 Oct 2022 19:58:45 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id E1CAA6B0087; Fri, 14 Oct 2022 19:58:45 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0016.hostedemail.com [216.40.44.16]) by kanga.kvack.org (Postfix) with ESMTP id CF9CE6B0080 for ; Fri, 14 Oct 2022 19:58:45 -0400 (EDT) Received: from smtpin04.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay01.hostedemail.com (Postfix) with ESMTP id B08AD1C6598 for ; Fri, 14 Oct 2022 23:58:45 +0000 (UTC) X-FDA: 80021222610.04.358998C Received: from mga03.intel.com (mga03.intel.com [134.134.136.65]) by imf04.hostedemail.com (Postfix) with ESMTP id 29A734002D for ; Fri, 14 Oct 2022 23:58:44 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1665791925; x=1697327925; h=subject:from:to:cc:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=2hYftC5ZQ196yfpWz+XXw4uXMxFoArKPmI5qaP+piRE=; b=CyUTqzFlNoMEGebRbFZ30VH70G1jRHE3s7k6ylXl0Hv9xXiSDXO8lZYK 87jpC46ZeDkep+l8ErByq/DF2vuyQCYVxRdP5flRQ5RjaQdef7qPPN1iq DDMtEw2zW6WFjA2f4hdQlMIYvNJw288hZXkRE9Nhdg9xqdnlMbwCxg7XO mTk9TRKkzuKXvAza8pOMM73zfJIweoH0EPL7Yp4+r/Fdoncxw20FvVbwQ 0naF5QvM4KK3af6x5MWw3mCdaVA8a8RRoFdDaUtQPuyH8H55it1DOMGjO WuM0EMjAD91jCH6M3xz8FI7cxjsuaAHDOb2s/4O91bsqBfZ02RQhGoNSO w==; X-IronPort-AV: 
E=McAfee;i="6500,9779,10500"; a="307154711" X-IronPort-AV: E=Sophos;i="5.95,185,1661842800"; d="scan'208";a="307154711" Received: from orsmga005.jf.intel.com ([10.7.209.41]) by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 14 Oct 2022 16:58:44 -0700 X-IronPort-AV: E=McAfee;i="6500,9779,10500"; a="802798951" X-IronPort-AV: E=Sophos;i="5.95,185,1661842800"; d="scan'208";a="802798951" Received: from uyoon-mobl.amr.corp.intel.com (HELO dwillia2-xfh.jf.intel.com) ([10.209.90.112]) by orsmga005-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 14 Oct 2022 16:58:43 -0700 Subject: [PATCH v3 18/25] devdax: Sparse fixes for vmfault_t / dax-entry conversions From: Dan Williams To: linux-mm@kvack.org Cc: kernel test robot , david@fromorbit.com, hch@lst.de, nvdimm@lists.linux.dev, akpm@linux-foundation.org, linux-fsdevel@vger.kernel.org Date: Fri, 14 Oct 2022 16:58:43 -0700 Message-ID: <166579192360.2236710.14796211268184430654.stgit@dwillia2-xfh.jf.intel.com> In-Reply-To: <166579181584.2236710.17813547487183983273.stgit@dwillia2-xfh.jf.intel.com> References: <166579181584.2236710.17813547487183983273.stgit@dwillia2-xfh.jf.intel.com> User-Agent: StGit/0.18-3-g996c MIME-Version: 1.0 ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1665791925; a=rsa-sha256; cv=none; b=eTwRDfZXtDxzB1ac0Zs3WQigiU+xKR/04GFfEz0hIO91zVUSSm3sewoF6lMosql20sKCLU J24dRXLoJ1hQ8ydsa0PuiJl2YOexjweITrdD9Q2mmqfm+lpIoTvCoemLTSn9Zgs8PDws4m VPQ71RGKZBR+wGoI2A+yzkdc5iXHl2w= ARC-Authentication-Results: i=1; imf04.hostedemail.com; dkim=none ("invalid DKIM record") header.d=intel.com header.s=Intel header.b=CyUTqzFl; spf=pass (imf04.hostedemail.com: domain of dan.j.williams@intel.com designates 134.134.136.65 as permitted sender) smtp.mailfrom=dan.j.williams@intel.com; dmarc=pass (policy=none) header.from=intel.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1665791925; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=x8LCXdlbIt0iLwdN3RrfEU2GBQT/LEYJ3F8id9eEqMU=; b=hBLJGAIluGIhMafw0Z7QItlNlPpC77rmHKowW4ZR7aU83f9u72m14YYTCQ1qis3Ja7gT50 yrYJC1BOKIYD17/Jnvko18TyjX6Tmmb6K1wh61u5QH8qpx9MWpNwpLDb/i36PPSM+hX6Wh bhsGCRc83HqnS6kVbzl/3cvBA44/Jvc= X-Rspam-User: X-Stat-Signature: p1z8tteu6kgo1ab7sa4yfukss5kh8he4 X-Rspamd-Server: rspam01 X-Rspamd-Queue-Id: 29A734002D Authentication-Results: imf04.hostedemail.com; dkim=none ("invalid DKIM record") header.d=intel.com header.s=Intel header.b=CyUTqzFl; spf=pass (imf04.hostedemail.com: domain of dan.j.williams@intel.com designates 134.134.136.65 as permitted sender) smtp.mailfrom=dan.j.williams@intel.com; dmarc=pass (policy=none) header.from=intel.com X-HE-Tag: 1665791924-720824 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Now that the dax-mapping-entry code has moved to a common location take the opportunity to fixup some long standing sparse warnings. In this case conveying vm_fault_t codes through Xarray internal values was missing some forced casts. Add some helpers, is_dax_err(), dax_err_to_vmfault(), and vmfault_to_dax_err() to handle the conversions. 
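As a usage illustration (not part of the patch): the new helpers let a fault path move between the xarray-internal error encoding and vm_fault_t without tripping sparse. The wrapper below is hypothetical; only dax_grab_mapping_entry(), dax_unlock_entry(), is_dax_err() and dax_err_to_vmfault() come from this series, and the usual linux/dax.h / linux/mm.h includes are assumed.

/* Hypothetical caller sketch showing the conversion round-trip. */
static vm_fault_t dax_do_fault(struct xa_state *xas,
			       struct address_space *mapping)
{
	void *entry = dax_grab_mapping_entry(xas, mapping, 0);

	/*
	 * Failures come back as xarray internal values; convert them to a
	 * vm_fault_t with the typed helper instead of open-coding
	 * xa_to_internal(), which is what triggered the sparse warnings.
	 */
	if (is_dax_err(entry))
		return dax_err_to_vmfault(entry);

	/* ... handle the fault against the locked entry ... */

	dax_unlock_entry(xas, entry);
	return VM_FAULT_NOPAGE;
}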
Fixes: drivers/dax/mapping.c:637:39: sparse: warning: incorrect type in argument 1 (different base types) drivers/dax/mapping.c:637:39: sparse: expected unsigned long v drivers/dax/mapping.c:637:39: sparse: got restricted vm_fault_t drivers/dax/mapping.c:639:39: sparse: warning: incorrect type in argument 1 (different base types) drivers/dax/mapping.c:639:39: sparse: expected unsigned long v drivers/dax/mapping.c:639:39: sparse: got restricted vm_fault_t drivers/dax/mapping.c:643:31: sparse: warning: incorrect type in argument 1 (different base types) drivers/dax/mapping.c:643:31: sparse: expected unsigned long v drivers/dax/mapping.c:643:31: sparse: got restricted vm_fault_t Reported-by: Reported-by: kernel test robot Link: http://lore.kernel.org/r/202210091141.cHaQEuCs-lkp@intel.com Signed-off-by: Dan Williams --- drivers/dax/mapping.c | 6 +++--- fs/dax.c | 8 ++++---- include/linux/dax.h | 16 ++++++++++++++++ 3 files changed, 23 insertions(+), 7 deletions(-) diff --git a/drivers/dax/mapping.c b/drivers/dax/mapping.c index 803ae64c13d4..b452bfa98f5e 100644 --- a/drivers/dax/mapping.c +++ b/drivers/dax/mapping.c @@ -634,13 +634,13 @@ void *dax_grab_mapping_entry(struct xa_state *xas, if (xas_nomem(xas, mapping_gfp_mask(mapping) & ~__GFP_HIGHMEM)) goto retry; if (xas->xa_node == XA_ERROR(-ENOMEM)) - return xa_mk_internal(VM_FAULT_OOM); + return vmfault_to_dax_err(VM_FAULT_OOM); if (xas_error(xas)) - return xa_mk_internal(VM_FAULT_SIGBUS); + return vmfault_to_dax_err(VM_FAULT_SIGBUS); return entry; fallback: xas_unlock_irq(xas); - return xa_mk_internal(VM_FAULT_FALLBACK); + return vmfault_to_dax_err(VM_FAULT_FALLBACK); } static void *dax_zap_entry(struct xa_state *xas, void *entry) diff --git a/fs/dax.c b/fs/dax.c index de79dd132e22..dc1dcbaeba05 100644 --- a/fs/dax.c +++ b/fs/dax.c @@ -707,8 +707,8 @@ static vm_fault_t dax_iomap_pte_fault(struct vm_fault *vmf, pfn_t *pfnp, iter.flags |= IOMAP_WRITE; entry = dax_grab_mapping_entry(&xas, mapping, 0); - if (xa_is_internal(entry)) { - ret = xa_to_internal(entry); + if (is_dax_err(entry)) { + ret = dax_err_to_vmfault(entry); goto out; } @@ -829,8 +829,8 @@ static vm_fault_t dax_iomap_pmd_fault(struct vm_fault *vmf, pfn_t *pfnp, * VM_FAULT_FALLBACK. 
*/ entry = dax_grab_mapping_entry(&xas, mapping, PMD_ORDER); - if (xa_is_internal(entry)) { - ret = xa_to_internal(entry); + if (is_dax_err(entry)) { + ret = dax_err_to_vmfault(entry); goto fallback; } diff --git a/include/linux/dax.h b/include/linux/dax.h index 1fc3d79b6aec..553bc819a6a4 100644 --- a/include/linux/dax.h +++ b/include/linux/dax.h @@ -264,6 +264,22 @@ vm_fault_t dax_iomap_fault(struct vm_fault *vmf, enum page_entry_size pe_size, pfn_t *pfnp, int *errp, const struct iomap_ops *ops); vm_fault_t dax_finish_sync_fault(struct vm_fault *vmf, enum page_entry_size pe_size, pfn_t pfn); + +static inline bool is_dax_err(void *entry) +{ + return xa_is_internal(entry); +} + +static inline vm_fault_t dax_err_to_vmfault(void *entry) +{ + return (vm_fault_t __force)(xa_to_internal(entry)); +} + +static inline void *vmfault_to_dax_err(vm_fault_t error) +{ + return xa_mk_internal((unsigned long __force)error); +} + void *dax_grab_mapping_entry(struct xa_state *xas, struct address_space *mapping, unsigned int order); void dax_unlock_entry(struct xa_state *xas, void *entry); From patchwork Fri Oct 14 23:58:49 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dan Williams X-Patchwork-Id: 13007532 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 8BEF4C433FE for ; Fri, 14 Oct 2022 23:58:52 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 2D3616B0082; Fri, 14 Oct 2022 19:58:52 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 2837C6B0087; Fri, 14 Oct 2022 19:58:52 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 172B16B0088; Fri, 14 Oct 2022 19:58:52 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0012.hostedemail.com [216.40.44.12]) by kanga.kvack.org (Postfix) with ESMTP id EC1AC6B0082 for ; Fri, 14 Oct 2022 19:58:51 -0400 (EDT) Received: from smtpin19.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay07.hostedemail.com (Postfix) with ESMTP id C0D72160ADD for ; Fri, 14 Oct 2022 23:58:51 +0000 (UTC) X-FDA: 80021222862.19.69FCD41 Received: from mga05.intel.com (mga05.intel.com [192.55.52.43]) by imf20.hostedemail.com (Postfix) with ESMTP id 31B631C002F for ; Fri, 14 Oct 2022 23:58:50 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1665791931; x=1697327931; h=subject:from:to:cc:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=Pn7FRJyA8KXjI8N8cdiRLRmRlb+h9sH/LZ3PlPzj0fI=; b=a845Xy0tmyNu+S0fhW2ZyK0gJDSKyLqs3gDMcb25s1zN9BPiB/UlU9c3 i/gI4BOliIHSf/zhM09rOLDourIr06fOKV+gKTfU1d9RHgv6K8+wCVgSO Qu10qECVA8mj7W/9p7Tni4C+5pFNgzfSZ/b+4ifffgCkWX+x0aOBeS68t 1tss65xuXnHdV5Jpu/wgdBjQRa7wJwVH1gGu/AX9VwX8CAGrnbmgD7U+N JuzBbmfzInJ6TjgHz/U7xjA3be4lqXzol8fXh4Ak5ibzSYiCSzxQsMIJm 5E36F5UPCgs4fIv0B276G7bBEjg7xVEvyOCeo4LnRn38uMoUrIsu38IxC g==; X-IronPort-AV: E=McAfee;i="6500,9779,10500"; a="391802540" X-IronPort-AV: E=Sophos;i="5.95,185,1661842800"; d="scan'208";a="391802540" Received: from orsmga005.jf.intel.com ([10.7.209.41]) by fmsmga105.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 14 Oct 2022 16:58:50 -0700 X-IronPort-AV: E=McAfee;i="6500,9779,10500"; a="802798957" X-IronPort-AV: E=Sophos;i="5.95,185,1661842800"; 
d="scan'208";a="802798957" Received: from uyoon-mobl.amr.corp.intel.com (HELO dwillia2-xfh.jf.intel.com) ([10.209.90.112]) by orsmga005-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 14 Oct 2022 16:58:49 -0700 Subject: [PATCH v3 19/25] devdax: Sparse fixes for vm_fault_t in tracepoints From: Dan Williams To: linux-mm@kvack.org Cc: kernel test robot , david@fromorbit.com, hch@lst.de, nvdimm@lists.linux.dev, akpm@linux-foundation.org, linux-fsdevel@vger.kernel.org Date: Fri, 14 Oct 2022 16:58:49 -0700 Message-ID: <166579192919.2236710.12464252412504907962.stgit@dwillia2-xfh.jf.intel.com> In-Reply-To: <166579181584.2236710.17813547487183983273.stgit@dwillia2-xfh.jf.intel.com> References: <166579181584.2236710.17813547487183983273.stgit@dwillia2-xfh.jf.intel.com> User-Agent: StGit/0.18-3-g996c MIME-Version: 1.0 ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1665791931; a=rsa-sha256; cv=none; b=4bZxJhHt452gM2YKOlmocnMBqFTIPiZ302hRO9AjoaHwFK01SH7OVfcfdmh7TrIn1/sWPe EaB2AWRa5M/ZNbMLA/Ruh8ALCQyahvDIqk9N3/aWkpjQ4gj892yg6y46CZ+8eOyZp+iiz4 I9ys6FYTl7nQ4zhTXtjGsbLDEXyPjI0= ARC-Authentication-Results: i=1; imf20.hostedemail.com; dkim=none ("invalid DKIM record") header.d=intel.com header.s=Intel header.b=a845Xy0t; spf=pass (imf20.hostedemail.com: domain of dan.j.williams@intel.com designates 192.55.52.43 as permitted sender) smtp.mailfrom=dan.j.williams@intel.com; dmarc=pass (policy=none) header.from=intel.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1665791931; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=xqPf/Eb/8V/604SxGwa9vHERYQ6URyTv+lvpfz+rHDA=; b=Ojm2L9M6pyWCCu3S02y+BpXu/AK7rI4H2mWKRA6y+9fCEWSbsIkUhz6S5unDLB3bWeA+zS 00dIXhee6v7u++hUY68B4b8xWuBETuPKbyebIebOvyCK8ujlpIs2Z3KkBM3JcRPvdUt2Xh Q5YKLkwwRF3lXfepOBwpENLhq8wMG0A= X-Stat-Signature: b853u7c3si3di5yqsbr5jr18qmyfuqyi X-Rspamd-Queue-Id: 31B631C002F X-Rspam-User: X-Rspamd-Server: rspam08 Authentication-Results: imf20.hostedemail.com; dkim=none ("invalid DKIM record") header.d=intel.com header.s=Intel header.b=a845Xy0t; spf=pass (imf20.hostedemail.com: domain of dan.j.williams@intel.com designates 192.55.52.43 as permitted sender) smtp.mailfrom=dan.j.williams@intel.com; dmarc=pass (policy=none) header.from=intel.com X-HE-Tag: 1665791930-769931 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Now that the dax-mapping-entry code has moved to a common location take the opportunity to fixup some long standing sparse warnings. In this case the tracepoints have long specified the wrong type for the traced return code. Pass the correct type, but handle casting it back to 'unsigned int' inside the trace helpers as the helpers are not prepared to handle restricted types. 
Fixes: drivers/dax/mapping.c:1031:55: sparse: warning: incorrect type in argument 3 (different base types) drivers/dax/mapping.c:1031:55: sparse: expected int result drivers/dax/mapping.c:1031:55: sparse: got restricted vm_fault_t drivers/dax/mapping.c:1046:58: sparse: warning: incorrect type in argument 3 (different base types) drivers/dax/mapping.c:1046:58: sparse: expected int result drivers/dax/mapping.c:1046:58: sparse: got restricted vm_fault_t [assigned] [usertype] ret Reported-by: Reported-by: kernel test robot Link: http://lore.kernel.org/r/202210091141.cHaQEuCs-lkp@intel.com Signed-off-by: Dan Williams --- include/linux/mm_types.h | 26 +++++++++++++------------- include/trace/events/fs_dax.h | 16 ++++++++-------- 2 files changed, 21 insertions(+), 21 deletions(-) diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h index 500e536796ca..910d880e67eb 100644 --- a/include/linux/mm_types.h +++ b/include/linux/mm_types.h @@ -891,19 +891,19 @@ enum vm_fault_reason { VM_FAULT_HWPOISON_LARGE | VM_FAULT_FALLBACK) #define VM_FAULT_RESULT_TRACE \ - { VM_FAULT_OOM, "OOM" }, \ - { VM_FAULT_SIGBUS, "SIGBUS" }, \ - { VM_FAULT_MAJOR, "MAJOR" }, \ - { VM_FAULT_WRITE, "WRITE" }, \ - { VM_FAULT_HWPOISON, "HWPOISON" }, \ - { VM_FAULT_HWPOISON_LARGE, "HWPOISON_LARGE" }, \ - { VM_FAULT_SIGSEGV, "SIGSEGV" }, \ - { VM_FAULT_NOPAGE, "NOPAGE" }, \ - { VM_FAULT_LOCKED, "LOCKED" }, \ - { VM_FAULT_RETRY, "RETRY" }, \ - { VM_FAULT_FALLBACK, "FALLBACK" }, \ - { VM_FAULT_DONE_COW, "DONE_COW" }, \ - { VM_FAULT_NEEDDSYNC, "NEEDDSYNC" } + { (__force unsigned int) VM_FAULT_OOM, "OOM" }, \ + { (__force unsigned int) VM_FAULT_SIGBUS, "SIGBUS" }, \ + { (__force unsigned int) VM_FAULT_MAJOR, "MAJOR" }, \ + { (__force unsigned int) VM_FAULT_WRITE, "WRITE" }, \ + { (__force unsigned int) VM_FAULT_HWPOISON, "HWPOISON" }, \ + { (__force unsigned int) VM_FAULT_HWPOISON_LARGE, "HWPOISON_LARGE" }, \ + { (__force unsigned int) VM_FAULT_SIGSEGV, "SIGSEGV" }, \ + { (__force unsigned int) VM_FAULT_NOPAGE, "NOPAGE" }, \ + { (__force unsigned int) VM_FAULT_LOCKED, "LOCKED" }, \ + { (__force unsigned int) VM_FAULT_RETRY, "RETRY" }, \ + { (__force unsigned int) VM_FAULT_FALLBACK, "FALLBACK" }, \ + { (__force unsigned int) VM_FAULT_DONE_COW, "DONE_COW" }, \ + { (__force unsigned int) VM_FAULT_NEEDDSYNC, "NEEDDSYNC" } struct vm_special_mapping { const char *name; /* The name, e.g. "[vdso]". 
*/ diff --git a/include/trace/events/fs_dax.h b/include/trace/events/fs_dax.h index 97b09fcf7e52..adc50cf7b969 100644 --- a/include/trace/events/fs_dax.h +++ b/include/trace/events/fs_dax.h @@ -9,7 +9,7 @@ DECLARE_EVENT_CLASS(dax_pmd_fault_class, TP_PROTO(struct inode *inode, struct vm_fault *vmf, - pgoff_t max_pgoff, int result), + pgoff_t max_pgoff, vm_fault_t result), TP_ARGS(inode, vmf, max_pgoff, result), TP_STRUCT__entry( __field(unsigned long, ino) @@ -21,7 +21,7 @@ DECLARE_EVENT_CLASS(dax_pmd_fault_class, __field(pgoff_t, max_pgoff) __field(dev_t, dev) __field(unsigned int, flags) - __field(int, result) + __field(unsigned int, result) ), TP_fast_assign( __entry->dev = inode->i_sb->s_dev; @@ -33,7 +33,7 @@ DECLARE_EVENT_CLASS(dax_pmd_fault_class, __entry->flags = vmf->flags; __entry->pgoff = vmf->pgoff; __entry->max_pgoff = max_pgoff; - __entry->result = result; + __entry->result = (__force unsigned int) result; ), TP_printk("dev %d:%d ino %#lx %s %s address %#lx vm_start " "%#lx vm_end %#lx pgoff %#lx max_pgoff %#lx %s", @@ -54,7 +54,7 @@ DECLARE_EVENT_CLASS(dax_pmd_fault_class, #define DEFINE_PMD_FAULT_EVENT(name) \ DEFINE_EVENT(dax_pmd_fault_class, name, \ TP_PROTO(struct inode *inode, struct vm_fault *vmf, \ - pgoff_t max_pgoff, int result), \ + pgoff_t max_pgoff, vm_fault_t result), \ TP_ARGS(inode, vmf, max_pgoff, result)) DEFINE_PMD_FAULT_EVENT(dax_pmd_fault); @@ -151,7 +151,7 @@ DEFINE_EVENT(dax_pmd_insert_mapping_class, name, \ DEFINE_PMD_INSERT_MAPPING_EVENT(dax_pmd_insert_mapping); DECLARE_EVENT_CLASS(dax_pte_fault_class, - TP_PROTO(struct inode *inode, struct vm_fault *vmf, int result), + TP_PROTO(struct inode *inode, struct vm_fault *vmf, vm_fault_t result), TP_ARGS(inode, vmf, result), TP_STRUCT__entry( __field(unsigned long, ino) @@ -160,7 +160,7 @@ DECLARE_EVENT_CLASS(dax_pte_fault_class, __field(pgoff_t, pgoff) __field(dev_t, dev) __field(unsigned int, flags) - __field(int, result) + __field(unsigned int, result) ), TP_fast_assign( __entry->dev = inode->i_sb->s_dev; @@ -169,7 +169,7 @@ DECLARE_EVENT_CLASS(dax_pte_fault_class, __entry->address = vmf->address; __entry->flags = vmf->flags; __entry->pgoff = vmf->pgoff; - __entry->result = result; + __entry->result = (__force unsigned int) result; ), TP_printk("dev %d:%d ino %#lx %s %s address %#lx pgoff %#lx %s", MAJOR(__entry->dev), @@ -185,7 +185,7 @@ DECLARE_EVENT_CLASS(dax_pte_fault_class, #define DEFINE_PTE_FAULT_EVENT(name) \ DEFINE_EVENT(dax_pte_fault_class, name, \ - TP_PROTO(struct inode *inode, struct vm_fault *vmf, int result), \ + TP_PROTO(struct inode *inode, struct vm_fault *vmf, vm_fault_t result), \ TP_ARGS(inode, vmf, result)) DEFINE_PTE_FAULT_EVENT(dax_pte_fault); From patchwork Fri Oct 14 23:58:54 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dan Williams X-Patchwork-Id: 13007533 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 10998C433FE for ; Fri, 14 Oct 2022 23:58:58 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id A4F166B0088; Fri, 14 Oct 2022 19:58:57 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 9FD886B0089; Fri, 14 Oct 2022 19:58:57 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 8C5778E0002; Fri, 14 Oct 2022 19:58:57 -0400 (EDT) X-Delivered-To: 
linux-mm@kvack.org Subject: [PATCH v3 20/25] devdax: add PUD support to the DAX mapping infrastructure From: Dan Williams To: linux-mm@kvack.org Cc: Matthew Wilcox , Jan Kara , "Darrick J. 
Wong" , Jason Gunthorpe , Christoph Hellwig , John Hubbard , david@fromorbit.com, nvdimm@lists.linux.dev, akpm@linux-foundation.org, linux-fsdevel@vger.kernel.org Date: Fri, 14 Oct 2022 16:58:54 -0700 Message-ID: <166579193481.2236710.2902634991178133192.stgit@dwillia2-xfh.jf.intel.com> In-Reply-To: <166579181584.2236710.17813547487183983273.stgit@dwillia2-xfh.jf.intel.com> References: <166579181584.2236710.17813547487183983273.stgit@dwillia2-xfh.jf.intel.com> User-Agent: StGit/0.18-3-g996c MIME-Version: 1.0 ARC-Authentication-Results: i=1; imf23.hostedemail.com; dkim=none ("invalid DKIM record") header.d=intel.com header.s=Intel header.b=I668CiEO; spf=pass (imf23.hostedemail.com: domain of dan.j.williams@intel.com designates 134.134.136.65 as permitted sender) smtp.mailfrom=dan.j.williams@intel.com; dmarc=pass (policy=none) header.from=intel.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1665791936; a=rsa-sha256; cv=none; b=7r2VDPY9WJd1qoYMp3mHtcHOJVxrmOI6v4Bj3sLx0Ep34KVBXsSkxAmU2ac+Z0+N4lACS3 VksqBcGIfGQKeolSGj8/efHDaxKl0gFMlkjzqCR4LO5cNYIoJjmGKW6ifBthG4I4Pk0Jh4 qtp5LuQaJZH/A83YbULL1FjndrUxBOE= ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1665791936; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=5CD6FQkctt3LEh3ojOSIzWNFiSGv9bA26b3O8gu/PKo=; b=Nntg1DWcNeacRhPEHZneiVnQ5Qx7fSV19h5PKEab2/D+TXUBB2dSRd04CrHRSOeuuFe/dd IsSwc+Z0T2PPBV1MHDJWCSX07OIH1VwhoHeUly27j4zwwxHjyRdKHQfFfJoYOoMZmHRH7y Fs3fVDbovU6+8dm6GpajUgq5RDqmltA= X-Rspam-User: X-Stat-Signature: pxe6x8skxuoityrc4g3w7qct76yf1k5p X-Rspamd-Queue-Id: AEA22140030 Authentication-Results: imf23.hostedemail.com; dkim=none ("invalid DKIM record") header.d=intel.com header.s=Intel header.b=I668CiEO; spf=pass (imf23.hostedemail.com: domain of dan.j.williams@intel.com designates 134.134.136.65 as permitted sender) smtp.mailfrom=dan.j.williams@intel.com; dmarc=pass (policy=none) header.from=intel.com X-Rspamd-Server: rspam07 X-HE-Tag: 1665791936-332233 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: In preparation for using the DAX mapping infrastructure for device-dax, update the helpers to handle PUD entries. In practice the code related to @size_downgrade will go unused for PUD entries since only devdax creates DAX PUD entries and devdax enforces aligned mappings. The conversion is included for completeness. The addition of PUD support to the common dax_insert_pfn_mkwrite() requires a new stub for vmf_insert_pfn_pud() in the case where huge page support and/or PUD support is not available. Cc: Matthew Wilcox Cc: Jan Kara Cc: "Darrick J. 
Wong" Cc: Jason Gunthorpe Cc: Christoph Hellwig Cc: John Hubbard Signed-off-by: Dan Williams --- drivers/dax/mapping.c | 50 ++++++++++++++++++++++++++++++++++++----------- include/linux/dax.h | 32 ++++++++++++++++++++---------- include/linux/huge_mm.h | 14 ++++++++++--- 3 files changed, 70 insertions(+), 26 deletions(-) diff --git a/drivers/dax/mapping.c b/drivers/dax/mapping.c index b452bfa98f5e..ba01c1cf4b51 100644 --- a/drivers/dax/mapping.c +++ b/drivers/dax/mapping.c @@ -13,6 +13,7 @@ #include #include #include +#include #include "dax-private.h" @@ -56,6 +57,8 @@ static bool dax_is_zapped(void *entry) static unsigned int dax_entry_order(void *entry) { + if (xa_to_value(entry) & DAX_PUD) + return PUD_ORDER; if (xa_to_value(entry) & DAX_PMD) return PMD_ORDER; return 0; @@ -66,9 +69,14 @@ static unsigned long dax_is_pmd_entry(void *entry) return xa_to_value(entry) & DAX_PMD; } +static unsigned long dax_is_pud_entry(void *entry) +{ + return xa_to_value(entry) & DAX_PUD; +} + static bool dax_is_pte_entry(void *entry) { - return !(xa_to_value(entry) & DAX_PMD); + return !(xa_to_value(entry) & (DAX_PMD|DAX_PUD)); } static int dax_is_zero_entry(void *entry) @@ -277,6 +285,8 @@ static unsigned long dax_entry_size(void *entry) return 0; else if (dax_is_pmd_entry(entry)) return PMD_SIZE; + else if (dax_is_pud_entry(entry)) + return PUD_SIZE; else return PAGE_SIZE; } @@ -561,11 +571,11 @@ void *dax_grab_mapping_entry(struct xa_state *xas, struct address_space *mapping, unsigned int order) { unsigned long index = xas->xa_index; - bool pmd_downgrade; /* splitting PMD entry into PTE entries? */ + bool size_downgrade; /* splitting entry into PTE entries? */ void *entry; retry: - pmd_downgrade = false; + size_downgrade = false; xas_lock_irq(xas); entry = get_unlocked_entry(xas, order); @@ -578,15 +588,25 @@ void *dax_grab_mapping_entry(struct xa_state *xas, } if (order == 0) { - if (dax_is_pmd_entry(entry) && + if (!dax_is_pte_entry(entry) && (dax_is_zero_entry(entry) || dax_is_empty_entry(entry))) { - pmd_downgrade = true; + size_downgrade = true; } } } - if (pmd_downgrade) { + if (size_downgrade) { + unsigned long colour, nr; + + if (dax_is_pmd_entry(entry)) { + colour = PG_PMD_COLOUR; + nr = PG_PMD_NR; + } else { + colour = PG_PUD_COLOUR; + nr = PG_PUD_NR; + } + /* * Make sure 'entry' remains valid while we drop * the i_pages lock. 
@@ -600,9 +620,8 @@ void *dax_grab_mapping_entry(struct xa_state *xas, */ if (dax_is_zero_entry(entry)) { xas_unlock_irq(xas); - unmap_mapping_pages(mapping, - xas->xa_index & ~PG_PMD_COLOUR, - PG_PMD_NR, false); + unmap_mapping_pages(mapping, xas->xa_index & ~colour, + nr, false); xas_reset(xas); xas_lock_irq(xas); } @@ -610,7 +629,7 @@ void *dax_grab_mapping_entry(struct xa_state *xas, dax_disassociate_entry(entry, mapping, false); xas_store(xas, NULL); /* undo the PMD join */ dax_wake_entry(xas, entry, WAKE_ALL); - mapping->nrpages -= PG_PMD_NR; + mapping->nrpages -= nr; entry = NULL; xas_set(xas, index); } @@ -620,7 +639,9 @@ void *dax_grab_mapping_entry(struct xa_state *xas, } else { unsigned long flags = DAX_EMPTY; - if (order > 0) + if (order == PUD_SHIFT - PAGE_SHIFT) + flags |= DAX_PUD; + else if (order == PMD_SHIFT - PAGE_SHIFT) flags |= DAX_PMD; entry = dax_make_entry(pfn_to_pfn_t(0), flags); dax_lock_entry(xas, entry); @@ -864,7 +885,10 @@ vm_fault_t dax_insert_entry(struct xa_state *xas, struct vm_fault *vmf, if (cow || (dax_is_zero_entry(entry) && !(flags & DAX_ZERO_PAGE))) { unsigned long index = xas->xa_index; /* we are replacing a zero page with block mapping */ - if (dax_is_pmd_entry(entry)) + if (dax_is_pud_entry(entry)) + unmap_mapping_pages(mapping, index & ~PG_PUD_COLOUR, + PG_PUD_NR, false); + else if (dax_is_pmd_entry(entry)) unmap_mapping_pages(mapping, index & ~PG_PMD_COLOUR, PG_PMD_NR, false); else /* pte entry */ @@ -1040,6 +1064,8 @@ vm_fault_t dax_insert_pfn_mkwrite(struct vm_fault *vmf, pfn_t pfn, else if (order == PMD_ORDER) ret = vmf_insert_pfn_pmd(vmf, pfn, FAULT_FLAG_WRITE); #endif + else if (order == PUD_ORDER) + ret = vmf_insert_pfn_pud(vmf, pfn, FAULT_FLAG_WRITE); else ret = VM_FAULT_FALLBACK; dax_unlock_entry(&xas, entry); diff --git a/include/linux/dax.h b/include/linux/dax.h index 553bc819a6a4..a61df43921a3 100644 --- a/include/linux/dax.h +++ b/include/linux/dax.h @@ -300,22 +300,25 @@ static inline bool dax_mapping(struct address_space *mapping) } /* - * DAX pagecache entries use XArray value entries so they can't be mistaken - * for pages. We use one bit for locking, one bit for the entry size (PMD) - * and two more to tell us if the entry is a zero page or an empty entry that - * is just used for locking. In total four special bits. + * DAX pagecache entries use XArray value entries so they can't be + * mistaken for pages. We use one bit for locking, two bits for the + * entry size (PMD, PUD) and two more to tell us if the entry is a zero + * page or an empty entry that is just used for locking. In total 5 + * special bits which limits the max pfn that can be stored as: + * (1UL << 57 - PAGE_SHIFT). 63 - DAX_SHIFT - 1 (for xa_mk_value()). * - * If the PMD bit isn't set the entry has size PAGE_SIZE, and if the ZERO_PAGE - * and EMPTY bits aren't set the entry is a normal DAX entry with a filesystem - * block allocation. + * If the P{M,U}D bits are not set the entry has size PAGE_SIZE, and if + * the ZERO_PAGE and EMPTY bits aren't set the entry is a normal DAX + * entry with a filesystem block allocation. 
*/ -#define DAX_SHIFT (5) +#define DAX_SHIFT (6) #define DAX_MASK ((1UL << DAX_SHIFT) - 1) #define DAX_LOCKED (1UL << 0) #define DAX_PMD (1UL << 1) -#define DAX_ZERO_PAGE (1UL << 2) -#define DAX_EMPTY (1UL << 3) -#define DAX_ZAP (1UL << 4) +#define DAX_PUD (1UL << 2) +#define DAX_ZERO_PAGE (1UL << 3) +#define DAX_EMPTY (1UL << 4) +#define DAX_ZAP (1UL << 5) /* * These flags are not conveyed in Xarray value entries, they are just @@ -339,6 +342,13 @@ int dax_writeback_one(struct xa_state *xas, struct dax_device *dax_dev, /* The order of a PMD entry */ #define PMD_ORDER (PMD_SHIFT - PAGE_SHIFT) +/* The 'colour' (ie low bits) within a PUD of a page offset. */ +#define PG_PUD_COLOUR ((PUD_SIZE >> PAGE_SHIFT) - 1) +#define PG_PUD_NR (PUD_SIZE >> PAGE_SHIFT) + +/* The order of a PUD entry */ +#define PUD_ORDER (PUD_SHIFT - PAGE_SHIFT) + static inline unsigned int pe_order(enum page_entry_size pe_size) { if (pe_size == PE_SIZE_PTE) diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h index a1341fdcf666..aab708996fb0 100644 --- a/include/linux/huge_mm.h +++ b/include/linux/huge_mm.h @@ -16,12 +16,22 @@ int copy_huge_pud(struct mm_struct *dst_mm, struct mm_struct *src_mm, pud_t *dst_pud, pud_t *src_pud, unsigned long addr, struct vm_area_struct *vma); -#ifdef CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD +#if defined(CONFIG_TRANSPARENT_HUGEPAGE) && \ + defined(CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD) void huge_pud_set_accessed(struct vm_fault *vmf, pud_t orig_pud); +vm_fault_t vmf_insert_pfn_pud_prot(struct vm_fault *vmf, pfn_t pfn, + pgprot_t pgprot, bool write); #else static inline void huge_pud_set_accessed(struct vm_fault *vmf, pud_t orig_pud) { } + +static inline vm_fault_t vmf_insert_pfn_pud_prot(struct vm_fault *vmf, + pfn_t pfn, pgprot_t pgprot, + bool write) +{ + return VM_FAULT_SIGBUS; +} #endif vm_fault_t do_huge_pmd_wp_page(struct vm_fault *vmf); @@ -58,8 +68,6 @@ static inline vm_fault_t vmf_insert_pfn_pmd(struct vm_fault *vmf, pfn_t pfn, { return vmf_insert_pfn_pmd_prot(vmf, pfn, vmf->vma->vm_page_prot, write); } -vm_fault_t vmf_insert_pfn_pud_prot(struct vm_fault *vmf, pfn_t pfn, - pgprot_t pgprot, bool write); /** * vmf_insert_pfn_pud - insert a pud size pfn From patchwork Fri Oct 14 23:59:00 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dan Williams X-Patchwork-Id: 13007534 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 1021AC4332F for ; Fri, 14 Oct 2022 23:59:04 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 9B23A6B0089; Fri, 14 Oct 2022 19:59:03 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 987E56B008A; Fri, 14 Oct 2022 19:59:03 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 82B4D8E0002; Fri, 14 Oct 2022 19:59:03 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0015.hostedemail.com [216.40.44.15]) by kanga.kvack.org (Postfix) with ESMTP id 6D1F76B0089 for ; Fri, 14 Oct 2022 19:59:03 -0400 (EDT) Received: from smtpin15.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay03.hostedemail.com (Postfix) with ESMTP id 2EA89A059C for ; Fri, 14 Oct 2022 23:59:03 +0000 (UTC) X-FDA: 80021223366.15.F1525CC Received: from mga02.intel.com (mga02.intel.com [134.134.136.20]) by 
imf13.hostedemail.com Subject: [PATCH v3 21/25] devdax: Use dax_insert_entry() + dax_delete_mapping_entry() From: Dan Williams To: linux-mm@kvack.org Cc: Matthew Wilcox , Jan Kara , "Darrick J. Wong" , Jason Gunthorpe , Christoph Hellwig , John Hubbard , david@fromorbit.com, nvdimm@lists.linux.dev, akpm@linux-foundation.org, linux-fsdevel@vger.kernel.org Date: Fri, 14 Oct 2022 16:59:00 -0700 Message-ID: <166579194049.2236710.10922460534153863415.stgit@dwillia2-xfh.jf.intel.com> spf=pass (imf13.hostedemail.com: domain of 
dan.j.williams@intel.com designates 134.134.136.20 as permitted sender) smtp.mailfrom=dan.j.williams@intel.com X-Rspam-User: X-Rspamd-Server: rspam03 X-HE-Tag: 1665791942-117739 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Track entries and take pgmap references at mapping insertion time. Revoke mappings (dax_zap_mappings()) and drop the associated pgmap references at device destruction or inode eviction time. With this in place, and the fsdax equivalent already in place, the gup code no longer needs to consider PTE_DEVMAP as an indicator to get a pgmap reference before taking a page reference. In other words, GUP takes additional references on mapped pages. Until now, DAX in all its forms was failing to take references at mapping time. With that fixed there is no longer a requirement for gup to manage @pgmap references. However, that cleanup is saved for a follow-on patch. Cc: Matthew Wilcox Cc: Jan Kara Cc: "Darrick J. Wong" Cc: Jason Gunthorpe Cc: Christoph Hellwig Cc: John Hubbard Signed-off-by: Dan Williams --- drivers/dax/Kconfig | 1 + drivers/dax/bus.c | 9 +++++- drivers/dax/device.c | 73 +++++++++++++++++++++++++++++-------------------- drivers/dax/mapping.c | 19 +++++++++---- include/linux/dax.h | 3 +- 5 files changed, 68 insertions(+), 37 deletions(-) diff --git a/drivers/dax/Kconfig b/drivers/dax/Kconfig index 205e9dda8928..2eddd32c51f4 100644 --- a/drivers/dax/Kconfig +++ b/drivers/dax/Kconfig @@ -9,6 +9,7 @@ if DAX config DEV_DAX tristate "Device DAX: direct access mapping device" depends on TRANSPARENT_HUGEPAGE + depends on !FS_DAX_LIMITED help Support raw access to differentiated (persistence, bandwidth, latency...) memory via an mmap(2) capable character diff --git a/drivers/dax/bus.c b/drivers/dax/bus.c index 1dad813ee4a6..f2a8b8c3776f 100644 --- a/drivers/dax/bus.c +++ b/drivers/dax/bus.c @@ -382,9 +382,16 @@ void kill_dev_dax(struct dev_dax *dev_dax) { struct dax_device *dax_dev = dev_dax->dax_dev; struct inode *inode = dax_inode(dax_dev); + struct address_space *mapping = inode->i_mapping; kill_dax(dax_dev); - unmap_mapping_range(inode->i_mapping, 0, 0, 1); + + /* + * The dax device inode can outlive the next reuse of the memory + * fronted by this device, force it idle now. 
+ */ + dax_break_layouts(mapping, 0, ULONG_MAX >> PAGE_SHIFT); + truncate_inode_pages(mapping, 0); /* * Dynamic dax region have the pgmap allocated via dev_kzalloc() diff --git a/drivers/dax/device.c b/drivers/dax/device.c index 5494d745ced5..022d4ba9c336 100644 --- a/drivers/dax/device.c +++ b/drivers/dax/device.c @@ -73,38 +73,15 @@ __weak phys_addr_t dax_pgoff_to_phys(struct dev_dax *dev_dax, pgoff_t pgoff, return -1; } -static void dax_set_mapping(struct vm_fault *vmf, pfn_t pfn, - unsigned long fault_size) -{ - unsigned long i, nr_pages = fault_size / PAGE_SIZE; - struct file *filp = vmf->vma->vm_file; - struct dev_dax *dev_dax = filp->private_data; - pgoff_t pgoff; - - /* mapping is only set on the head */ - if (dev_dax->pgmap->vmemmap_shift) - nr_pages = 1; - - pgoff = linear_page_index(vmf->vma, - ALIGN(vmf->address, fault_size)); - - for (i = 0; i < nr_pages; i++) { - struct page *page = pfn_to_page(pfn_t_to_pfn(pfn) + i); - - page = compound_head(page); - if (page->mapping) - continue; - - page->mapping = filp->f_mapping; - page->index = pgoff + i; - } -} - static vm_fault_t __dev_dax_pte_fault(struct dev_dax *dev_dax, struct vm_fault *vmf) { + struct address_space *mapping = vmf->vma->vm_file->f_mapping; + XA_STATE(xas, &mapping->i_pages, vmf->pgoff); struct device *dev = &dev_dax->dev; phys_addr_t phys; + vm_fault_t ret; + void *entry; pfn_t pfn; unsigned int fault_size = PAGE_SIZE; @@ -128,7 +105,16 @@ static vm_fault_t __dev_dax_pte_fault(struct dev_dax *dev_dax, pfn = phys_to_pfn_t(phys, PFN_DEV|PFN_MAP); - dax_set_mapping(vmf, pfn, fault_size); + entry = dax_grab_mapping_entry(&xas, mapping, 0); + if (is_dax_err(entry)) + return dax_err_to_vmfault(entry); + + ret = dax_insert_entry(&xas, vmf, &entry, pfn, 0); + + dax_unlock_entry(&xas, entry); + + if (ret) + return ret; return vmf_insert_mixed(vmf->vma, vmf->address, pfn); } @@ -136,10 +122,14 @@ static vm_fault_t __dev_dax_pte_fault(struct dev_dax *dev_dax, static vm_fault_t __dev_dax_pmd_fault(struct dev_dax *dev_dax, struct vm_fault *vmf) { + struct address_space *mapping = vmf->vma->vm_file->f_mapping; unsigned long pmd_addr = vmf->address & PMD_MASK; + XA_STATE(xas, &mapping->i_pages, vmf->pgoff); struct device *dev = &dev_dax->dev; phys_addr_t phys; + vm_fault_t ret; pgoff_t pgoff; + void *entry; pfn_t pfn; unsigned int fault_size = PMD_SIZE; @@ -171,7 +161,16 @@ static vm_fault_t __dev_dax_pmd_fault(struct dev_dax *dev_dax, pfn = phys_to_pfn_t(phys, PFN_DEV|PFN_MAP); - dax_set_mapping(vmf, pfn, fault_size); + entry = dax_grab_mapping_entry(&xas, mapping, PMD_ORDER); + if (is_dax_err(entry)) + return dax_err_to_vmfault(entry); + + ret = dax_insert_entry(&xas, vmf, &entry, pfn, DAX_PMD); + + dax_unlock_entry(&xas, entry); + + if (ret) + return ret; return vmf_insert_pfn_pmd(vmf, pfn, vmf->flags & FAULT_FLAG_WRITE); } @@ -180,10 +179,14 @@ static vm_fault_t __dev_dax_pmd_fault(struct dev_dax *dev_dax, static vm_fault_t __dev_dax_pud_fault(struct dev_dax *dev_dax, struct vm_fault *vmf) { + struct address_space *mapping = vmf->vma->vm_file->f_mapping; unsigned long pud_addr = vmf->address & PUD_MASK; + XA_STATE(xas, &mapping->i_pages, vmf->pgoff); struct device *dev = &dev_dax->dev; phys_addr_t phys; + vm_fault_t ret; pgoff_t pgoff; + void *entry; pfn_t pfn; unsigned int fault_size = PUD_SIZE; @@ -216,7 +219,16 @@ static vm_fault_t __dev_dax_pud_fault(struct dev_dax *dev_dax, pfn = phys_to_pfn_t(phys, PFN_DEV|PFN_MAP); - dax_set_mapping(vmf, pfn, fault_size); + entry = dax_grab_mapping_entry(&xas, mapping, PUD_ORDER); + 
if (xa_is_internal(entry)) + return xa_to_internal(entry); + + ret = dax_insert_entry(&xas, vmf, &entry, pfn, DAX_PUD); + + dax_unlock_entry(&xas, entry); + + if (ret) + return ret; return vmf_insert_pfn_pud(vmf, pfn, vmf->flags & FAULT_FLAG_WRITE); } @@ -494,3 +506,4 @@ MODULE_LICENSE("GPL v2"); module_init(dax_init); module_exit(dax_exit); MODULE_ALIAS_DAX_DEVICE(0); +MODULE_IMPORT_NS(DAX); diff --git a/drivers/dax/mapping.c b/drivers/dax/mapping.c index ba01c1cf4b51..07caaa23d476 100644 --- a/drivers/dax/mapping.c +++ b/drivers/dax/mapping.c @@ -266,6 +266,7 @@ void dax_unlock_entry(struct xa_state *xas, void *entry) WARN_ON(!dax_is_locked(old)); dax_wake_entry(xas, entry, WAKE_NEXT); } +EXPORT_SYMBOL_NS_GPL(dax_unlock_entry, DAX); /* * Return: The entry stored at this location before it was locked. @@ -663,6 +664,7 @@ void *dax_grab_mapping_entry(struct xa_state *xas, xas_unlock_irq(xas); return vmfault_to_dax_err(VM_FAULT_FALLBACK); } +EXPORT_SYMBOL_NS_GPL(dax_grab_mapping_entry, DAX); static void *dax_zap_entry(struct xa_state *xas, void *entry) { @@ -814,15 +816,21 @@ static int __dax_invalidate_entry(struct address_space *mapping, * wait indefinitely for all pins to drop, the alternative to waiting is * a potential use-after-free scenario */ -static void dax_break_layout(struct address_space *mapping, pgoff_t index) +void dax_break_layouts(struct address_space *mapping, pgoff_t index, + pgoff_t end) { - /* To do this without locks, the inode needs to be unreferenced */ - WARN_ON(atomic_read(&mapping->host->i_count)); + struct inode *inode = mapping->host; + + /* + * To do this without filesystem locks, the inode needs to be + * unreferenced, or device-dax. + */ + WARN_ON(atomic_read(&inode->i_count) && !S_ISCHR(inode->i_mode)); do { struct page *page; page = dax_zap_mappings_range(mapping, index << PAGE_SHIFT, - (index + 1) << PAGE_SHIFT); + end << PAGE_SHIFT); if (!page) return; wait_var_event(page, dax_page_idle(page)); @@ -838,7 +846,7 @@ int dax_delete_mapping_entry(struct address_space *mapping, pgoff_t index) int ret; if (mapping_exiting(mapping)) - dax_break_layout(mapping, index); + dax_break_layouts(mapping, index, index + 1); ret = __dax_invalidate_entry(mapping, index, true); @@ -932,6 +940,7 @@ vm_fault_t dax_insert_entry(struct xa_state *xas, struct vm_fault *vmf, return ret; } +EXPORT_SYMBOL_NS_GPL(dax_insert_entry, DAX); int dax_writeback_one(struct xa_state *xas, struct dax_device *dax_dev, struct address_space *mapping, void *entry) __must_hold(xas) diff --git a/include/linux/dax.h b/include/linux/dax.h index a61df43921a3..f2fbb5746ffa 100644 --- a/include/linux/dax.h +++ b/include/linux/dax.h @@ -181,7 +181,6 @@ dax_entry_t dax_lock_mapping_entry(struct address_space *mapping, unsigned long index, struct page **page); void dax_unlock_mapping_entry(struct address_space *mapping, unsigned long index, dax_entry_t cookie); -void dax_break_layouts(struct inode *inode); struct page *dax_zap_mappings(struct address_space *mapping); struct page *dax_zap_mappings_range(struct address_space *mapping, loff_t start, loff_t end); @@ -286,6 +285,8 @@ void dax_unlock_entry(struct xa_state *xas, void *entry); int dax_delete_mapping_entry(struct address_space *mapping, pgoff_t index); int dax_invalidate_mapping_entry_sync(struct address_space *mapping, pgoff_t index); +void dax_break_layouts(struct address_space *mapping, pgoff_t index, + pgoff_t end); int dax_dedupe_file_range_compare(struct inode *src, loff_t srcoff, struct inode *dest, loff_t destoff, loff_t len, bool *is_same, 
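
Each of the device-dax fault handlers in the patch above now follows the same grab/insert/unlock sequence before installing the translation. A minimal sketch of the PTE-sized case, assuming the helper signatures shown in the diff (dev_dax_fault_sketch() itself is illustrative and not part of the patch):

static vm_fault_t dev_dax_fault_sketch(struct vm_fault *vmf, pfn_t pfn)
{
	struct address_space *mapping = vmf->vma->vm_file->f_mapping;
	XA_STATE(xas, &mapping->i_pages, vmf->pgoff);
	vm_fault_t ret;
	void *entry;

	/* lock the Xarray slot for this file offset (order 0 == PTE) */
	entry = dax_grab_mapping_entry(&xas, mapping, 0);
	if (is_dax_err(entry))
		return dax_err_to_vmfault(entry);

	/* record the pfn and take page/pgmap references at map time */
	ret = dax_insert_entry(&xas, vmf, &entry, pfn, 0);
	dax_unlock_entry(&xas, entry);
	if (ret)
		return ret;

	/* install the PTE; the PMD/PUD paths end with vmf_insert_pfn_pmd()/_pud() */
	return vmf_insert_mixed(vmf->vma, vmf->address, pfn);
}

The references taken here are the ones later dropped by the zap path at device destruction or inode eviction time, which is what allows the gup path to stop taking pgmap references at the end of the series.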
From patchwork Fri Oct 14 23:59:06 2022 Subject: [PATCH v3 22/25] mm/memremap_pages: Replace zone_device_page_init() with pgmap_request_folios() From: Dan Williams To: linux-mm@kvack.org Cc: Matthew Wilcox , Jan Kara , "Darrick J. 
Wong" , Christoph Hellwig , John Hubbard , Alistair Popple , Jason Gunthorpe , Felix Kuehling , Alex Deucher , Christian =?utf-8?b?S8O2bmln?= , "Pan, Xinhui" , David Airlie , Daniel Vetter , Ben Skeggs , Karol Herbst , Lyude Paul , =?utf-8?b?SsOpcsO0bWU=?= Glisse , david@fromorbit.com, nvdimm@lists.linux.dev, akpm@linux-foundation.org, linux-fsdevel@vger.kernel.org Date: Fri, 14 Oct 2022 16:59:06 -0700 Message-ID: <166579194621.2236710.8168919102434295671.stgit@dwillia2-xfh.jf.intel.com> In-Reply-To: <166579181584.2236710.17813547487183983273.stgit@dwillia2-xfh.jf.intel.com> References: <166579181584.2236710.17813547487183983273.stgit@dwillia2-xfh.jf.intel.com> User-Agent: StGit/0.18-3-g996c MIME-Version: 1.0 ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1665791952; a=rsa-sha256; cv=none; b=RCVnMG0sypYiDQ67ioDMGKJQptg6uFnYmT5QgbcG41LjKHqHiAwOUBPUaSkhsPKCO6jbGl mPOGIAxHdVoiE8C5n+61d89yfs045xrSOGoV5Mynpu57oPhfnKoUKoaYxniwVrcT9LoYE7 oa4Cu1YFo1mZsCLw4SMdAIaphOLY3/I= ARC-Authentication-Results: i=1; imf25.hostedemail.com; dkim=none ("invalid DKIM record") header.d=intel.com header.s=Intel header.b=B1KTkvej; spf=pass (imf25.hostedemail.com: domain of dan.j.williams@intel.com designates 134.134.136.20 as permitted sender) smtp.mailfrom=dan.j.williams@intel.com; dmarc=pass (policy=none) header.from=intel.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1665791952; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=KOPoQq9qRu/SLFW4lLa/uh/OSLzi/RXmW0heKCYTTaI=; b=Dtm+RQ4URvcqjyNEhqb5tsEL0wKNtBIWlnCHIC8Tu4XOVGi8hxkOnBpcalT055Y45T6Avp DC96XIC1q61QV/j0ifSU8TZJs14xuijlebQ6TVfLygarwxQey40PhYshoFkL/qfUXSEjaF t+obsEFd7c6A9DoUTg/LmzDbwXQvz9E= X-Stat-Signature: 5kyab1c86fjuo3r6tfa6x5ctsz5yjqha X-Rspamd-Queue-Id: 4F030A0028 X-Rspam-User: Authentication-Results: imf25.hostedemail.com; dkim=none ("invalid DKIM record") header.d=intel.com header.s=Intel header.b=B1KTkvej; spf=pass (imf25.hostedemail.com: domain of dan.j.williams@intel.com designates 134.134.136.20 as permitted sender) smtp.mailfrom=dan.j.williams@intel.com; dmarc=pass (policy=none) header.from=intel.com X-Rspamd-Server: rspam06 X-HE-Tag: 1665791952-552778 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Switch to the common method, shared across all MEMORY_DEVICE_* types, for requesting access to a ZONE_DEVICE page. The MEMORY_DEVICE_{PRIVATE,COHERENT} specific expectation that newly requested pages are locked is moved to the callers. Cc: Matthew Wilcox Cc: Jan Kara Cc: "Darrick J. 
Wong" Cc: Christoph Hellwig Cc: John Hubbard Cc: Alistair Popple Cc: Jason Gunthorpe Cc: Felix Kuehling Cc: Alex Deucher Cc: "Christian König" Cc: "Pan, Xinhui" Cc: David Airlie Cc: Daniel Vetter Cc: Ben Skeggs Cc: Karol Herbst Cc: Lyude Paul Cc: "Jérôme Glisse" Signed-off-by: Dan Williams Reviewed-by: Lyude Paul --- arch/powerpc/kvm/book3s_hv_uvmem.c | 3 ++- drivers/gpu/drm/amd/amdkfd/kfd_migrate.c | 3 ++- drivers/gpu/drm/nouveau/nouveau_dmem.c | 3 ++- include/linux/memremap.h | 1 - lib/test_hmm.c | 3 ++- mm/memremap.c | 13 +------------ 6 files changed, 9 insertions(+), 17 deletions(-) diff --git a/arch/powerpc/kvm/book3s_hv_uvmem.c b/arch/powerpc/kvm/book3s_hv_uvmem.c index e2f11f9c3f2a..884ec112ad43 100644 --- a/arch/powerpc/kvm/book3s_hv_uvmem.c +++ b/arch/powerpc/kvm/book3s_hv_uvmem.c @@ -718,7 +718,8 @@ static struct page *kvmppc_uvmem_get_page(unsigned long gpa, struct kvm *kvm) dpage = pfn_to_page(uvmem_pfn); dpage->zone_device_data = pvt; - zone_device_page_init(dpage); + pgmap_request_folios(dpage->pgmap, page_folio(dpage), 1); + lock_page(dpage); return dpage; out_clear: spin_lock(&kvmppc_uvmem_bitmap_lock); diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c index 97a684568ae0..8cf97060122b 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c @@ -223,7 +223,8 @@ svm_migrate_get_vram_page(struct svm_range *prange, unsigned long pfn) page = pfn_to_page(pfn); svm_range_bo_ref(prange->svm_bo); page->zone_device_data = prange->svm_bo; - zone_device_page_init(page); + pgmap_request_folios(page->pgmap, page_folio(page), 1); + lock_page(page); } static void diff --git a/drivers/gpu/drm/nouveau/nouveau_dmem.c b/drivers/gpu/drm/nouveau/nouveau_dmem.c index 5fe209107246..1482533c7ca0 100644 --- a/drivers/gpu/drm/nouveau/nouveau_dmem.c +++ b/drivers/gpu/drm/nouveau/nouveau_dmem.c @@ -324,7 +324,8 @@ nouveau_dmem_page_alloc_locked(struct nouveau_drm *drm) return NULL; } - zone_device_page_init(page); + pgmap_request_folios(page->pgmap, page_folio(page), 1); + lock_page(page); return page; } diff --git a/include/linux/memremap.h b/include/linux/memremap.h index 98196b8d3172..3fb3809d71f3 100644 --- a/include/linux/memremap.h +++ b/include/linux/memremap.h @@ -187,7 +187,6 @@ static inline bool folio_is_device_coherent(const struct folio *folio) } #ifdef CONFIG_ZONE_DEVICE -void zone_device_page_init(struct page *page); void *memremap_pages(struct dev_pagemap *pgmap, int nid); void memunmap_pages(struct dev_pagemap *pgmap); void *devm_memremap_pages(struct device *dev, struct dev_pagemap *pgmap); diff --git a/lib/test_hmm.c b/lib/test_hmm.c index 67e6f83fe0f8..e4f7219ae3bb 100644 --- a/lib/test_hmm.c +++ b/lib/test_hmm.c @@ -632,7 +632,8 @@ static struct page *dmirror_devmem_alloc_page(struct dmirror_device *mdevice) goto error; } - zone_device_page_init(dpage); + pgmap_request_folios(dpage->pgmap, page_folio(dpage), 1); + lock_page(dpage); dpage->zone_device_data = rpage; return dpage; diff --git a/mm/memremap.c b/mm/memremap.c index 87a649ecdc54..c46e700f5245 100644 --- a/mm/memremap.c +++ b/mm/memremap.c @@ -518,18 +518,6 @@ void free_zone_device_page(struct page *page) put_dev_pagemap(page->pgmap); } -void zone_device_page_init(struct page *page) -{ - /* - * Drivers shouldn't be allocating pages after calling - * memunmap_pages(). 
- */ - WARN_ON_ONCE(!percpu_ref_tryget_live(&page->pgmap->ref)); - set_page_count(page, 1); - lock_page(page); -} -EXPORT_SYMBOL_GPL(zone_device_page_init); - static bool folio_span_valid(struct dev_pagemap *pgmap, struct folio *folio, int nr_folios) { @@ -586,6 +574,7 @@ bool pgmap_request_folios(struct dev_pagemap *pgmap, struct folio *folio, return true; } +EXPORT_SYMBOL_GPL(pgmap_request_folios); void pgmap_release_folios(struct dev_pagemap *pgmap, struct folio *folio, int nr_folios) { From patchwork Fri Oct 14 23:59:12 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dan Williams X-Patchwork-Id: 13007537 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id DA330C43219 for ; Fri, 14 Oct 2022 23:59:15 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 6F2286B0093; Fri, 14 Oct 2022 19:59:15 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 6A33E6B0095; Fri, 14 Oct 2022 19:59:15 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 542BF6B0096; Fri, 14 Oct 2022 19:59:15 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0016.hostedemail.com [216.40.44.16]) by kanga.kvack.org (Postfix) with ESMTP id 43D4E6B0093 for ; Fri, 14 Oct 2022 19:59:15 -0400 (EDT) Received: from smtpin12.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay07.hostedemail.com (Postfix) with ESMTP id 221AC16048C for ; Fri, 14 Oct 2022 23:59:15 +0000 (UTC) X-FDA: 80021223870.12.D831BE7 Received: from mga11.intel.com (mga11.intel.com [192.55.52.93]) by imf11.hostedemail.com (Postfix) with ESMTP id 456A940020 for ; Fri, 14 Oct 2022 23:59:14 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1665791954; x=1697327954; h=subject:from:to:cc:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=zJkGyA2USCATPfeLAvbcj+LxFwO3lVwTXahzZIEoOwI=; b=m3WSE11xiXPWbbbcL7Gpi3RxOF3Use61ddE0ghjBysJJdyiIZ0ZYWM+S Op2TchIupdLcFaqyigE5kOW0HAGV4rSsl6/21gesLvycC3oA9u/wkIfde dia5UdCatM6wDo9KdzugSwb26Fdni+UfgnY5mNMNfrSLxWWYwChAXX8BG mdaqw5t5HQsAbZkSjt4w3iBAep92rSKvUciUa91mgbGzvRZopm9VnACaO dZGZPl7cA2laX4pd1/Ldj18Ylx+JkIlTMhVWojoVtebtUCAbnSnl6ZB8v LpslPEwqtd4kRRES4vuI6Mw5RmFnLmT1CcIAVF5dzi/h0C6cFO/J/RYry A==; X-IronPort-AV: E=McAfee;i="6500,9779,10500"; a="303112552" X-IronPort-AV: E=Sophos;i="5.95,185,1661842800"; d="scan'208";a="303112552" Received: from orsmga006.jf.intel.com ([10.7.209.51]) by fmsmga102.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 14 Oct 2022 16:59:13 -0700 X-IronPort-AV: E=McAfee;i="6500,9779,10500"; a="605541373" X-IronPort-AV: E=Sophos;i="5.95,185,1661842800"; d="scan'208";a="605541373" Received: from uyoon-mobl.amr.corp.intel.com (HELO dwillia2-xfh.jf.intel.com) ([10.209.90.112]) by orsmga006-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 14 Oct 2022 16:59:12 -0700 Subject: [PATCH v3 23/25] mm/memremap_pages: Initialize all ZONE_DEVICE pages to start at refcount 0 From: Dan Williams To: linux-mm@kvack.org Cc: Matthew Wilcox , Jan Kara , "Darrick J. 
Wong" , Christoph Hellwig , John Hubbard , Alistair Popple , Jason Gunthorpe , david@fromorbit.com, nvdimm@lists.linux.dev, akpm@linux-foundation.org, linux-fsdevel@vger.kernel.org Date: Fri, 14 Oct 2022 16:59:12 -0700 Message-ID: <166579195218.2236710.8731183545033177929.stgit@dwillia2-xfh.jf.intel.com> In-Reply-To: <166579181584.2236710.17813547487183983273.stgit@dwillia2-xfh.jf.intel.com> References: <166579181584.2236710.17813547487183983273.stgit@dwillia2-xfh.jf.intel.com> User-Agent: StGit/0.18-3-g996c MIME-Version: 1.0 ARC-Authentication-Results: i=1; imf11.hostedemail.com; dkim=none ("invalid DKIM record") header.d=intel.com header.s=Intel header.b=m3WSE11x; spf=pass (imf11.hostedemail.com: domain of dan.j.williams@intel.com designates 192.55.52.93 as permitted sender) smtp.mailfrom=dan.j.williams@intel.com; dmarc=pass (policy=none) header.from=intel.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1665791954; a=rsa-sha256; cv=none; b=kogt97fV+30Jn0Ctr81unX63HMhml06zwQuPPt6B/EdlwCX+dWiVNKiuxfDnwI5s/fzdsT YiTJCW9jLrdG1kHnICemvlL2tUKu0QStoqxhGXQWAfvqXGYvSX9ank1FAG7LjIyOI7ZixL HWLRI2Osa12o0oHnKk/yHwIoLjuxeoA= ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1665791954; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=NnXhTYa8aytbi04mrulNACjS+UyDFXjU6oCQjrOepf0=; b=ifsduRzUtAc3e0JkvsZTUWMXhmL7gIhnJsBbqEsC633sLaOLoEDf/yz9IBWwLFL5WV4n5g q4h70WPFM36Qw63Yh/aZnUkjfJPIVdpRk9rUHtAroDwUBCCWobJAyQYAUXbEYxmvW4Bbjf en4q3xX57OCbvd1XvAQ+jl4RomFsu2Y= X-Rspam-User: X-Stat-Signature: tkjn33ohueijrmhe661rzsa8tkjhaquf X-Rspamd-Queue-Id: 456A940020 Authentication-Results: imf11.hostedemail.com; dkim=none ("invalid DKIM record") header.d=intel.com header.s=Intel header.b=m3WSE11x; spf=pass (imf11.hostedemail.com: domain of dan.j.williams@intel.com designates 192.55.52.93 as permitted sender) smtp.mailfrom=dan.j.williams@intel.com; dmarc=pass (policy=none) header.from=intel.com X-Rspamd-Server: rspam07 X-HE-Tag: 1665791954-488110 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: The initial memremap_pages() implementation inherited the __init_single_page() default of pages starting life with an elevated reference count. This originally allowed for the page->pgmap pointer to alias with the storage for page->lru since a page was only allowed to be on an lru list when its reference count was zero. Since then, 'struct page' definition cleanups have arranged for dedicated space for the ZONE_DEVICE page metadata, the MEMORY_DEVICE_{PRIVATE,COHERENT} work has arranged for the 1 -> 0 page->_refcount transition to route the page to free_zone_device_page() and not the core-mm page-free, and MEMORY_DEVICE_{PRIVATE,COHERENT} now arranges for its ZONE_DEVICE pages to start at _refcount 0. With those cleanups in place and with filesystem-dax and device-dax now converted to take and drop references at map and truncate time, it is possible to start MEMORY_DEVICE_FS_DAX and MEMORY_DEVICE_GENERIC reference counts at 0 as well. This conversion also unifies all @pgmap accounting to be relative to pgmap_request_folio() and the paired folio_put() calls for those requested folios. 
This allows pgmap_release_folios() to be simplified to just a folio_put() helper. Cc: Matthew Wilcox Cc: Jan Kara Cc: "Darrick J. Wong" Cc: Christoph Hellwig Cc: John Hubbard Cc: Alistair Popple Cc: Jason Gunthorpe Signed-off-by: Dan Williams Reviewed-by: Alistair Popple --- drivers/dax/mapping.c | 2 +- include/linux/dax.h | 2 +- include/linux/memremap.h | 6 ++---- mm/memremap.c | 36 ++++++++++++++++-------------------- mm/page_alloc.c | 9 +-------- 5 files changed, 21 insertions(+), 34 deletions(-) diff --git a/drivers/dax/mapping.c b/drivers/dax/mapping.c index 07caaa23d476..ca06f2515644 100644 --- a/drivers/dax/mapping.c +++ b/drivers/dax/mapping.c @@ -691,7 +691,7 @@ static struct page *dax_zap_pages(struct xa_state *xas, void *entry) dax_for_each_folio(entry, folio, i) { if (zap) - pgmap_release_folios(folio_pgmap(folio), folio, 1); + pgmap_release_folios(folio, 1); if (!ret && !dax_folio_idle(folio)) ret = folio_page(folio, 0); } diff --git a/include/linux/dax.h b/include/linux/dax.h index f2fbb5746ffa..f4fc37933fc2 100644 --- a/include/linux/dax.h +++ b/include/linux/dax.h @@ -235,7 +235,7 @@ static inline void dax_unlock_mapping_entry(struct address_space *mapping, */ static inline bool dax_page_idle(struct page *page) { - return page_ref_count(page) == 1; + return page_ref_count(page) == 0; } static inline bool dax_folio_idle(struct folio *folio) diff --git a/include/linux/memremap.h b/include/linux/memremap.h index 3fb3809d71f3..ddb196ae0696 100644 --- a/include/linux/memremap.h +++ b/include/linux/memremap.h @@ -195,8 +195,7 @@ struct dev_pagemap *get_dev_pagemap(unsigned long pfn, struct dev_pagemap *pgmap); bool pgmap_request_folios(struct dev_pagemap *pgmap, struct folio *folio, int nr_folios); -void pgmap_release_folios(struct dev_pagemap *pgmap, struct folio *folio, - int nr_folios); +void pgmap_release_folios(struct folio *folio, int nr_folios); bool pgmap_pfn_valid(struct dev_pagemap *pgmap, unsigned long pfn); unsigned long vmem_altmap_offset(struct vmem_altmap *altmap); @@ -238,8 +237,7 @@ static inline bool pgmap_request_folios(struct dev_pagemap *pgmap, return false; } -static inline void pgmap_release_folios(struct dev_pagemap *pgmap, - struct folio *folio, int nr_folios) +static inline void pgmap_release_folios(struct folio *folio, int nr_folios) { } diff --git a/mm/memremap.c b/mm/memremap.c index c46e700f5245..368ff41c560b 100644 --- a/mm/memremap.c +++ b/mm/memremap.c @@ -469,8 +469,10 @@ EXPORT_SYMBOL_GPL(get_dev_pagemap); void free_zone_device_page(struct page *page) { - if (WARN_ON_ONCE(!page->pgmap->ops || !page->pgmap->ops->page_free)) - return; + struct dev_pagemap *pgmap = page->pgmap; + + /* wake filesystem 'break dax layouts' waiters */ + wake_up_var(page); mem_cgroup_uncharge(page_folio(page)); @@ -505,17 +507,9 @@ void free_zone_device_page(struct page *page) * to clear page->mapping. */ page->mapping = NULL; - page->pgmap->ops->page_free(page); - - if (page->pgmap->type != MEMORY_DEVICE_PRIVATE && - page->pgmap->type != MEMORY_DEVICE_COHERENT) - /* - * Reset the page count to 1 to prepare for handing out the page - * again. 
- */ - set_page_count(page, 1); - else - put_dev_pagemap(page->pgmap); + if (pgmap->ops && pgmap->ops->page_free) + pgmap->ops->page_free(page); + put_dev_pagemap(page->pgmap); } static bool folio_span_valid(struct dev_pagemap *pgmap, struct folio *folio, @@ -576,17 +570,19 @@ bool pgmap_request_folios(struct dev_pagemap *pgmap, struct folio *folio, } EXPORT_SYMBOL_GPL(pgmap_request_folios); -void pgmap_release_folios(struct dev_pagemap *pgmap, struct folio *folio, int nr_folios) +/* + * A symmetric helper to undo the page references acquired by + * pgmap_request_folios(), but the caller can also just arrange + * folio_put() on all the folios it acquired previously for the same + * effect. + */ +void pgmap_release_folios(struct folio *folio, int nr_folios) { struct folio *iter; int i; - for (iter = folio, i = 0; i < nr_folios; iter = folio_next(iter), i++) { - if (!put_devmap_managed_page(&iter->page)) - folio_put(iter); - if (!folio_ref_count(iter)) - put_dev_pagemap(pgmap); - } + for (iter = folio, i = 0; i < nr_folios; iter = folio_next(folio), i++) + folio_put(iter); } #ifdef CONFIG_FS_DAX diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 8e9b7f08a32c..e35d1eb3308d 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -6787,6 +6787,7 @@ static void __ref __init_zone_device_page(struct page *page, unsigned long pfn, { __init_single_page(page, pfn, zone_idx, nid); + set_page_count(page, 0); /* * Mark page reserved as it will need to wait for onlining @@ -6819,14 +6820,6 @@ static void __ref __init_zone_device_page(struct page *page, unsigned long pfn, set_pageblock_migratetype(page, MIGRATE_MOVABLE); cond_resched(); } - - /* - * ZONE_DEVICE pages are released directly to the driver page allocator - * which will set the page count to 1 when allocating the page. 
- */ - if (pgmap->type == MEMORY_DEVICE_PRIVATE || - pgmap->type == MEMORY_DEVICE_COHERENT) - set_page_count(page, 0); } /* From patchwork Fri Oct 14 23:59:17 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dan Williams X-Patchwork-Id: 13007538 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id B5093C433FE for ; Fri, 14 Oct 2022 23:59:22 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 5444E6B0088; Fri, 14 Oct 2022 19:59:22 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 4F3DD6B0089; Fri, 14 Oct 2022 19:59:22 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 36E9F6B0095; Fri, 14 Oct 2022 19:59:22 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0011.hostedemail.com [216.40.44.11]) by kanga.kvack.org (Postfix) with ESMTP id 25E866B0088 for ; Fri, 14 Oct 2022 19:59:22 -0400 (EDT) Received: from smtpin24.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay10.hostedemail.com (Postfix) with ESMTP id 04B4DC0BC9 for ; Fri, 14 Oct 2022 23:59:21 +0000 (UTC) X-FDA: 80021224164.24.182FF0F Received: from mga02.intel.com (mga02.intel.com [134.134.136.20]) by imf25.hostedemail.com (Postfix) with ESMTP id 5A6DCA002C for ; Fri, 14 Oct 2022 23:59:21 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1665791961; x=1697327961; h=subject:from:to:cc:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=35GC+dPtkI8JDDm0wyE1ckiLPki/MRlOI+/f8wSKCbo=; b=SRbdeGR5SKNfb3IHE7rs4VjlcUmx5VETg/0wVerhEJC9Q7sq5z1DHLsf CbBpGkY0o76TF3FB8oCYb9KyxDP1iQ21ONd0IiLpaMTA3pe/d8TgxmncK 2HHa8GJhscpxmB3yuBWO8lbSE5U0O+Mx3nJPVwqvKP33/gzkbevCtEhLY 3weSnBXznNm5PO80JDMjlqYkJp2iZVIFuibojtp6eMlc4rAjG4p8EwA16 5jKL/aXm8UYT7jcjggDyAFrkfc7SZs6BztwVhMFR3whDRmi3i3flaFL1R jHAM/QEsy2daaYvwjcgA6mkTc13fworOwB2tbZOEPiAipdnRTK0Ky9a1k g==; X-IronPort-AV: E=McAfee;i="6500,9779,10500"; a="292862208" X-IronPort-AV: E=Sophos;i="5.95,185,1661842800"; d="scan'208";a="292862208" Received: from orsmga006.jf.intel.com ([10.7.209.51]) by orsmga101.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 14 Oct 2022 16:59:18 -0700 X-IronPort-AV: E=McAfee;i="6500,9779,10500"; a="605541393" X-IronPort-AV: E=Sophos;i="5.95,185,1661842800"; d="scan'208";a="605541393" Received: from uyoon-mobl.amr.corp.intel.com (HELO dwillia2-xfh.jf.intel.com) ([10.209.90.112]) by orsmga006-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 14 Oct 2022 16:59:18 -0700 Subject: [PATCH v3 24/25] mm/meremap_pages: Delete put_devmap_managed_page_refs() From: Dan Williams To: linux-mm@kvack.org Cc: Matthew Wilcox , Jan Kara , "Darrick J. 
Wong" , Christoph Hellwig , John Hubbard , Alistair Popple , Jason Gunthorpe , david@fromorbit.com, nvdimm@lists.linux.dev, akpm@linux-foundation.org, linux-fsdevel@vger.kernel.org Date: Fri, 14 Oct 2022 16:59:17 -0700 Message-ID: <166579195789.2236710.7946318795534242314.stgit@dwillia2-xfh.jf.intel.com> In-Reply-To: <166579181584.2236710.17813547487183983273.stgit@dwillia2-xfh.jf.intel.com> References: <166579181584.2236710.17813547487183983273.stgit@dwillia2-xfh.jf.intel.com> User-Agent: StGit/0.18-3-g996c MIME-Version: 1.0 ARC-Authentication-Results: i=1; imf25.hostedemail.com; dkim=none ("invalid DKIM record") header.d=intel.com header.s=Intel header.b=SRbdeGR5; spf=pass (imf25.hostedemail.com: domain of dan.j.williams@intel.com designates 134.134.136.20 as permitted sender) smtp.mailfrom=dan.j.williams@intel.com; dmarc=pass (policy=none) header.from=intel.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1665791961; a=rsa-sha256; cv=none; b=3JM2ABs1k0FdyNcnTiTtRAkFzoNhgcQsOg6yPVKw73s+tSDlsm2NBROMr0ACImk3MA6oWU Pw7GKQzt6uboSoTHIG22AYcFzMdLSg5SIj8Wk2g6dmzd65u8WXSx/l1RCoC4bVHsJmKpVO /evye6n7SmNdK5kdC57N++NA1BDoXp4= ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1665791961; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=cU4eOD6MLCuou/7Qj3Q/RfaPVAx7u8AUElaY9eyGc2o=; b=rtjQtXjIPWpTirhv/XwDI1CNZ3kLFbLe23nG5Iffx+GikPg1Sjb5CTFeaZfNx5VrQytY54 ZRI/r/7B+nuIEgGxfIERY7SDbQOmZxUQ96rbVpihzSOsz97fcQko8b1yLrAGwpE2C+ONtd /BlYkK2bySLZC2uJP+FCOucSGpmRg/w= X-Rspamd-Queue-Id: 5A6DCA002C X-Rspam-User: Authentication-Results: imf25.hostedemail.com; dkim=none ("invalid DKIM record") header.d=intel.com header.s=Intel header.b=SRbdeGR5; spf=pass (imf25.hostedemail.com: domain of dan.j.williams@intel.com designates 134.134.136.20 as permitted sender) smtp.mailfrom=dan.j.williams@intel.com; dmarc=pass (policy=none) header.from=intel.com X-Rspamd-Server: rspam04 X-Stat-Signature: fadgnz4tj45d7usxfm8tci45rkezs3oo X-HE-Tag: 1665791961-961267 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Now that fsdax DMA-idle detection no longer depends on catching transitions of page->_refcount to 1, and all users of pgmap pages get access to them via pgmap_request_folios(), remove put_devmap_managed_page_refs() and associated infrastructure. This includes the pgmap references taken at the beginning of time for each page because those @pgmap references are now arbitrated via pgmap_request_folios(). Cc: Matthew Wilcox Cc: Jan Kara Cc: "Darrick J. Wong" Cc: Christoph Hellwig Cc: John Hubbard Cc: Alistair Popple Cc: Jason Gunthorpe Signed-off-by: Dan Williams Reviewed-by: Alistair Popple --- include/linux/mm.h | 30 ------------------------------ mm/gup.c | 6 ++---- mm/memremap.c | 38 -------------------------------------- mm/swap.c | 2 -- 4 files changed, 2 insertions(+), 74 deletions(-) diff --git a/include/linux/mm.h b/include/linux/mm.h index 8bbcccbc5565..c63dfc804f1e 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -1082,30 +1082,6 @@ vm_fault_t finish_mkwrite_fault(struct vm_fault *vmf); * back into memory. 
*/ -#if defined(CONFIG_ZONE_DEVICE) && defined(CONFIG_FS_DAX) -DECLARE_STATIC_KEY_FALSE(devmap_managed_key); - -bool __put_devmap_managed_page_refs(struct page *page, int refs); -static inline bool put_devmap_managed_page_refs(struct page *page, int refs) -{ - if (!static_branch_unlikely(&devmap_managed_key)) - return false; - if (!is_zone_device_page(page)) - return false; - return __put_devmap_managed_page_refs(page, refs); -} -#else /* CONFIG_ZONE_DEVICE && CONFIG_FS_DAX */ -static inline bool put_devmap_managed_page_refs(struct page *page, int refs) -{ - return false; -} -#endif /* CONFIG_ZONE_DEVICE && CONFIG_FS_DAX */ - -static inline bool put_devmap_managed_page(struct page *page) -{ - return put_devmap_managed_page_refs(page, 1); -} - /* 127: arbitrary random number, small enough to assemble well */ #define folio_ref_zero_or_close_to_overflow(folio) \ ((unsigned int) folio_ref_count(folio) + 127u <= 127u) @@ -1202,12 +1178,6 @@ static inline void put_page(struct page *page) { struct folio *folio = page_folio(page); - /* - * For some devmap managed pages we need to catch refcount transition - * from 2 to 1: - */ - if (put_devmap_managed_page(&folio->page)) - return; folio_put(folio); } diff --git a/mm/gup.c b/mm/gup.c index ce00a4c40da8..e49b1f46faa5 100644 --- a/mm/gup.c +++ b/mm/gup.c @@ -87,8 +87,7 @@ static inline struct folio *try_get_folio(struct page *page, int refs) * belongs to this folio. */ if (unlikely(page_folio(page) != folio)) { - if (!put_devmap_managed_page_refs(&folio->page, refs)) - folio_put_refs(folio, refs); + folio_put_refs(folio, refs); goto retry; } @@ -184,8 +183,7 @@ static void gup_put_folio(struct folio *folio, int refs, unsigned int flags) refs *= GUP_PIN_COUNTING_BIAS; } - if (!put_devmap_managed_page_refs(&folio->page, refs)) - folio_put_refs(folio, refs); + folio_put_refs(folio, refs); } /** diff --git a/mm/memremap.c b/mm/memremap.c index 368ff41c560b..53fe30bb79bb 100644 --- a/mm/memremap.c +++ b/mm/memremap.c @@ -94,19 +94,6 @@ bool pgmap_pfn_valid(struct dev_pagemap *pgmap, unsigned long pfn) return false; } -static unsigned long pfn_end(struct dev_pagemap *pgmap, int range_id) -{ - const struct range *range = &pgmap->ranges[range_id]; - - return (range->start + range_len(range)) >> PAGE_SHIFT; -} - -static unsigned long pfn_len(struct dev_pagemap *pgmap, unsigned long range_id) -{ - return (pfn_end(pgmap, range_id) - - pfn_first(pgmap, range_id)) >> pgmap->vmemmap_shift; -} - static void pageunmap_range(struct dev_pagemap *pgmap, int range_id) { struct range *range = &pgmap->ranges[range_id]; @@ -138,10 +125,6 @@ void memunmap_pages(struct dev_pagemap *pgmap) int i; percpu_ref_kill(&pgmap->ref); - if (pgmap->type != MEMORY_DEVICE_PRIVATE && - pgmap->type != MEMORY_DEVICE_COHERENT) - for (i = 0; i < pgmap->nr_range; i++) - percpu_ref_put_many(&pgmap->ref, pfn_len(pgmap, i)); wait_for_completion(&pgmap->done); @@ -267,9 +250,6 @@ static int pagemap_range(struct dev_pagemap *pgmap, struct mhp_params *params, memmap_init_zone_device(&NODE_DATA(nid)->node_zones[ZONE_DEVICE], PHYS_PFN(range->start), PHYS_PFN(range_len(range)), pgmap); - if (pgmap->type != MEMORY_DEVICE_PRIVATE && - pgmap->type != MEMORY_DEVICE_COHERENT) - percpu_ref_get_many(&pgmap->ref, pfn_len(pgmap, range_id)); return 0; err_add_memory: @@ -584,21 +564,3 @@ void pgmap_release_folios(struct folio *folio, int nr_folios) for (iter = folio, i = 0; i < nr_folios; iter = folio_next(folio), i++) folio_put(iter); } - -#ifdef CONFIG_FS_DAX -bool __put_devmap_managed_page_refs(struct page 
*page, int refs)
-{
-	if (page->pgmap->type != MEMORY_DEVICE_FS_DAX)
-		return false;
-
-	/*
-	 * fsdax page refcounts are 1-based, rather than 0-based: if
-	 * refcount is 1, then the page is free and the refcount is
-	 * stable because nobody holds a reference on the page.
-	 */
-	if (page_ref_sub_return(page, refs) == 1)
-		wake_up_var(page);
-	return true;
-}
-EXPORT_SYMBOL(__put_devmap_managed_page_refs);
-#endif /* CONFIG_FS_DAX */
diff --git a/mm/swap.c b/mm/swap.c
index 955930f41d20..0742b84fbf17 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -1003,8 +1003,6 @@ void release_pages(struct page **pages, int nr)
 				unlock_page_lruvec_irqrestore(lruvec, flags);
 				lruvec = NULL;
 			}
-			if (put_devmap_managed_page(&folio->page))
-				continue;
 			if (folio_put_testzero(folio))
 				free_zone_device_page(&folio->page);
 			continue;
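Before the next patch in the series, a brief illustration of the model the
changelog above describes: with the 2->1 _refcount special case gone, a pgmap
user takes and drops its references explicitly. This is a minimal sketch
only; pgmap_request_folios() is named by the changelog but its prototype is
assumed (only pgmap_release_folios() is visible in the mm/memremap.c hunk
above), and the dax_*_mapped_folio() helper names are hypothetical.

#include <linux/memremap.h>
#include <linux/mm.h>

/*
 * Hypothetical helpers; only the pgmap_*_folios() calls are named by the
 * series, and the pgmap_request_folios() prototype is an assumption.
 */
static struct folio *dax_pin_mapped_folio(struct dev_pagemap *pgmap,
					  unsigned long pfn)
{
	struct folio *folio = page_folio(pfn_to_page(pfn));

	/*
	 * Assumed semantics per the changelog: take a reference on the
	 * folio and arbitrate the @pgmap reference that used to be taken
	 * up front for every page.
	 */
	if (!pgmap_request_folios(pgmap, folio, 1))
		return NULL;

	return folio;
}

static void dax_unpin_mapped_folio(struct folio *folio)
{
	/* Visible in the hunk above: drops one reference per folio. */
	pgmap_release_folios(folio, 1);
}

With references arbitrated this way, the boot-time percpu_ref_get_many() /
percpu_ref_put_many() bulk operations and the pfn_len() helper removed above
are no longer needed.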
From patchwork Fri Oct 14 23:59:23 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Dan Williams
X-Patchwork-Id: 13007539
Subject: [PATCH v3 25/25] mm/gup: Drop DAX pgmap accounting
From: Dan Williams
To: linux-mm@kvack.org
Cc: Matthew Wilcox , Jan Kara , "Darrick J. Wong" , Christoph Hellwig , John Hubbard , Jason Gunthorpe , david@fromorbit.com, nvdimm@lists.linux.dev, akpm@linux-foundation.org, linux-fsdevel@vger.kernel.org
Date: Fri, 14 Oct 2022 16:59:23 -0700
Message-ID: <166579196364.2236710.8984717005481314942.stgit@dwillia2-xfh.jf.intel.com>
In-Reply-To: <166579181584.2236710.17813547487183983273.stgit@dwillia2-xfh.jf.intel.com>
References: <166579181584.2236710.17813547487183983273.stgit@dwillia2-xfh.jf.intel.com>
User-Agent: StGit/0.18-3-g996c

Now that pgmap accounting is handled at pgmap_request_folios() time, it
can be dropped from gup time. One hurdle still remains: filesystem-DAX
huge pages are not compound pages, so infrastructure like
__gup_device_huge_p{m,u}d() needs to stick around (the post-patch
__gup_device_huge() is reconstructed further below as a reading aid).
Additionally, ZONE_DEVICE pages are still not suitable to be returned
from vm_normal_page() after this change, so this cleanup is limited to
deleting the pgmap reference manipulation. This is an incremental step
on the path to removing pte_devmap() altogether.

Note that follow_devmap_pmd() can be deleted entirely, since a few
added pmd_devmap() checks allow the transparent huge page path to be
reused.

Cc: Matthew Wilcox
Cc: Jan Kara
Cc: "Darrick J.
Wong" Cc: Christoph Hellwig Cc: John Hubbard Reported-by: Jason Gunthorpe Signed-off-by: Dan Williams --- include/linux/huge_mm.h | 12 +------ mm/gup.c | 83 +++++++++++------------------------------------ mm/huge_memory.c | 48 +-------------------------- 3 files changed, 22 insertions(+), 121 deletions(-) diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h index aab708996fb0..5d861905df46 100644 --- a/include/linux/huge_mm.h +++ b/include/linux/huge_mm.h @@ -266,10 +266,8 @@ static inline bool folio_test_pmd_mappable(struct folio *folio) return folio_order(folio) >= HPAGE_PMD_ORDER; } -struct page *follow_devmap_pmd(struct vm_area_struct *vma, unsigned long addr, - pmd_t *pmd, int flags, struct dev_pagemap **pgmap); struct page *follow_devmap_pud(struct vm_area_struct *vma, unsigned long addr, - pud_t *pud, int flags, struct dev_pagemap **pgmap); + pud_t *pud, int flags); vm_fault_t do_huge_pmd_numa_page(struct vm_fault *vmf); @@ -428,14 +426,8 @@ static inline void mm_put_huge_zero_page(struct mm_struct *mm) return; } -static inline struct page *follow_devmap_pmd(struct vm_area_struct *vma, - unsigned long addr, pmd_t *pmd, int flags, struct dev_pagemap **pgmap) -{ - return NULL; -} - static inline struct page *follow_devmap_pud(struct vm_area_struct *vma, - unsigned long addr, pud_t *pud, int flags, struct dev_pagemap **pgmap) + unsigned long addr, pud_t *pud, int flags) { return NULL; } diff --git a/mm/gup.c b/mm/gup.c index e49b1f46faa5..dc5284df3f6c 100644 --- a/mm/gup.c +++ b/mm/gup.c @@ -25,7 +25,6 @@ #include "internal.h" struct follow_page_context { - struct dev_pagemap *pgmap; unsigned int page_mask; }; @@ -522,8 +521,7 @@ static inline bool can_follow_write_pte(pte_t pte, struct page *page, } static struct page *follow_page_pte(struct vm_area_struct *vma, - unsigned long address, pmd_t *pmd, unsigned int flags, - struct dev_pagemap **pgmap) + unsigned long address, pmd_t *pmd, unsigned int flags) { struct mm_struct *mm = vma->vm_mm; struct page *page; @@ -574,17 +572,13 @@ static struct page *follow_page_pte(struct vm_area_struct *vma, goto out; } - if (!page && pte_devmap(pte) && (flags & (FOLL_GET | FOLL_PIN))) { + if (!page && pte_devmap(pte)) { /* - * Only return device mapping pages in the FOLL_GET or FOLL_PIN - * case since they are only valid while holding the pgmap - * reference. 
+ * ZONE_DEVICE pages are not yet treated as vm_normal_page() + * instances, with respect to mapcount and compound-page + * metadata */ - *pgmap = get_dev_pagemap(pte_pfn(pte), *pgmap); - if (*pgmap) - page = pte_page(pte); - else - goto no_page; + page = pte_page(pte); } else if (unlikely(!page)) { if (flags & FOLL_DUMP) { /* Avoid special (like zero) pages in core dumps */ @@ -702,15 +696,8 @@ static struct page *follow_pmd_mask(struct vm_area_struct *vma, return no_page_table(vma, flags); goto retry; } - if (pmd_devmap(pmdval)) { - ptl = pmd_lock(mm, pmd); - page = follow_devmap_pmd(vma, address, pmd, flags, &ctx->pgmap); - spin_unlock(ptl); - if (page) - return page; - } - if (likely(!pmd_trans_huge(pmdval))) - return follow_page_pte(vma, address, pmd, flags, &ctx->pgmap); + if (likely(!(pmd_trans_huge(pmdval) || pmd_devmap(pmdval)))) + return follow_page_pte(vma, address, pmd, flags); if (pmd_protnone(pmdval) && !gup_can_follow_protnone(flags)) return no_page_table(vma, flags); @@ -728,9 +715,9 @@ static struct page *follow_pmd_mask(struct vm_area_struct *vma, pmd_migration_entry_wait(mm, pmd); goto retry_locked; } - if (unlikely(!pmd_trans_huge(*pmd))) { + if (unlikely(!(pmd_trans_huge(*pmd) || pmd_devmap(pmdval)))) { spin_unlock(ptl); - return follow_page_pte(vma, address, pmd, flags, &ctx->pgmap); + return follow_page_pte(vma, address, pmd, flags); } if (flags & FOLL_SPLIT_PMD) { int ret; @@ -748,7 +735,7 @@ static struct page *follow_pmd_mask(struct vm_area_struct *vma, } return ret ? ERR_PTR(ret) : - follow_page_pte(vma, address, pmd, flags, &ctx->pgmap); + follow_page_pte(vma, address, pmd, flags); } page = follow_trans_huge_pmd(vma, address, pmd, flags); spin_unlock(ptl); @@ -785,7 +772,7 @@ static struct page *follow_pud_mask(struct vm_area_struct *vma, } if (pud_devmap(*pud)) { ptl = pud_lock(mm, pud); - page = follow_devmap_pud(vma, address, pud, flags, &ctx->pgmap); + page = follow_devmap_pud(vma, address, pud, flags); spin_unlock(ptl); if (page) return page; @@ -832,9 +819,6 @@ static struct page *follow_p4d_mask(struct vm_area_struct *vma, * * @flags can have FOLL_ flags set, defined in * - * When getting pages from ZONE_DEVICE memory, the @ctx->pgmap caches - * the device's dev_pagemap metadata to avoid repeating expensive lookups. - * * When getting an anonymous page and the caller has to trigger unsharing * of a shared anonymous page first, -EMLINK is returned. The caller should * trigger a fault with FAULT_FLAG_UNSHARE set. Note that unsharing is only @@ -889,7 +873,7 @@ static struct page *follow_page_mask(struct vm_area_struct *vma, struct page *follow_page(struct vm_area_struct *vma, unsigned long address, unsigned int foll_flags) { - struct follow_page_context ctx = { NULL }; + struct follow_page_context ctx = { 0 }; struct page *page; if (vma_is_secretmem(vma)) @@ -899,8 +883,6 @@ struct page *follow_page(struct vm_area_struct *vma, unsigned long address, return NULL; page = follow_page_mask(vma, address, foll_flags, &ctx); - if (ctx.pgmap) - put_dev_pagemap(ctx.pgmap); return page; } @@ -1149,7 +1131,7 @@ static long __get_user_pages(struct mm_struct *mm, { long ret = 0, i = 0; struct vm_area_struct *vma = NULL; - struct follow_page_context ctx = { NULL }; + struct follow_page_context ctx = { 0 }; if (!nr_pages) return 0; @@ -1264,8 +1246,6 @@ static long __get_user_pages(struct mm_struct *mm, nr_pages -= page_increm; } while (nr_pages); out: - if (ctx.pgmap) - put_dev_pagemap(ctx.pgmap); return i ? 
i : ret; } @@ -2408,9 +2388,8 @@ static int gup_pte_range(pmd_t pmd, pmd_t *pmdp, unsigned long addr, unsigned long end, unsigned int flags, struct page **pages, int *nr) { - struct dev_pagemap *pgmap = NULL; - int nr_start = *nr, ret = 0; pte_t *ptep, *ptem; + int ret = 0; ptem = ptep = pte_offset_map(&pmd, addr); do { @@ -2427,12 +2406,6 @@ static int gup_pte_range(pmd_t pmd, pmd_t *pmdp, unsigned long addr, if (pte_devmap(pte)) { if (unlikely(flags & FOLL_LONGTERM)) goto pte_unmap; - - pgmap = get_dev_pagemap(pte_pfn(pte), pgmap); - if (unlikely(!pgmap)) { - undo_dev_pagemap(nr, nr_start, flags, pages); - goto pte_unmap; - } } else if (pte_special(pte)) goto pte_unmap; @@ -2480,8 +2453,6 @@ static int gup_pte_range(pmd_t pmd, pmd_t *pmdp, unsigned long addr, ret = 1; pte_unmap: - if (pgmap) - put_dev_pagemap(pgmap); pte_unmap(ptem); return ret; } @@ -2509,28 +2480,17 @@ static int __gup_device_huge(unsigned long pfn, unsigned long addr, unsigned long end, unsigned int flags, struct page **pages, int *nr) { - int nr_start = *nr; - struct dev_pagemap *pgmap = NULL; - do { struct page *page = pfn_to_page(pfn); - pgmap = get_dev_pagemap(pfn, pgmap); - if (unlikely(!pgmap)) { - undo_dev_pagemap(nr, nr_start, flags, pages); - break; - } SetPageReferenced(page); pages[*nr] = page; - if (unlikely(!try_grab_page(page, flags))) { - undo_dev_pagemap(nr, nr_start, flags, pages); + if (unlikely(!try_grab_page(page, flags))) break; - } (*nr)++; pfn++; } while (addr += PAGE_SIZE, addr != end); - put_dev_pagemap(pgmap); return addr == end; } @@ -2539,16 +2499,14 @@ static int __gup_device_huge_pmd(pmd_t orig, pmd_t *pmdp, unsigned long addr, struct page **pages, int *nr) { unsigned long fault_pfn; - int nr_start = *nr; fault_pfn = pmd_pfn(orig) + ((addr & ~PMD_MASK) >> PAGE_SHIFT); if (!__gup_device_huge(fault_pfn, addr, end, flags, pages, nr)) return 0; - if (unlikely(pmd_val(orig) != pmd_val(*pmdp))) { - undo_dev_pagemap(nr, nr_start, flags, pages); + if (unlikely(pmd_val(orig) != pmd_val(*pmdp))) return 0; - } + return 1; } @@ -2557,16 +2515,13 @@ static int __gup_device_huge_pud(pud_t orig, pud_t *pudp, unsigned long addr, struct page **pages, int *nr) { unsigned long fault_pfn; - int nr_start = *nr; fault_pfn = pud_pfn(orig) + ((addr & ~PUD_MASK) >> PAGE_SHIFT); if (!__gup_device_huge(fault_pfn, addr, end, flags, pages, nr)) return 0; - if (unlikely(pud_val(orig) != pud_val(*pudp))) { - undo_dev_pagemap(nr, nr_start, flags, pages); + if (unlikely(pud_val(orig) != pud_val(*pudp))) return 0; - } return 1; } #else diff --git a/mm/huge_memory.c b/mm/huge_memory.c index 1cc4a5f4791e..065c0dc03491 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -1029,49 +1029,6 @@ static void touch_pmd(struct vm_area_struct *vma, unsigned long addr, update_mmu_cache_pmd(vma, addr, pmd); } -struct page *follow_devmap_pmd(struct vm_area_struct *vma, unsigned long addr, - pmd_t *pmd, int flags, struct dev_pagemap **pgmap) -{ - unsigned long pfn = pmd_pfn(*pmd); - struct mm_struct *mm = vma->vm_mm; - struct page *page; - - assert_spin_locked(pmd_lockptr(mm, pmd)); - - /* FOLL_GET and FOLL_PIN are mutually exclusive. 
*/ - if (WARN_ON_ONCE((flags & (FOLL_PIN | FOLL_GET)) == - (FOLL_PIN | FOLL_GET))) - return NULL; - - if (flags & FOLL_WRITE && !pmd_write(*pmd)) - return NULL; - - if (pmd_present(*pmd) && pmd_devmap(*pmd)) - /* pass */; - else - return NULL; - - if (flags & FOLL_TOUCH) - touch_pmd(vma, addr, pmd, flags & FOLL_WRITE); - - /* - * device mapped pages can only be returned if the - * caller will manage the page reference count. - */ - if (!(flags & (FOLL_GET | FOLL_PIN))) - return ERR_PTR(-EEXIST); - - pfn += (addr & ~PMD_MASK) >> PAGE_SHIFT; - *pgmap = get_dev_pagemap(pfn, *pgmap); - if (!*pgmap) - return ERR_PTR(-EFAULT); - page = pfn_to_page(pfn); - if (!try_grab_page(page, flags)) - page = ERR_PTR(-ENOMEM); - - return page; -} - int copy_huge_pmd(struct mm_struct *dst_mm, struct mm_struct *src_mm, pmd_t *dst_pmd, pmd_t *src_pmd, unsigned long addr, struct vm_area_struct *dst_vma, struct vm_area_struct *src_vma) @@ -1188,7 +1145,7 @@ static void touch_pud(struct vm_area_struct *vma, unsigned long addr, } struct page *follow_devmap_pud(struct vm_area_struct *vma, unsigned long addr, - pud_t *pud, int flags, struct dev_pagemap **pgmap) + pud_t *pud, int flags) { unsigned long pfn = pud_pfn(*pud); struct mm_struct *mm = vma->vm_mm; @@ -1222,9 +1179,6 @@ struct page *follow_devmap_pud(struct vm_area_struct *vma, unsigned long addr, return ERR_PTR(-EEXIST); pfn += (addr & ~PUD_MASK) >> PAGE_SHIFT; - *pgmap = get_dev_pagemap(pfn, *pgmap); - if (!*pgmap) - return ERR_PTR(-EFAULT); page = pfn_to_page(pfn); if (!try_grab_page(page, flags)) page = ERR_PTR(-ENOMEM);
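As a reading aid for the flattened mm/gup.c hunk earlier in this patch, this
is roughly how __gup_device_huge() reads once the hunk is applied,
reconstructed from its context and remaining lines (indentation approximate,
not a verbatim copy of the final tree): the per-pfn get_dev_pagemap() /
put_dev_pagemap() round-trips and the undo_dev_pagemap() unwinding are gone,
leaving only the try_grab_page() pin accounting.

static int __gup_device_huge(unsigned long pfn, unsigned long addr,
			     unsigned long end, unsigned int flags,
			     struct page **pages, int *nr)
{
	do {
		struct page *page = pfn_to_page(pfn);

		/*
		 * No dev_pagemap lookup here any more: per the changelog,
		 * pgmap accounting is handled at pgmap_request_folios()
		 * time rather than at gup time.
		 */
		SetPageReferenced(page);
		pages[*nr] = page;
		if (unlikely(!try_grab_page(page, flags)))
			break;
		(*nr)++;
		pfn++;
	} while (addr += PAGE_SIZE, addr != end);

	return addr == end;
}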