From patchwork Sun Sep 4 02:16:12 2022
X-Patchwork-Submitter: Dan Williams <dan.j.williams@intel.com>
X-Patchwork-Id: 12965094
Subject: [PATCH 02/13] fsdax: Use page_maybe_dma_pinned() for DAX vs DMA
 collisions
From: Dan Williams <dan.j.williams@intel.com>
To: akpm@linux-foundation.org
Cc: Jan Kara, "Darrick J. Wong", Christoph Hellwig, John Hubbard,
 Jason Gunthorpe, Matthew Wilcox, linux-mm@kvack.org,
 nvdimm@lists.linux.dev, linux-fsdevel@vger.kernel.org
Date: Sat, 03 Sep 2022 19:16:12 -0700
Message-ID: <166225777193.2351842.16365701080007152185.stgit@dwillia2-xfh.jf.intel.com>
In-Reply-To: <166225775968.2351842.11156458342486082012.stgit@dwillia2-xfh.jf.intel.com>
References: <166225775968.2351842.11156458342486082012.stgit@dwillia2-xfh.jf.intel.com>
User-Agent: StGit/0.18-3-g996c
The pin_user_pages() + page_maybe_dma_pinned() infrastructure is a
framework for tackling the kernel's struggles with gup+DMA. DAX
presents a unique flavor of the gup+DMA problem since pinned pages are
identical to physical filesystem blocks. Unlike the page-cache case, a
mapping of a file cannot be truncated while DMA is in-flight, because
the DMA must complete before the filesystem block is reclaimed.

DAX has a homegrown solution to this problem based on watching
page->_refcount go idle. Beyond the awkwardness of catching that idle
transition in put_page(), refcount-idle tracking is overkill when only
the page_maybe_dma_pinned() transition needs to be captured.
Move the wakeup of filesystem-DAX truncate paths
({ext4,xfs,fuse_dax}_break_layouts()) to unpin_user_pages() with a new
wakeup_fsdax_pin_waiters() helper, and use !page_maybe_dma_pinned() as
the wake condition.

Cc: Jan Kara
Cc: "Darrick J. Wong"
Cc: Christoph Hellwig
Cc: John Hubbard
Reported-by: Jason Gunthorpe
Reported-by: Matthew Wilcox
Signed-off-by: Dan Williams
---
 fs/dax.c           |    4 ++--
 fs/ext4/inode.c    |    7 +++----
 fs/fuse/dax.c      |    6 +++---
 fs/xfs/xfs_file.c  |    6 +++---
 include/linux/mm.h |   28 ++++++++++++++++++++++++++++
 mm/gup.c           |    6 ++++--
 6 files changed, 43 insertions(+), 14 deletions(-)

diff --git a/fs/dax.c b/fs/dax.c
index 0f22f7b46de0..aceb587bc27e 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -395,7 +395,7 @@ static void dax_disassociate_entry(void *entry, struct address_space *mapping,
 	for_each_mapped_pfn(entry, pfn) {
 		struct page *page = pfn_to_page(pfn);
 
-		WARN_ON_ONCE(trunc && page_ref_count(page) > 1);
+		WARN_ON_ONCE(trunc && page_maybe_dma_pinned(page));
 		if (dax_mapping_is_cow(page->mapping)) {
 			/* keep the CoW flag if this page is still shared */
 			if (page->index-- > 0)
@@ -414,7 +414,7 @@ static struct page *dax_pinned_page(void *entry)
 	for_each_mapped_pfn(entry, pfn) {
 		struct page *page = pfn_to_page(pfn);
 
-		if (page_ref_count(page) > 1)
+		if (page_maybe_dma_pinned(page))
 			return page;
 	}
 	return NULL;
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index bf49bf506965..5e68e64f155a 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -3961,10 +3961,9 @@ int ext4_break_layouts(struct inode *inode)
 		if (!page)
 			return 0;
 
-		error = ___wait_var_event(&page->_refcount,
-				atomic_read(&page->_refcount) == 1,
-				TASK_INTERRUPTIBLE, 0, 0,
-				ext4_wait_dax_page(inode));
+		error = ___wait_var_event(page, !page_maybe_dma_pinned(page),
+					  TASK_INTERRUPTIBLE, 0, 0,
+					  ext4_wait_dax_page(inode));
 	} while (error == 0);
 
 	return error;
diff --git a/fs/fuse/dax.c b/fs/fuse/dax.c
index e0b846f16bc5..6419ca420c42 100644
--- a/fs/fuse/dax.c
+++ b/fs/fuse/dax.c
@@ -676,9 +676,9 @@ static int __fuse_dax_break_layouts(struct inode *inode, bool *retry,
 		return 0;
 
 	*retry = true;
-	return ___wait_var_event(&page->_refcount,
-			atomic_read(&page->_refcount) == 1, TASK_INTERRUPTIBLE,
-			0, 0, fuse_wait_dax_page(inode));
+	return ___wait_var_event(page, !page_maybe_dma_pinned(page),
+				 TASK_INTERRUPTIBLE, 0, 0,
+				 fuse_wait_dax_page(inode));
 }
 
 /* dmap_end == 0 leads to unmapping of whole file */
diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index 954bb6e83796..dbffb9481b71 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -827,9 +827,9 @@ xfs_break_dax_layouts(
 		return 0;
 
 	*retry = true;
-	return ___wait_var_event(&page->_refcount,
-			atomic_read(&page->_refcount) == 1, TASK_INTERRUPTIBLE,
-			0, 0, xfs_wait_dax_page(inode));
+	return ___wait_var_event(page, !page_maybe_dma_pinned(page),
+				 TASK_INTERRUPTIBLE, 0, 0,
+				 xfs_wait_dax_page(inode));
 }
 
 int
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 3bedc449c14d..557d5447ebec 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1517,6 +1517,34 @@ static inline bool page_maybe_dma_pinned(struct page *page)
 	return folio_maybe_dma_pinned(page_folio(page));
 }
 
+#if defined(CONFIG_ZONE_DEVICE) && defined(CONFIG_FS_DAX)
+/*
+ * Unlike typical file backed pages that support truncating a page from
+ * a file while it is under active DMA, DAX pages need to hold off
+ * truncate operations until transient page pins are released.
+ *
+ * The filesystem (via dax_layout_pinned_page()) takes steps to make
+ * sure that any observation of the !page_maybe_dma_pinned() state is
+ * stable until the truncation completes.
+ */
+static inline void wakeup_fsdax_pin_waiters(struct folio *folio)
+{
+	struct page *page = &folio->page;
+
+	if (!folio_is_zone_device(folio))
+		return;
+	if (page->pgmap->type != MEMORY_DEVICE_FS_DAX)
+		return;
+	if (folio_maybe_dma_pinned(folio))
+		return;
+	wake_up_var(page);
+}
+#else /* CONFIG_ZONE_DEVICE && CONFIG_FS_DAX */
+static inline void wakeup_fsdax_pin_waiters(struct folio *folio)
+{
+}
+#endif /* CONFIG_ZONE_DEVICE && CONFIG_FS_DAX */
+
 /*
  * This should most likely only be called during fork() to see whether we
  * should break the cow immediately for an anon page on the src mm.
diff --git a/mm/gup.c b/mm/gup.c
index 732825157430..499c46296fda 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -177,8 +177,10 @@ static void gup_put_folio(struct folio *folio, int refs, unsigned int flags)
 			refs *= GUP_PIN_COUNTING_BIAS;
 	}
 
-	if (!put_devmap_managed_page_refs(&folio->page, refs))
-		folio_put_refs(folio, refs);
+	folio_put_refs(folio, refs);
+
+	if (flags & FOLL_PIN)
+		wakeup_fsdax_pin_waiters(folio);
 }
 
 /**