From patchwork Sat Sep 18 01:30:50 2021
X-Patchwork-Submitter: "Darrick J. Wong"
X-Patchwork-Id: 12503381
Subject: [PATCH 1/5] dax: prepare pmem for use by zero-initializing contents
 and clearing poisons
From: "Darrick J. Wong"
To: djwong@kernel.org, jane.chu@oracle.com
Cc: linux-xfs@vger.kernel.org, hch@infradead.org, dan.j.williams@intel.com,
 linux-fsdevel@vger.kernel.org
Date: Fri, 17 Sep 2021 18:30:50 -0700
Message-ID: <163192865031.417973.8372869475521627214.stgit@magnolia>
In-Reply-To: <163192864476.417973.143014658064006895.stgit@magnolia>
References: <163192864476.417973.143014658064006895.stgit@magnolia>
User-Agent: StGit/0.19
X-Mailing-List: linux-fsdevel@vger.kernel.org

From: Darrick J. Wong

Our current "advice" to people using persistent memory and FSDAX who wish
to recover upon receipt of a media error (aka 'hwpoison') event from ACPI
is to punch-hole that part of the file and then pwrite it, which will
magically cause the pmem to be reinitialized and the poison to be cleared.

Punching doesn't make any sense at all -- the (re)allocation on pwrite
does not permit the caller to specify where to find blocks, which means
that we might not get the same pmem back.  This pushes the user farther
away from the goal of reinitializing poisoned memory and leads to
complaints about unnecessary file fragmentation.

AFAICT, the only reason why the "punch and write" dance works at all is
that XFS and ext4 currently call blkdev_issue_zeroout when allocating
pmem ahead of a write call.  Even a regular overwrite won't clear the
poison, because dax_direct_access is smart enough to bail out on poisoned
pmem, but not smart enough to clear it.  To be fair, that function maps
pages and has no idea what kinds of reads and writes the caller might
want to perform.

Therefore, create a dax_zeroinit_range function that filesystems can call
to reset the pmem contents to zero and clear hardware media error flags.
This uses the dax page zeroing helper function, which should ensure that
subsequent accesses will not trip over any pre-existing media errors.

Signed-off-by: Darrick J. Wong
---
 fs/dax.c            |   93 +++++++++++++++++++++++++++++++++++++++++++++++++++
 include/linux/dax.h |    7 ++++
 2 files changed, 100 insertions(+)

diff --git a/fs/dax.c b/fs/dax.c
index 4e3e5a283a91..765b80d08605 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -1714,3 +1714,96 @@ vm_fault_t dax_finish_sync_fault(struct vm_fault *vmf,
 	return dax_insert_pfn_mkwrite(vmf, pfn, order);
 }
 EXPORT_SYMBOL_GPL(dax_finish_sync_fault);
+
+static loff_t
+dax_zeroinit_iter(struct iomap_iter *iter)
+{
+	struct iomap *iomap = &iter->iomap;
+	const struct iomap *srcmap = iomap_iter_srcmap(iter);
+	const u64 start = iomap->addr + iter->pos - iomap->offset;
+	const u64 nr_bytes = iomap_length(iter);
+	u64 start_page = start >> PAGE_SHIFT;
+	u64 nr_pages = nr_bytes >> PAGE_SHIFT;
+	int ret;
+
+	if (!iomap->dax_dev)
+		return -ECANCELED;
+
+	/*
+	 * The physical extent must be page aligned because that's what the dax
+	 * function requires.
+	 */
+	if (!PAGE_ALIGNED(start | nr_bytes))
+		return -ECANCELED;
+
+	/*
+	 * The dax function, by using pgoff_t, is stuck with unsigned long, so
+	 * we must check for overflows.
+	 */
+	if (start_page >= ULONG_MAX || start_page + nr_pages > ULONG_MAX)
+		return -ECANCELED;
+
+	/* Must be able to zero storage directly without fs intervention. */
+	if (iomap->flags & IOMAP_F_SHARED)
+		return -ECANCELED;
+	if (srcmap != iomap)
+		return -ECANCELED;
+
+	switch (iomap->type) {
+	case IOMAP_MAPPED:
+		while (nr_pages > 0) {
+			/* XXX function only supports one page at a time?! */
+			ret = dax_zero_page_range(iomap->dax_dev, start_page,
+					1);
+			if (ret)
+				return ret;
+			start_page++;
+			nr_pages--;
+		}
+
+		fallthrough;
+	case IOMAP_UNWRITTEN:
+		return nr_bytes;
+	}
+
+	/* Reject holes, inline data, or delalloc extents. */
+	return -ECANCELED;
+}
+
+/*
+ * Initialize storage mapped to a DAX-mode file to a known value and ensure the
+ * media are ready to accept read and write commands.  This requires the use of
+ * the dax layer's zero page range function to write zeroes to a pmem region
+ * and to reset any hardware media error state.
+
+ * The physical extents must be aligned to page size.  The file must be backed
+ * by a pmem device.  The extents returned must not require copy on write (or
+ * any other mapping interventions from the filesystem) and must be contiguous.
+ *
+ * Returns 0 if the zero initialization succeeded, -ECANCELED if the storage
+ * mappings do not support zero initialization, -EOPNOTSUPP if the device does
+ * not support it, or the usual negative errno.
+ */
+int
+dax_zeroinit_range(struct inode *inode, loff_t pos, u64 len,
+		const struct iomap_ops *ops)
+{
+	struct iomap_iter iter = {
+		.inode		= inode,
+		.pos		= pos,
+		.len		= len,
+		.flags		= IOMAP_REPORT,
+	};
+	int ret;
+
+	if (!IS_DAX(inode))
+		return -EINVAL;
+	if (pos + len > i_size_read(inode))
+		return -EINVAL;
+
+	while ((ret = iomap_iter(&iter, ops)) > 0)
+		iter.processed = dax_zeroinit_iter(&iter);
+	return ret;
+}
+EXPORT_SYMBOL_GPL(dax_zeroinit_range);
diff --git a/include/linux/dax.h b/include/linux/dax.h
index 2619d94c308d..3c873f7c35ba 100644
--- a/include/linux/dax.h
+++ b/include/linux/dax.h
@@ -129,6 +129,8 @@ struct page *dax_layout_busy_page(struct address_space *mapping);
 struct page *dax_layout_busy_page_range(struct address_space *mapping,
 		loff_t start, loff_t end);
 dax_entry_t dax_lock_page(struct page *page);
 void dax_unlock_page(struct page *page, dax_entry_t cookie);
+int dax_zeroinit_range(struct inode *inode, loff_t pos, u64 len,
+		const struct iomap_ops *ops);
 #else
 #define generic_fsdax_supported NULL
@@ -174,6 +176,11 @@ static inline dax_entry_t dax_lock_page(struct page *page)
 static inline void dax_unlock_page(struct page *page, dax_entry_t cookie)
 {
 }
+static inline int dax_zeroinit_range(struct inode *inode, loff_t pos, u64 len,
+		const struct iomap_ops *ops)
+{
+	return -EOPNOTSUPP;
+}
 #endif

 #if IS_ENABLED(CONFIG_DAX)

From patchwork Sat Sep 18 01:30:55 2021
X-Patchwork-Id: 12503383
Subject: [PATCH 2/5] iomap: use accelerated zeroing on a block device
 to zero a file range
From: "Darrick J. Wong"
To: djwong@kernel.org, jane.chu@oracle.com
Cc: linux-xfs@vger.kernel.org, hch@infradead.org, dan.j.williams@intel.com,
 linux-fsdevel@vger.kernel.org
Date: Fri, 17 Sep 2021 18:30:55 -0700
Message-ID: <163192865577.417973.11122330974455662098.stgit@magnolia>
In-Reply-To: <163192864476.417973.143014658064006895.stgit@magnolia>
References: <163192864476.417973.143014658064006895.stgit@magnolia>

From: Darrick J. Wong

Create a function that ensures that the storage backing part of a file
contains zeroes and will not trip over old media errors if the contents
are re-read.

Signed-off-by: Darrick J. Wong
---
 fs/iomap/direct-io.c  |   75 +++++++++++++++++++++++++++++++++++++++++++++++++
 include/linux/iomap.h |    3 ++
 2 files changed, 78 insertions(+)

diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
index 4ecd255e0511..48826a49f976 100644
--- a/fs/iomap/direct-io.c
+++ b/fs/iomap/direct-io.c
@@ -652,3 +652,78 @@ iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
 	return iomap_dio_complete(dio);
 }
 EXPORT_SYMBOL_GPL(iomap_dio_rw);
+
+static loff_t
+iomap_zeroinit_iter(struct iomap_iter *iter)
+{
+	struct iomap *iomap = &iter->iomap;
+	const struct iomap *srcmap = iomap_iter_srcmap(iter);
+	const u64 start = iomap->addr + iter->pos - iomap->offset;
+	const u64 nr_bytes = iomap_length(iter);
+	sector_t sector = start >> SECTOR_SHIFT;
+	sector_t nr_sectors = nr_bytes >> SECTOR_SHIFT;
+	int ret;
+
+	if (!iomap->bdev)
+		return -ECANCELED;
+
+	/* The physical extent must be sector-aligned for block layer APIs. */
+	if ((start | nr_bytes) & (SECTOR_SIZE - 1))
+		return -EINVAL;
+
+	/* Must be able to zero storage directly without fs intervention. */
+	if (iomap->flags & IOMAP_F_SHARED)
+		return -ECANCELED;
+	if (srcmap != iomap)
+		return -ECANCELED;
+
+	switch (iomap->type) {
+	case IOMAP_MAPPED:
+		ret = blkdev_issue_zeroout(iomap->bdev, sector, nr_sectors,
+				GFP_KERNEL, 0);
+		if (ret)
+			return ret;
+		fallthrough;
+	case IOMAP_UNWRITTEN:
+		return nr_bytes;
+	}
+
+	/* Reject holes, inline data, or delalloc extents. */
+	return -ECANCELED;
+}
+
+/*
+ * Use a storage device's accelerated zero-writing command to ensure the media
+ * are ready to accept read and write commands.  FSDAX is not supported.
+ *
+ * The range arguments must be aligned to sector size.  The file must be backed
+ * by a block device.  The extents returned must not require copy on write (or
+ * any other mapping interventions from the filesystem) and must be contiguous.
+ *
+ * Returns 0 if the zero initialization succeeded, -ECANCELED if the storage
+ * mappings do not support zero initialization, -EOPNOTSUPP if the device does
+ * not support it, or the usual negative errno.
+ */
+int
+iomap_zeroout_range(struct inode *inode, loff_t pos, u64 len,
+		const struct iomap_ops *ops)
+{
+	struct iomap_iter iter = {
+		.inode		= inode,
+		.pos		= pos,
+		.len		= len,
+		.flags		= IOMAP_REPORT,
+	};
+	int ret;
+
+	if (IS_DAX(inode))
+		return -EINVAL;
+	if (pos + len > i_size_read(inode))
+		return -EINVAL;
+
+	while ((ret = iomap_iter(&iter, ops)) > 0)
+		iter.processed = iomap_zeroinit_iter(&iter);
+	return ret;
+}
+EXPORT_SYMBOL_GPL(iomap_zeroout_range);
diff --git a/include/linux/iomap.h b/include/linux/iomap.h
index 24f8489583ca..f4b9c6698388 100644
--- a/include/linux/iomap.h
+++ b/include/linux/iomap.h
@@ -339,6 +339,9 @@ struct iomap_dio *__iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
 ssize_t iomap_dio_complete(struct iomap_dio *dio);
 int iomap_dio_iopoll(struct kiocb *kiocb, bool spin);
+int iomap_zeroout_range(struct inode *inode, loff_t pos, u64 len,
+		const struct iomap_ops *ops);
+
 #ifdef CONFIG_SWAP
 struct file;
 struct swap_info_struct;

From patchwork Sat Sep 18 01:31:01 2021
X-Patchwork-Submitter: "Darrick J.
 Wong"
X-Patchwork-Id: 12503385
Subject: [PATCH 3/5] vfs: add a zero-initialization mode to fallocate
From: "Darrick J. Wong"
To: djwong@kernel.org, jane.chu@oracle.com
Cc: linux-xfs@vger.kernel.org, hch@infradead.org, dan.j.williams@intel.com,
 linux-fsdevel@vger.kernel.org
Date: Fri, 17 Sep 2021 18:31:01 -0700
Message-ID: <163192866125.417973.7293598039998376121.stgit@magnolia>
In-Reply-To: <163192864476.417973.143014658064006895.stgit@magnolia>
References: <163192864476.417973.143014658064006895.stgit@magnolia>

From: Darrick J. Wong

Add a new mode to fallocate to zero-initialize all the storage backing a
file.

Signed-off-by: Darrick J. Wong
---
 fs/open.c                   |    5 +++++
 include/linux/falloc.h      |    1 +
 include/uapi/linux/falloc.h |    9 +++++++++
 3 files changed, 15 insertions(+)

diff --git a/fs/open.c b/fs/open.c
index daa324606a41..230220b8f67a 100644
--- a/fs/open.c
+++ b/fs/open.c
@@ -256,6 +256,11 @@ int vfs_fallocate(struct file *file, int mode, loff_t offset, loff_t len)
 	    (mode & ~FALLOC_FL_INSERT_RANGE))
 		return -EINVAL;

+	/* Zeroinit should only be used by itself, and keep size must be set. */
+	if ((mode & FALLOC_FL_ZEROINIT_RANGE) &&
+	    (mode != (FALLOC_FL_ZEROINIT_RANGE | FALLOC_FL_KEEP_SIZE)))
+		return -EINVAL;
+
 	/* Unshare range should only be used with allocate mode. */
 	if ((mode & FALLOC_FL_UNSHARE_RANGE) &&
 	    (mode & ~(FALLOC_FL_UNSHARE_RANGE | FALLOC_FL_KEEP_SIZE)))
diff --git a/include/linux/falloc.h b/include/linux/falloc.h
index f3f0b97b1675..4597b416667b 100644
--- a/include/linux/falloc.h
+++ b/include/linux/falloc.h
@@ -29,6 +29,7 @@ struct space_resv {
 				 FALLOC_FL_PUNCH_HOLE |		\
 				 FALLOC_FL_COLLAPSE_RANGE |	\
 				 FALLOC_FL_ZERO_RANGE |		\
+				 FALLOC_FL_ZEROINIT_RANGE |	\
 				 FALLOC_FL_INSERT_RANGE |	\
 				 FALLOC_FL_UNSHARE_RANGE)
diff --git a/include/uapi/linux/falloc.h b/include/uapi/linux/falloc.h
index 51398fa57f6c..8144403b6102 100644
--- a/include/uapi/linux/falloc.h
+++ b/include/uapi/linux/falloc.h
@@ -77,4 +77,13 @@
 */
 #define FALLOC_FL_UNSHARE_RANGE		0x40

+/*
+ * FALLOC_FL_ZEROINIT_RANGE is used to reinitialize storage backing a file by
+ * writing zeros to it.  Subsequent reads and writes should not fail due to
+ * any previous media errors.  Blocks must not be shared or require copy on
+ * write.  Holes and unwritten extents are left untouched.  This mode must be
+ * used with FALLOC_FL_KEEP_SIZE.
+ */
+#define FALLOC_FL_ZEROINIT_RANGE	0x80
+
 #endif /* _UAPI_FALLOC_H_ */

From patchwork Sat Sep 18 01:31:06 2021
X-Patchwork-Submitter: "Darrick J.
 Wong"
X-Patchwork-Id: 12503387
Subject: [PATCH 4/5] xfs: implement FALLOC_FL_ZEROINIT_RANGE
From: "Darrick J. Wong"
To: djwong@kernel.org, jane.chu@oracle.com
Cc: linux-xfs@vger.kernel.org, hch@infradead.org, dan.j.williams@intel.com,
 linux-fsdevel@vger.kernel.org
Date: Fri, 17 Sep 2021 18:31:06 -0700
Message-ID: <163192866672.417973.1497612280233084622.stgit@magnolia>
In-Reply-To: <163192864476.417973.143014658064006895.stgit@magnolia>
References: <163192864476.417973.143014658064006895.stgit@magnolia>

From: Darrick J. Wong

Implement this new fallocate mode so that persistent memory users can,
upon receipt of a pmem poison notification, cause the pmem to be
reinitialized to a known value (zero) and clear any hardware poison state
that might be lurking.

Signed-off-by: Darrick J. Wong
---
 fs/xfs/xfs_bmap_util.c |   22 ++++++++++++++++++++++
 fs/xfs/xfs_bmap_util.h |    2 ++
 fs/xfs/xfs_file.c      |   11 ++++++++---
 fs/xfs/xfs_trace.h     |    1 +
 4 files changed, 33 insertions(+), 3 deletions(-)

diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
index 73a36b7be3bd..319e79bb7fd8 100644
--- a/fs/xfs/xfs_bmap_util.c
+++ b/fs/xfs/xfs_bmap_util.c
@@ -956,6 +956,28 @@ xfs_flush_unmap_range(
 	return 0;
 }

+int
+xfs_file_zeroinit_space(
+	struct xfs_inode	*ip,
+	xfs_off_t		offset,
+	xfs_off_t		len)
+{
+	struct inode		*inode = VFS_I(ip);
+	int			error;
+
+	trace_xfs_zeroinit_file_space(ip, offset, len);
+
+	if (IS_DAX(inode))
+		error = dax_zeroinit_range(inode, offset, len,
+				&xfs_read_iomap_ops);
+	else
+		error = iomap_zeroout_range(inode, offset, len,
+				&xfs_read_iomap_ops);
+	if (error == -ECANCELED)
+		return -EOPNOTSUPP;
+	return error;
+}
+
 int
 xfs_free_file_space(
 	struct xfs_inode	*ip,
diff --git a/fs/xfs/xfs_bmap_util.h b/fs/xfs/xfs_bmap_util.h
index 9f993168b55b..7bb425d0876c 100644
--- a/fs/xfs/xfs_bmap_util.h
+++ b/fs/xfs/xfs_bmap_util.h
@@ -61,6 +61,8 @@ int xfs_collapse_file_space(struct xfs_inode *, xfs_off_t offset,
 		xfs_off_t len);
 int xfs_insert_file_space(struct xfs_inode *, xfs_off_t offset,
 		xfs_off_t len);
+int xfs_file_zeroinit_space(struct xfs_inode *ip, xfs_off_t offset,
+		xfs_off_t length);

 /* EOF block manipulation functions */
 bool xfs_can_free_eofblocks(struct xfs_inode *ip, bool force);
diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index 7aa943edfc02..886e819efa3b 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -899,7 +899,8 @@ xfs_break_layouts(
 #define XFS_FALLOC_FL_SUPPORTED						\
 	(FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE |			\
 	 FALLOC_FL_COLLAPSE_RANGE | FALLOC_FL_ZERO_RANGE |		\
-	 FALLOC_FL_INSERT_RANGE | FALLOC_FL_UNSHARE_RANGE)
+	 FALLOC_FL_INSERT_RANGE | FALLOC_FL_UNSHARE_RANGE |		\
+	 FALLOC_FL_ZEROINIT_RANGE)

 STATIC long
 xfs_file_fallocate(
@@ -950,13 +951,17 @@ xfs_file_fallocate(
 	 * handled at the right time by xfs_prepare_shift().
 	 */
 	if (mode & (FALLOC_FL_PUNCH_HOLE | FALLOC_FL_ZERO_RANGE |
-		    FALLOC_FL_COLLAPSE_RANGE)) {
+		    FALLOC_FL_COLLAPSE_RANGE | FALLOC_FL_ZEROINIT_RANGE)) {
 		error = xfs_flush_unmap_range(ip, offset, len);
 		if (error)
 			goto out_unlock;
 	}

-	if (mode & FALLOC_FL_PUNCH_HOLE) {
+	if (mode & FALLOC_FL_ZEROINIT_RANGE) {
+		error = xfs_file_zeroinit_space(ip, offset, len);
+		if (error)
+			goto out_unlock;
+	} else if (mode & FALLOC_FL_PUNCH_HOLE) {
 		error = xfs_free_file_space(ip, offset, len);
 		if (error)
 			goto out_unlock;
diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
index 4a8076ef8cb4..eccad0a5c40f 100644
--- a/fs/xfs/xfs_trace.h
+++ b/fs/xfs/xfs_trace.h
@@ -1534,6 +1534,7 @@ DEFINE_SIMPLE_IO_EVENT(xfs_zero_eof);
 DEFINE_SIMPLE_IO_EVENT(xfs_end_io_direct_write);
 DEFINE_SIMPLE_IO_EVENT(xfs_end_io_direct_write_unwritten);
 DEFINE_SIMPLE_IO_EVENT(xfs_end_io_direct_write_append);
+DEFINE_SIMPLE_IO_EVENT(xfs_zeroinit_file_space);

 DECLARE_EVENT_CLASS(xfs_itrunc_class,
 	TP_PROTO(struct xfs_inode *ip, xfs_fsize_t new_size),

From patchwork Sat Sep 18 01:31:12 2021
X-Patchwork-Submitter: "Darrick J.
 Wong"
X-Patchwork-Id: 12503389
Subject: [PATCH 5/5] ext4: implement FALLOC_FL_ZEROINIT_RANGE
From: "Darrick J. Wong"
To: djwong@kernel.org, jane.chu@oracle.com
Cc: linux-xfs@vger.kernel.org, hch@infradead.org, dan.j.williams@intel.com,
 linux-fsdevel@vger.kernel.org
Date: Fri, 17 Sep 2021 18:31:12 -0700
Message-ID: <163192867220.417973.4913917281472586603.stgit@magnolia>
In-Reply-To: <163192864476.417973.143014658064006895.stgit@magnolia>
References: <163192864476.417973.143014658064006895.stgit@magnolia>

From: Darrick J. Wong

Implement this new fallocate mode so that persistent memory users can,
upon receipt of a pmem poison notification, cause the pmem to be
reinitialized to a known value (zero) and clear any hardware poison state
that might be lurking.

Signed-off-by: Darrick J. Wong
---
 fs/ext4/extents.c           |   93 +++++++++++++++++++++++++++++++++++++++++++
 include/trace/events/ext4.h |    7 +++
 2 files changed, 99 insertions(+), 1 deletion(-)

diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index c0de30f25185..c345002e2da6 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -29,6 +29,7 @@
 #include <...>
 #include <...>
 #include <...>
+#include <...>
 #include "ext4_jbd2.h"
 #include "ext4_extents.h"
 #include "xattr.h"
@@ -4475,6 +4476,90 @@
 static int ext4_collapse_range(struct inode *inode, loff_t offset, loff_t len);
 static int ext4_insert_range(struct inode *inode, loff_t offset, loff_t len);

+static long ext4_zeroinit_range(struct file *file, loff_t offset, loff_t len)
+{
+	struct inode *inode = file_inode(file);
+	struct address_space *mapping = inode->i_mapping;
+	handle_t *handle = NULL;
+	loff_t end = offset + len;
+	long ret;
+
+	trace_ext4_zeroinit_range(inode, offset, len,
+			FALLOC_FL_ZEROINIT_RANGE | FALLOC_FL_KEEP_SIZE);
+
+	/* We don't support data=journal mode */
+	if (ext4_should_journal_data(inode))
+		return -EOPNOTSUPP;
+
+	inode_lock(inode);
+
+	/*
+	 * Indirect files do not support unwritten extents
+	 */
+	if (!(ext4_test_inode_flag(inode, EXT4_INODE_EXTENTS))) {
+		ret = -EOPNOTSUPP;
+		goto out_mutex;
+	}
+
+	/* Wait for all existing dio workers; newcomers will block on i_mutex */
+	inode_dio_wait(inode);
+
+	/*
+	 * Prevent page faults from reinstantiating pages we have released from
+	 * page cache.
+	 */
+	filemap_invalidate_lock(mapping);
+
+	ret = ext4_break_layouts(inode);
+	if (ret)
+		goto out_mmap;
+
+	/* Now release the pages and zero block aligned part of pages */
+	truncate_pagecache_range(inode, offset, end - 1);
+	inode->i_mtime = inode->i_ctime = current_time(inode);
+
+	if (IS_DAX(inode))
+		ret = dax_zeroinit_range(inode, offset, len,
+				&ext4_iomap_report_ops);
+	else
+		ret = iomap_zeroout_range(inode, offset, len,
+				&ext4_iomap_report_ops);
+	if (ret == -ECANCELED)
+		ret = -EOPNOTSUPP;
+	if (ret)
+		goto out_mmap;
+
+	/*
+	 * In worst case we have to writeout two nonadjacent unwritten
+	 * blocks and update the inode
+	 */
+	handle = ext4_journal_start(inode, EXT4_HT_MISC, 1);
+	if (IS_ERR(handle)) {
+		ret = PTR_ERR(handle);
+		ext4_std_error(inode->i_sb, ret);
+		goto out_mmap;
+	}
+
+	inode->i_mtime = inode->i_ctime = current_time(inode);
+	ret = ext4_mark_inode_dirty(handle, inode);
+	if (unlikely(ret))
+		goto out_handle;
+	ext4_fc_track_range(handle, inode,
+			offset >> inode->i_sb->s_blocksize_bits,
+			(offset + len - 1) >> inode->i_sb->s_blocksize_bits);
+	ext4_update_inode_fsync_trans(handle, inode, 1);
+
+	if (file->f_flags & O_SYNC)
+		ext4_handle_sync(handle);
+
+out_handle:
+	ext4_journal_stop(handle);
+out_mmap:
+	filemap_invalidate_unlock(mapping);
+out_mutex:
+	inode_unlock(inode);
+	return ret;
+}
+
 static long ext4_zero_range(struct file *file, loff_t offset, loff_t len,
 			    int mode)
 {
@@ -4659,7 +4744,7 @@ long ext4_fallocate(struct file *file, int mode, loff_t offset, loff_t len)
 	/* Return error if mode is not supported */
 	if (mode & ~(FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE |
 		     FALLOC_FL_COLLAPSE_RANGE | FALLOC_FL_ZERO_RANGE |
-		     FALLOC_FL_INSERT_RANGE))
+		     FALLOC_FL_INSERT_RANGE | FALLOC_FL_ZEROINIT_RANGE))
 		return -EOPNOTSUPP;

 	ext4_fc_start_update(inode);
@@ -4687,6 +4772,12 @@ long ext4_fallocate(struct file *file, int mode, loff_t offset, loff_t len)
 		ret = ext4_zero_range(file, offset, len, mode);
 		goto exit;
 	}
+
+	if (mode & FALLOC_FL_ZEROINIT_RANGE) {
+		ret = ext4_zeroinit_range(file, offset, len);
+		goto exit;
+	}
+
 	trace_ext4_fallocate_enter(inode, offset, len, mode);
 	lblk = offset >> blkbits;
diff --git a/include/trace/events/ext4.h b/include/trace/events/ext4.h
index 0ea36b2b0662..282f1208067f 100644
--- a/include/trace/events/ext4.h
+++ b/include/trace/events/ext4.h
@@ -1407,6 +1407,13 @@ DEFINE_EVENT(ext4__fallocate_mode, ext4_zero_range,
 	TP_ARGS(inode, offset, len, mode)
 );

+DEFINE_EVENT(ext4__fallocate_mode, ext4_zeroinit_range,
+
+	TP_PROTO(struct inode *inode, loff_t offset, loff_t len, int mode),
+
+	TP_ARGS(inode, offset, len, mode)
+);
+
 TRACE_EVENT(ext4_fallocate_exit,
 	TP_PROTO(struct inode *inode, loff_t offset,
 		 unsigned int max_blocks, int ret),