From patchwork Fri May 21 06:40:20 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Qu Wenruo X-Patchwork-Id: 12272233 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-18.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER, INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED, USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id D4674C433ED for ; Fri, 21 May 2021 06:42:37 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id B00F8611AD for ; Fri, 21 May 2021 06:42:37 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234442AbhEUGn6 (ORCPT ); Fri, 21 May 2021 02:43:58 -0400 Received: from mx2.suse.de ([195.135.220.15]:57726 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234444AbhEUGm1 (ORCPT ); Fri, 21 May 2021 02:42:27 -0400 X-Virus-Scanned: by amavisd-new at test-mx.suse.de DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.com; s=susede1; t=1621579256; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=0A2xSZRREX8FsXTdqEJSZz4vVoDheWd55fZtt3iXWjE=; b=qzlLYqVnF8L2nasJB8O2pCNLQJCzkbgxOwXRCSV71AJKx8fRAEf7l8eQXHhdWEBFWtfGgF UpcT0w5iONkJE7Oo0eITK1x02ruldFOgxW7ARdAV3hj7jjPzCfgIeOWK3FDrywwbrm8oGU 0+OHzynxsnTJhiNOG9elXbiMWFAYH+4= Received: from relay2.suse.de (unknown [195.135.221.27]) by mx2.suse.de (Postfix) with ESMTP id 1AC81ABED for ; Fri, 21 May 2021 06:40:56 +0000 (UTC) From: Qu Wenruo To: linux-btrfs@vger.kernel.org Subject: [PATCH v3 01/31] btrfs: pass bytenr directly to __process_pages_contig() Date: Fri, 21 May 2021 14:40:20 +0800 Message-Id: <20210521064050.191164-2-wqu@suse.com> X-Mailer: git-send-email 2.31.1 In-Reply-To: <20210521064050.191164-1-wqu@suse.com> References: <20210521064050.191164-1-wqu@suse.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org As a preparation for incoming subpage support, we need bytenr passed to __process_pages_contig() directly, not the current page index. So change the parameter and all callers to pass bytenr in. With the modification, here we need to replace the old @index_ret with @processed_end for __process_pages_contig(), but this brings a small problem. Normally we follow the inclusive return value, meaning @processed_end should be the last byte we processed. If parameter @start is 0, and we failed to lock any page, then we would return @processed_end as -1, causing more problems for __unlock_for_delalloc(). So here for @processed_end, we use two different return value patterns. If we have locked any page, @processed_end will be the last byte of locked page. Or it will be @start otherwise. This change will impact lock_delalloc_pages(), so it needs to check @processed_end to only unlock the range if we have locked any. Signed-off-by: Qu Wenruo --- fs/btrfs/extent_io.c | 57 ++++++++++++++++++++++++++++---------------- 1 file changed, 37 insertions(+), 20 deletions(-) diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c index f538bcb129fd..c2914f851692 100644 --- a/fs/btrfs/extent_io.c +++ b/fs/btrfs/extent_io.c @@ -1810,8 +1810,8 @@ bool btrfs_find_delalloc_range(struct extent_io_tree *tree, u64 *start, static int __process_pages_contig(struct address_space *mapping, struct page *locked_page, - pgoff_t start_index, pgoff_t end_index, - unsigned long page_ops, pgoff_t *index_ret); + u64 start, u64 end, unsigned long page_ops, + u64 *processed_end); static noinline void __unlock_for_delalloc(struct inode *inode, struct page *locked_page, @@ -1824,7 +1824,7 @@ static noinline void __unlock_for_delalloc(struct inode *inode, if (index == locked_page->index && end_index == index) return; - __process_pages_contig(inode->i_mapping, locked_page, index, end_index, + __process_pages_contig(inode->i_mapping, locked_page, start, end, PAGE_UNLOCK, NULL); } @@ -1834,19 +1834,19 @@ static noinline int lock_delalloc_pages(struct inode *inode, u64 delalloc_end) { unsigned long index = delalloc_start >> PAGE_SHIFT; - unsigned long index_ret = index; unsigned long end_index = delalloc_end >> PAGE_SHIFT; + u64 processed_end = delalloc_start; int ret; ASSERT(locked_page); if (index == locked_page->index && index == end_index) return 0; - ret = __process_pages_contig(inode->i_mapping, locked_page, index, - end_index, PAGE_LOCK, &index_ret); - if (ret == -EAGAIN) + ret = __process_pages_contig(inode->i_mapping, locked_page, delalloc_start, + delalloc_end, PAGE_LOCK, &processed_end); + if (ret == -EAGAIN && processed_end > delalloc_start) __unlock_for_delalloc(inode, locked_page, delalloc_start, - (u64)index_ret << PAGE_SHIFT); + processed_end); return ret; } @@ -1941,12 +1941,14 @@ noinline_for_stack bool find_lock_delalloc_range(struct inode *inode, static int __process_pages_contig(struct address_space *mapping, struct page *locked_page, - pgoff_t start_index, pgoff_t end_index, - unsigned long page_ops, pgoff_t *index_ret) + u64 start, u64 end, unsigned long page_ops, + u64 *processed_end) { + pgoff_t start_index = start >> PAGE_SHIFT; + pgoff_t end_index = end >> PAGE_SHIFT; + pgoff_t index = start_index; unsigned long nr_pages = end_index - start_index + 1; unsigned long pages_processed = 0; - pgoff_t index = start_index; struct page *pages[16]; unsigned ret; int err = 0; @@ -1954,17 +1956,19 @@ static int __process_pages_contig(struct address_space *mapping, if (page_ops & PAGE_LOCK) { ASSERT(page_ops == PAGE_LOCK); - ASSERT(index_ret && *index_ret == start_index); + ASSERT(processed_end && *processed_end == start); } if ((page_ops & PAGE_SET_ERROR) && nr_pages > 0) mapping_set_error(mapping, -EIO); while (nr_pages > 0) { - ret = find_get_pages_contig(mapping, index, + int found_pages; + + found_pages = find_get_pages_contig(mapping, index, min_t(unsigned long, nr_pages, ARRAY_SIZE(pages)), pages); - if (ret == 0) { + if (found_pages == 0) { /* * Only if we're going to lock these pages, * can we find nothing at @index. @@ -2007,13 +2011,27 @@ static int __process_pages_contig(struct address_space *mapping, put_page(pages[i]); pages_processed++; } - nr_pages -= ret; - index += ret; + nr_pages -= found_pages; + index += found_pages; cond_resched(); } out: - if (err && index_ret) - *index_ret = start_index + pages_processed - 1; + if (err && processed_end) { + /* + * Update @processed_end. I know this is awful since it has + * two different return value patterns (inclusive vs exclusive). + * + * But the exclusive pattern is necessary if @start is 0, or we + * underflow and check against processed_end won't work as + * expected. + */ + if (pages_processed) + *processed_end = min(end, + ((u64)(start_index + pages_processed) << PAGE_SHIFT) - 1); + else + *processed_end = start; + + } return err; } @@ -2024,8 +2042,7 @@ void extent_clear_unlock_delalloc(struct btrfs_inode *inode, u64 start, u64 end, clear_extent_bit(&inode->io_tree, start, end, clear_bits, 1, 0, NULL); __process_pages_contig(inode->vfs_inode.i_mapping, locked_page, - start >> PAGE_SHIFT, end >> PAGE_SHIFT, - page_ops, NULL); + start, end, page_ops, NULL); } /* From patchwork Fri May 21 06:40:21 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Qu Wenruo X-Patchwork-Id: 12272227 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-18.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER, INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED, USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 7936AC433ED for ; Fri, 21 May 2021 06:42:33 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 50C3D6135C for ; Fri, 21 May 2021 06:42:33 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234565AbhEUGny (ORCPT ); Fri, 21 May 2021 02:43:54 -0400 Received: from mx2.suse.de ([195.135.220.15]:57736 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234467AbhEUGm1 (ORCPT ); Fri, 21 May 2021 02:42:27 -0400 X-Virus-Scanned: by amavisd-new at test-mx.suse.de DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.com; s=susede1; t=1621579258; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=QATExfCUf3J+e5cZIe2WK2lef4CickjkZYd9of5Qq4A=; b=dlj++btk5jtfoyfOaHzA+R5oWj502rIZWrKGLnwbJbLpS2p9AcKpBTffCJde86iCz5K3mx xm9OH2SkaXll0U4rVjRRUiyI+qqanpq3Z+3WJ0OK//Cv3Aj0jVIvNMXAA13BAO/BX2fUX4 1cVnCBKnlbpvhDVMNIjao/Dq7X48Vk8= Received: from relay2.suse.de (unknown [195.135.221.27]) by mx2.suse.de (Postfix) with ESMTP id 1E4FFAC1A; Fri, 21 May 2021 06:40:58 +0000 (UTC) From: Qu Wenruo To: linux-btrfs@vger.kernel.org Cc: Josef Bacik Subject: [PATCH v3 02/31] btrfs: refactor the page status update into process_one_page() Date: Fri, 21 May 2021 14:40:21 +0800 Message-Id: <20210521064050.191164-3-wqu@suse.com> X-Mailer: git-send-email 2.31.1 In-Reply-To: <20210521064050.191164-1-wqu@suse.com> References: <20210521064050.191164-1-wqu@suse.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org In __process_pages_contig() we update page status according to page_ops. That update process is a bunch of if () {} branches, which lies inside two loops, this makes it pretty hard to expand for later subpage operations. So this patch will extract this operations into its own function, process_one_pages(). Also since we're refactoring __process_pages_contig(), also move the new helper and __process_pages_contig() before the first caller of them, to remove the forward declaration. Signed-off-by: Qu Wenruo Reviewed-by: Josef Bacik --- fs/btrfs/extent_io.c | 206 +++++++++++++++++++++++-------------------- 1 file changed, 109 insertions(+), 97 deletions(-) diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c index c2914f851692..aa112a51b0d6 100644 --- a/fs/btrfs/extent_io.c +++ b/fs/btrfs/extent_io.c @@ -1808,10 +1808,118 @@ bool btrfs_find_delalloc_range(struct extent_io_tree *tree, u64 *start, return found; } +/* + * Process one page for __process_pages_contig(). + * + * Return >0 if we hit @page == @locked_page. + * Return 0 if we updated the page status. + * Return -EGAIN if the we need to try again. + * (For PAGE_LOCK case but got dirty page or page not belong to mapping) + */ +static int process_one_page(struct address_space *mapping, + struct page *page, struct page *locked_page, + unsigned long page_ops) +{ + if (page_ops & PAGE_SET_ORDERED) + SetPageOrdered(page); + + if (page == locked_page) + return 1; + + if (page_ops & PAGE_SET_ERROR) + SetPageError(page); + if (page_ops & PAGE_START_WRITEBACK) { + clear_page_dirty_for_io(page); + set_page_writeback(page); + } + if (page_ops & PAGE_END_WRITEBACK) + end_page_writeback(page); + if (page_ops & PAGE_LOCK) { + lock_page(page); + if (!PageDirty(page) || page->mapping != mapping) { + unlock_page(page); + return -EAGAIN; + } + } + if (page_ops & PAGE_UNLOCK) + unlock_page(page); + return 0; +} + static int __process_pages_contig(struct address_space *mapping, struct page *locked_page, u64 start, u64 end, unsigned long page_ops, - u64 *processed_end); + u64 *processed_end) +{ + pgoff_t start_index = start >> PAGE_SHIFT; + pgoff_t end_index = end >> PAGE_SHIFT; + pgoff_t index = start_index; + unsigned long nr_pages = end_index - start_index + 1; + unsigned long pages_processed = 0; + struct page *pages[16]; + int err = 0; + int i; + + if (page_ops & PAGE_LOCK) { + ASSERT(page_ops == PAGE_LOCK); + ASSERT(processed_end && *processed_end == start); + } + + if ((page_ops & PAGE_SET_ERROR) && nr_pages > 0) + mapping_set_error(mapping, -EIO); + + while (nr_pages > 0) { + int found_pages; + + found_pages = find_get_pages_contig(mapping, index, + min_t(unsigned long, + nr_pages, ARRAY_SIZE(pages)), pages); + if (found_pages == 0) { + /* + * Only if we're going to lock these pages, + * can we find nothing at @index. + */ + ASSERT(page_ops & PAGE_LOCK); + err = -EAGAIN; + goto out; + } + + for (i = 0; i < found_pages; i++) { + int process_ret; + + process_ret = process_one_page(mapping, pages[i], + locked_page, page_ops); + if (process_ret < 0) { + for (; i < found_pages; i++) + put_page(pages[i]); + err = -EAGAIN; + goto out; + } + put_page(pages[i]); + pages_processed++; + } + nr_pages -= found_pages; + index += found_pages; + cond_resched(); + } +out: + if (err && processed_end) { + /* + * Update @processed_end. I know this is awful since it has + * two different return value patterns (inclusive vs exclusive). + * + * But the exclusive pattern is necessary if @start is 0, or we + * underflow and check against processed_end won't work as + * expected. + */ + if (pages_processed) + *processed_end = min(end, + ((u64)(start_index + pages_processed) << PAGE_SHIFT) - 1); + else + *processed_end = start; + } + return err; +} static noinline void __unlock_for_delalloc(struct inode *inode, struct page *locked_page, @@ -1939,102 +2047,6 @@ noinline_for_stack bool find_lock_delalloc_range(struct inode *inode, return found; } -static int __process_pages_contig(struct address_space *mapping, - struct page *locked_page, - u64 start, u64 end, unsigned long page_ops, - u64 *processed_end) -{ - pgoff_t start_index = start >> PAGE_SHIFT; - pgoff_t end_index = end >> PAGE_SHIFT; - pgoff_t index = start_index; - unsigned long nr_pages = end_index - start_index + 1; - unsigned long pages_processed = 0; - struct page *pages[16]; - unsigned ret; - int err = 0; - int i; - - if (page_ops & PAGE_LOCK) { - ASSERT(page_ops == PAGE_LOCK); - ASSERT(processed_end && *processed_end == start); - } - - if ((page_ops & PAGE_SET_ERROR) && nr_pages > 0) - mapping_set_error(mapping, -EIO); - - while (nr_pages > 0) { - int found_pages; - - found_pages = find_get_pages_contig(mapping, index, - min_t(unsigned long, - nr_pages, ARRAY_SIZE(pages)), pages); - if (found_pages == 0) { - /* - * Only if we're going to lock these pages, - * can we find nothing at @index. - */ - ASSERT(page_ops & PAGE_LOCK); - err = -EAGAIN; - goto out; - } - - for (i = 0; i < ret; i++) { - if (page_ops & PAGE_SET_ORDERED) - SetPageOrdered(pages[i]); - - if (locked_page && pages[i] == locked_page) { - put_page(pages[i]); - pages_processed++; - continue; - } - if (page_ops & PAGE_START_WRITEBACK) { - clear_page_dirty_for_io(pages[i]); - set_page_writeback(pages[i]); - } - if (page_ops & PAGE_SET_ERROR) - SetPageError(pages[i]); - if (page_ops & PAGE_END_WRITEBACK) - end_page_writeback(pages[i]); - if (page_ops & PAGE_UNLOCK) - unlock_page(pages[i]); - if (page_ops & PAGE_LOCK) { - lock_page(pages[i]); - if (!PageDirty(pages[i]) || - pages[i]->mapping != mapping) { - unlock_page(pages[i]); - for (; i < ret; i++) - put_page(pages[i]); - err = -EAGAIN; - goto out; - } - } - put_page(pages[i]); - pages_processed++; - } - nr_pages -= found_pages; - index += found_pages; - cond_resched(); - } -out: - if (err && processed_end) { - /* - * Update @processed_end. I know this is awful since it has - * two different return value patterns (inclusive vs exclusive). - * - * But the exclusive pattern is necessary if @start is 0, or we - * underflow and check against processed_end won't work as - * expected. - */ - if (pages_processed) - *processed_end = min(end, - ((u64)(start_index + pages_processed) << PAGE_SHIFT) - 1); - else - *processed_end = start; - - } - return err; -} - void extent_clear_unlock_delalloc(struct btrfs_inode *inode, u64 start, u64 end, struct page *locked_page, u32 clear_bits, unsigned long page_ops) From patchwork Fri May 21 06:40:22 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Qu Wenruo X-Patchwork-Id: 12272231 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-18.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER, INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED, USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 8516CC43460 for ; Fri, 21 May 2021 06:42:36 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 6457A6135C for ; Fri, 21 May 2021 06:42:36 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232524AbhEUGn4 (ORCPT ); Fri, 21 May 2021 02:43:56 -0400 Received: from mx2.suse.de ([195.135.220.15]:57748 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234534AbhEUGm2 (ORCPT ); Fri, 21 May 2021 02:42:28 -0400 X-Virus-Scanned: by amavisd-new at test-mx.suse.de DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.com; s=susede1; t=1621579260; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=JP9j8HnUfIRsWheW/HlFksCo1hCsZvfEoApzqE6MwyY=; b=jBqw9iigFBMjwUKq7PbRVuTOEWHtDdI8zlSfBjdMNonpD6hiT2YOqkqMjEgWNxx6h1iXKZ GjDsd6bOFQ7peidlzHPd7AFcsDOUQpGko6SweKM7QGi4sxf5F+71/OUcj/Pre7gv3ggYVE DojHbFMbpomWRqbCRoiPpJ8Q/hciETk= Received: from relay2.suse.de (unknown [195.135.221.27]) by mx2.suse.de (Postfix) with ESMTP id 13512AC47; Fri, 21 May 2021 06:41:00 +0000 (UTC) From: Qu Wenruo To: linux-btrfs@vger.kernel.org Cc: Josef Bacik Subject: [PATCH v3 03/31] btrfs: provide btrfs_page_clamp_*() helpers Date: Fri, 21 May 2021 14:40:22 +0800 Message-Id: <20210521064050.191164-4-wqu@suse.com> X-Mailer: git-send-email 2.31.1 In-Reply-To: <20210521064050.191164-1-wqu@suse.com> References: <20210521064050.191164-1-wqu@suse.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org In the coming subpage RW supports, there are a lot of page status update calls which need to be converted to subpage compatible version, which needs @start and @len. Some call sites already have such @start/@len and are already in page range, like various endio functions. But there are also call sites which need to clamp the range for subpage case, like btrfs_dirty_pagse() and __process_contig_pages(). Here we introduce new helpers, btrfs_page_clamp_*(), to do and only do the clamp for subpage version. Although in theory all existing btrfs_page_*() calls can be converted to use btrfs_page_clamp_*() directly, but that would make us to do unnecessary clamp operations. Signed-off-by: Qu Wenruo Reviewed-by: Josef Bacik --- fs/btrfs/subpage.c | 38 ++++++++++++++++++++++++++++++++++++++ fs/btrfs/subpage.h | 10 ++++++++++ 2 files changed, 48 insertions(+) diff --git a/fs/btrfs/subpage.c b/fs/btrfs/subpage.c index 2d19089ab625..a6cf1776f3f9 100644 --- a/fs/btrfs/subpage.c +++ b/fs/btrfs/subpage.c @@ -354,6 +354,16 @@ void btrfs_subpage_clear_writeback(const struct btrfs_fs_info *fs_info, spin_unlock_irqrestore(&subpage->lock, flags); } +static void btrfs_subpage_clamp_range(struct page *page, u64 *start, u32 *len) +{ + u64 orig_start = *start; + u32 orig_len = *len; + + *start = max_t(u64, page_offset(page), orig_start); + *len = min_t(u64, page_offset(page) + PAGE_SIZE, + orig_start + orig_len) - *start; +} + /* * Unlike set/clear which is dependent on each page status, for test all bits * are tested in the same way. @@ -408,6 +418,34 @@ bool btrfs_page_test_##name(const struct btrfs_fs_info *fs_info, \ if (unlikely(!fs_info) || fs_info->sectorsize == PAGE_SIZE) \ return test_page_func(page); \ return btrfs_subpage_test_##name(fs_info, page, start, len); \ +} \ +void btrfs_page_clamp_set_##name(const struct btrfs_fs_info *fs_info, \ + struct page *page, u64 start, u32 len) \ +{ \ + if (unlikely(!fs_info) || fs_info->sectorsize == PAGE_SIZE) { \ + set_page_func(page); \ + return; \ + } \ + btrfs_subpage_clamp_range(page, &start, &len); \ + btrfs_subpage_set_##name(fs_info, page, start, len); \ +} \ +void btrfs_page_clamp_clear_##name(const struct btrfs_fs_info *fs_info, \ + struct page *page, u64 start, u32 len) \ +{ \ + if (unlikely(!fs_info) || fs_info->sectorsize == PAGE_SIZE) { \ + clear_page_func(page); \ + return; \ + } \ + btrfs_subpage_clamp_range(page, &start, &len); \ + btrfs_subpage_clear_##name(fs_info, page, start, len); \ +} \ +bool btrfs_page_clamp_test_##name(const struct btrfs_fs_info *fs_info, \ + struct page *page, u64 start, u32 len) \ +{ \ + if (unlikely(!fs_info) || fs_info->sectorsize == PAGE_SIZE) \ + return test_page_func(page); \ + btrfs_subpage_clamp_range(page, &start, &len); \ + return btrfs_subpage_test_##name(fs_info, page, start, len); \ } IMPLEMENT_BTRFS_PAGE_OPS(uptodate, SetPageUptodate, ClearPageUptodate, PageUptodate); diff --git a/fs/btrfs/subpage.h b/fs/btrfs/subpage.h index bfd626e955be..291cb1932f27 100644 --- a/fs/btrfs/subpage.h +++ b/fs/btrfs/subpage.h @@ -72,6 +72,10 @@ void btrfs_subpage_end_reader(const struct btrfs_fs_info *fs_info, * btrfs_page_*() are for call sites where the page can either be subpage * specific or regular page. The function will handle both cases. * But the range still needs to be inside the page. + * + * btrfs_page_clamp_*() are similar to btrfs_page_*(), except the range doesn't + * need to be inside the page. Those functions will truncate the range + * automatically. */ #define DECLARE_BTRFS_SUBPAGE_OPS(name) \ void btrfs_subpage_set_##name(const struct btrfs_fs_info *fs_info, \ @@ -85,6 +89,12 @@ void btrfs_page_set_##name(const struct btrfs_fs_info *fs_info, \ void btrfs_page_clear_##name(const struct btrfs_fs_info *fs_info, \ struct page *page, u64 start, u32 len); \ bool btrfs_page_test_##name(const struct btrfs_fs_info *fs_info, \ + struct page *page, u64 start, u32 len); \ +void btrfs_page_clamp_set_##name(const struct btrfs_fs_info *fs_info, \ + struct page *page, u64 start, u32 len); \ +void btrfs_page_clamp_clear_##name(const struct btrfs_fs_info *fs_info, \ + struct page *page, u64 start, u32 len); \ +bool btrfs_page_clamp_test_##name(const struct btrfs_fs_info *fs_info, \ struct page *page, u64 start, u32 len); DECLARE_BTRFS_SUBPAGE_OPS(uptodate); From patchwork Fri May 21 06:40:23 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Qu Wenruo X-Patchwork-Id: 12272235 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-18.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER, INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED, USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id E2917C433B4 for ; Fri, 21 May 2021 06:42:38 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id C49DA611AD for ; Fri, 21 May 2021 06:42:38 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231496AbhEUGn7 (ORCPT ); Fri, 21 May 2021 02:43:59 -0400 Received: from mx2.suse.de ([195.135.220.15]:57772 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233472AbhEUGma (ORCPT ); Fri, 21 May 2021 02:42:30 -0400 X-Virus-Scanned: by amavisd-new at test-mx.suse.de DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.com; s=susede1; t=1621579261; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=sDzNsfAICvIDuRxUiWMT4bomDdzAtlDXzTUlb7hFiwA=; b=OfIo8hP7Sml99tWL28FspirroCiLn4RhEx224fQ+pe+H8AJtCS+HxNcfaQr/v749l2j3cq nPMxHwcBcgIauWQKF2aT+maeeAlEmLUMX406sR+eS9tZu+Se1QS18zkL1epNdb0+pvSMfo 2KLNKWhXU7FsNryHDlOYnb+Wn7FUkd0= Received: from relay2.suse.de (unknown [195.135.221.27]) by mx2.suse.de (Postfix) with ESMTP id BA8D5AC46 for ; Fri, 21 May 2021 06:41:01 +0000 (UTC) From: Qu Wenruo To: linux-btrfs@vger.kernel.org Subject: [PATCH v3 04/31] btrfs: only require sector size alignment for end_bio_extent_writepage() Date: Fri, 21 May 2021 14:40:23 +0800 Message-Id: <20210521064050.191164-5-wqu@suse.com> X-Mailer: git-send-email 2.31.1 In-Reply-To: <20210521064050.191164-1-wqu@suse.com> References: <20210521064050.191164-1-wqu@suse.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org Just like read page, for subpage support we only require sector size alignment. So change the error message condition to only require sector alignment. This should not affect existing code, as for regular sectorsize == PAGE_SIZE case, we are still requiring page alignment. Signed-off-by: Qu Wenruo --- fs/btrfs/extent_io.c | 29 ++++++++++++----------------- 1 file changed, 12 insertions(+), 17 deletions(-) diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c index aa112a51b0d6..28ebaa28dce2 100644 --- a/fs/btrfs/extent_io.c +++ b/fs/btrfs/extent_io.c @@ -2824,25 +2824,20 @@ static void end_bio_extent_writepage(struct bio *bio) struct page *page = bvec->bv_page; struct inode *inode = page->mapping->host; struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb); + const u32 sectorsize = fs_info->sectorsize; - /* We always issue full-page reads, but if some block - * in a page fails to read, blk_update_request() will - * advance bv_offset and adjust bv_len to compensate. - * Print a warning for nonzero offsets, and an error - * if they don't add up to a full page. */ - if (bvec->bv_offset || bvec->bv_len != PAGE_SIZE) { - if (bvec->bv_offset + bvec->bv_len != PAGE_SIZE) - btrfs_err(fs_info, - "partial page write in btrfs with offset %u and length %u", - bvec->bv_offset, bvec->bv_len); - else - btrfs_info(fs_info, - "incomplete page write in btrfs with offset %u and length %u", - bvec->bv_offset, bvec->bv_len); - } + /* Btrfs read write should always be sector aligned. */ + if (!IS_ALIGNED(bvec->bv_offset, sectorsize)) + btrfs_err(fs_info, + "partial page write in btrfs with offset %u and length %u", + bvec->bv_offset, bvec->bv_len); + else if (!IS_ALIGNED(bvec->bv_len, sectorsize)) + btrfs_info(fs_info, + "incomplete page write with offset %u and length %u", + bvec->bv_offset, bvec->bv_len); - start = page_offset(page); - end = start + bvec->bv_offset + bvec->bv_len - 1; + start = page_offset(page) + bvec->bv_offset; + end = start + bvec->bv_len - 1; if (first_bvec) { btrfs_record_physical_zoned(inode, start, bio); From patchwork Fri May 21 06:40:24 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Qu Wenruo X-Patchwork-Id: 12272237 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-18.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER, INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED, USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 5B57FC433B4 for ; Fri, 21 May 2021 06:42:57 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 32E85611AD for ; Fri, 21 May 2021 06:42:57 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231911AbhEUGoS (ORCPT ); Fri, 21 May 2021 02:44:18 -0400 Received: from mx2.suse.de ([195.135.220.15]:58774 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234422AbhEUGny (ORCPT ); Fri, 21 May 2021 02:43:54 -0400 X-Virus-Scanned: by amavisd-new at test-mx.suse.de DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.com; s=susede1; t=1621579263; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=hYVeus6owo8iEUG8RcD8NEjpyF6B+CbULJO+ll+Mzhw=; b=OAC85h1dQVgXytV4taq+/1UvM0t/m/9zUDueozU66vkcZbzq7MJ+dcqZGKk4JjstQ0STS3 pLKWgMMLwO8s9YRdC9s6s0IArQlcNy9c37vMui8HgkPUMnIiYMSun5u0wsdogUfKXSnpN9 qMQRdRzvVyMjZJAE043vppKe45s56Co= Received: from relay2.suse.de (unknown [195.135.221.27]) by mx2.suse.de (Postfix) with ESMTP id 952D0ACAD; Fri, 21 May 2021 06:41:03 +0000 (UTC) From: Qu Wenruo To: linux-btrfs@vger.kernel.org Cc: Josef Bacik Subject: [PATCH v3 05/31] btrfs: make btrfs_dirty_pages() to be subpage compatible Date: Fri, 21 May 2021 14:40:24 +0800 Message-Id: <20210521064050.191164-6-wqu@suse.com> X-Mailer: git-send-email 2.31.1 In-Reply-To: <20210521064050.191164-1-wqu@suse.com> References: <20210521064050.191164-1-wqu@suse.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org Since the extent io tree operations in btrfs_dirty_pages() are already subpage compatible, we only need to make the page status update to use subpage helpers. Signed-off-by: Qu Wenruo Reviewed-by: Josef Bacik --- fs/btrfs/file.c | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c index 3b10d98b4ebb..4a89697ae3a7 100644 --- a/fs/btrfs/file.c +++ b/fs/btrfs/file.c @@ -28,6 +28,7 @@ #include "compression.h" #include "delalloc-space.h" #include "reflink.h" +#include "subpage.h" static struct kmem_cache *btrfs_inode_defrag_cachep; /* @@ -482,6 +483,7 @@ int btrfs_dirty_pages(struct btrfs_inode *inode, struct page **pages, start_pos = round_down(pos, fs_info->sectorsize); num_bytes = round_up(write_bytes + pos - start_pos, fs_info->sectorsize); + ASSERT(num_bytes <= U32_MAX); end_of_last_block = start_pos + num_bytes - 1; @@ -500,9 +502,10 @@ int btrfs_dirty_pages(struct btrfs_inode *inode, struct page **pages, for (i = 0; i < num_pages; i++) { struct page *p = pages[i]; - SetPageUptodate(p); + + btrfs_page_clamp_set_uptodate(fs_info, p, start_pos, num_bytes); ClearPageChecked(p); - set_page_dirty(p); + btrfs_page_clamp_set_dirty(fs_info, p, start_pos, num_bytes); } /* From patchwork Fri May 21 06:40:25 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Qu Wenruo X-Patchwork-Id: 12272239 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-18.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER, INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED, USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id D0C89C433ED for ; Fri, 21 May 2021 06:42:57 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id B2A49611AB for ; Fri, 21 May 2021 06:42:57 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233526AbhEUGoT (ORCPT ); Fri, 21 May 2021 02:44:19 -0400 Received: from mx2.suse.de ([195.135.220.15]:58776 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234399AbhEUGny (ORCPT ); Fri, 21 May 2021 02:43:54 -0400 X-Virus-Scanned: by amavisd-new at test-mx.suse.de DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.com; s=susede1; t=1621579265; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=2+Ws1LKbCX9gsU8in5gRwiw5EmHLgr/6xGqCCB14mJs=; b=IOx3Rf4W2/JrOG0+MHCFfokUxEvSIP0UMyWzQbklRQGZ5Qou50SzTRRaGJJEMJZuuZ3duB rbwW1Zhr3Z3wt+bV+xezUMzrbODZXwvvrriey9O+PHCcNvsBznSOyE0hHxFwmpNw/u2UBH y72SpweDl8ZK2kIKkXaL8Kklc+/eRT8= Received: from relay2.suse.de (unknown [195.135.221.27]) by mx2.suse.de (Postfix) with ESMTP id 94EB4AC8F; Fri, 21 May 2021 06:41:05 +0000 (UTC) From: Qu Wenruo To: linux-btrfs@vger.kernel.org Cc: Josef Bacik Subject: [PATCH v3 06/31] btrfs: make __process_pages_contig() to handle subpage dirty/error/writeback status Date: Fri, 21 May 2021 14:40:25 +0800 Message-Id: <20210521064050.191164-7-wqu@suse.com> X-Mailer: git-send-email 2.31.1 In-Reply-To: <20210521064050.191164-1-wqu@suse.com> References: <20210521064050.191164-1-wqu@suse.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org For __process_pages_contig() and process_one_page(), to handle subpage we only need to pass bytenr in and call subpage helpers to handle dirty/error/writeback status. Signed-off-by: Qu Wenruo Reviewed-by: Josef Bacik --- fs/btrfs/extent_io.c | 24 ++++++++++++++++-------- 1 file changed, 16 insertions(+), 8 deletions(-) diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c index 28ebaa28dce2..30bbd1b08c8d 100644 --- a/fs/btrfs/extent_io.c +++ b/fs/btrfs/extent_io.c @@ -1816,10 +1816,16 @@ bool btrfs_find_delalloc_range(struct extent_io_tree *tree, u64 *start, * Return -EGAIN if the we need to try again. * (For PAGE_LOCK case but got dirty page or page not belong to mapping) */ -static int process_one_page(struct address_space *mapping, +static int process_one_page(struct btrfs_fs_info *fs_info, + struct address_space *mapping, struct page *page, struct page *locked_page, - unsigned long page_ops) + unsigned long page_ops, u64 start, u64 end) { + u32 len; + + ASSERT(end + 1 - start != 0 && end + 1 - start < U32_MAX); + len = end + 1 - start; + if (page_ops & PAGE_SET_ORDERED) SetPageOrdered(page); @@ -1827,13 +1833,13 @@ static int process_one_page(struct address_space *mapping, return 1; if (page_ops & PAGE_SET_ERROR) - SetPageError(page); + btrfs_page_clamp_set_error(fs_info, page, start, len); if (page_ops & PAGE_START_WRITEBACK) { - clear_page_dirty_for_io(page); - set_page_writeback(page); + btrfs_page_clamp_clear_dirty(fs_info, page, start, len); + btrfs_page_clamp_set_writeback(fs_info, page, start, len); } if (page_ops & PAGE_END_WRITEBACK) - end_page_writeback(page); + btrfs_page_clamp_clear_writeback(fs_info, page, start, len); if (page_ops & PAGE_LOCK) { lock_page(page); if (!PageDirty(page) || page->mapping != mapping) { @@ -1851,6 +1857,7 @@ static int __process_pages_contig(struct address_space *mapping, u64 start, u64 end, unsigned long page_ops, u64 *processed_end) { + struct btrfs_fs_info *fs_info = btrfs_sb(mapping->host->i_sb); pgoff_t start_index = start >> PAGE_SHIFT; pgoff_t end_index = end >> PAGE_SHIFT; pgoff_t index = start_index; @@ -1887,8 +1894,9 @@ static int __process_pages_contig(struct address_space *mapping, for (i = 0; i < found_pages; i++) { int process_ret; - process_ret = process_one_page(mapping, pages[i], - locked_page, page_ops); + process_ret = process_one_page(fs_info, mapping, + pages[i], locked_page, page_ops, + start, end); if (process_ret < 0) { for (; i < found_pages; i++) put_page(pages[i]); From patchwork Fri May 21 06:40:26 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Qu Wenruo X-Patchwork-Id: 12272243 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-18.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER, INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED, USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 0E07FC433ED for ; Fri, 21 May 2021 06:43:00 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id E4FF7611AB for ; Fri, 21 May 2021 06:42:59 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233317AbhEUGoU (ORCPT ); Fri, 21 May 2021 02:44:20 -0400 Received: from mx2.suse.de ([195.135.220.15]:58778 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234441AbhEUGny (ORCPT ); Fri, 21 May 2021 02:43:54 -0400 X-Virus-Scanned: by amavisd-new at test-mx.suse.de DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.com; s=susede1; t=1621579267; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=5L+CbCEacPDaeqbln1oS/6uiCaNyI132WrCSbJfyAg0=; b=Xe8xiv3fe/GITfXoATcRXzc3be6BK7zojcY3A0c+KxoVJO0e2XHZkLIDwB1uf94kFGyOF8 DUoJI5kD8KgZxA7eYH71mrYeMrUzGywDd8aJ1cdXWj8sNEixrLAU4kBno5syBE1W9bFSyP yz/IcxGumEI6VrEg5MLBaKwqftK7XJY= Received: from relay2.suse.de (unknown [195.135.221.27]) by mx2.suse.de (Postfix) with ESMTP id AFC52ACF4; Fri, 21 May 2021 06:41:07 +0000 (UTC) From: Qu Wenruo To: linux-btrfs@vger.kernel.org Cc: Josef Bacik Subject: [PATCH v3 07/31] btrfs: make end_bio_extent_writepage() to be subpage compatible Date: Fri, 21 May 2021 14:40:26 +0800 Message-Id: <20210521064050.191164-8-wqu@suse.com> X-Mailer: git-send-email 2.31.1 In-Reply-To: <20210521064050.191164-1-wqu@suse.com> References: <20210521064050.191164-1-wqu@suse.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org Now in end_bio_extent_writepage(), the only subpage incompatible code is the end_page_writeback(). Just call the subpage helpers. Signed-off-by: Qu Wenruo Reviewed-by: Josef Bacik --- fs/btrfs/extent_io.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c index 30bbd1b08c8d..d16c84430981 100644 --- a/fs/btrfs/extent_io.c +++ b/fs/btrfs/extent_io.c @@ -2853,7 +2853,8 @@ static void end_bio_extent_writepage(struct bio *bio) } end_extent_writepage(page, error, start, end); - end_page_writeback(page); + + btrfs_page_clear_writeback(fs_info, page, start, bvec->bv_len); } bio_put(bio); From patchwork Fri May 21 06:40:27 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Qu Wenruo X-Patchwork-Id: 12272247 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-18.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER, INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED, USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 1F45CC43461 for ; Fri, 21 May 2021 06:43:01 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id E933F611AD for ; Fri, 21 May 2021 06:43:00 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233918AbhEUGoW (ORCPT ); Fri, 21 May 2021 02:44:22 -0400 Received: from mx2.suse.de ([195.135.220.15]:58780 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234460AbhEUGny (ORCPT ); Fri, 21 May 2021 02:43:54 -0400 X-Virus-Scanned: by amavisd-new at test-mx.suse.de DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.com; s=susede1; t=1621579269; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=LW9162Hb7N+LQrk3Wn0CRO2u7aDsjriuBFTN+vqhc4A=; b=fUjD3XfKIFoUcgj3Tmn71yD5qxzMq6Dh+syaSnZAliPIpJ3WQkSh90zkgcqH8kBysXpLQR 6fAsb7LY9iCZrAgeJ8upyF1hiCYGnL4uCnE62uEF6dIPPXIdFkz7LrgyG5+7yeqlPetYC9 9Zec9xOTQNOp0tc0EU9aAzXGMWDtc7k= Received: from relay2.suse.de (unknown [195.135.221.27]) by mx2.suse.de (Postfix) with ESMTP id A36AFAD0B; Fri, 21 May 2021 06:41:09 +0000 (UTC) From: Qu Wenruo To: linux-btrfs@vger.kernel.org Cc: Josef Bacik Subject: [PATCH v3 08/31] btrfs: make process_one_page() to handle subpage locking Date: Fri, 21 May 2021 14:40:27 +0800 Message-Id: <20210521064050.191164-9-wqu@suse.com> X-Mailer: git-send-email 2.31.1 In-Reply-To: <20210521064050.191164-1-wqu@suse.com> References: <20210521064050.191164-1-wqu@suse.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org Introduce a new data inodes specific subpage member, writers, to record how many sectors are under page lock for delalloc writing. This member acts pretty much the same as readers, except it's only for delalloc writes. This is important for delalloc code to trace which page can really be freed, as we have cases like run_delalloc_nocow() where we may exit processing nocow range inside a page, but need to exit to do cow half way. In that case, we need a way to determine if we can really unlock a full page. With the new btrfs_subpage::writers, there is a new requirement: - Page locked by process_one_page() must be unlocked by process_one_page() There are still tons of call sites manually lock and unlock a page, without updating btrfs_subpage::writers. So if we lock a page through process_one_page() then it must be unlocked by process_one_page() to keep btrfs_subpage::writers consistent. This will be handled in next patch. Signed-off-by: Qu Wenruo Reviewed-by: Josef Bacik --- fs/btrfs/extent_io.c | 10 +++-- fs/btrfs/subpage.c | 89 ++++++++++++++++++++++++++++++++++++++------ fs/btrfs/subpage.h | 10 +++++ 3 files changed, 94 insertions(+), 15 deletions(-) diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c index d16c84430981..8f32c2e64936 100644 --- a/fs/btrfs/extent_io.c +++ b/fs/btrfs/extent_io.c @@ -1841,14 +1841,18 @@ static int process_one_page(struct btrfs_fs_info *fs_info, if (page_ops & PAGE_END_WRITEBACK) btrfs_page_clamp_clear_writeback(fs_info, page, start, len); if (page_ops & PAGE_LOCK) { - lock_page(page); + int ret; + + ret = btrfs_page_start_writer_lock(fs_info, page, start, len); + if (ret) + return ret; if (!PageDirty(page) || page->mapping != mapping) { - unlock_page(page); + btrfs_page_end_writer_lock(fs_info, page, start, len); return -EAGAIN; } } if (page_ops & PAGE_UNLOCK) - unlock_page(page); + btrfs_page_end_writer_lock(fs_info, page, start, len); return 0; } diff --git a/fs/btrfs/subpage.c b/fs/btrfs/subpage.c index a6cf1776f3f9..f728e5009487 100644 --- a/fs/btrfs/subpage.c +++ b/fs/btrfs/subpage.c @@ -110,10 +110,12 @@ int btrfs_alloc_subpage(const struct btrfs_fs_info *fs_info, if (!*ret) return -ENOMEM; spin_lock_init(&(*ret)->lock); - if (type == BTRFS_SUBPAGE_METADATA) + if (type == BTRFS_SUBPAGE_METADATA) { atomic_set(&(*ret)->eb_refs, 0); - else + } else { atomic_set(&(*ret)->readers, 0); + atomic_set(&(*ret)->writers, 0); + } return 0; } @@ -203,6 +205,79 @@ void btrfs_subpage_end_reader(const struct btrfs_fs_info *fs_info, unlock_page(page); } +static void btrfs_subpage_clamp_range(struct page *page, u64 *start, u32 *len) +{ + u64 orig_start = *start; + u32 orig_len = *len; + + *start = max_t(u64, page_offset(page), orig_start); + *len = min_t(u64, page_offset(page) + PAGE_SIZE, + orig_start + orig_len) - *start; +} + +void btrfs_subpage_start_writer(const struct btrfs_fs_info *fs_info, + struct page *page, u64 start, u32 len) +{ + struct btrfs_subpage *subpage = (struct btrfs_subpage *)page->private; + int nbits = len >> fs_info->sectorsize_bits; + int ret; + + btrfs_subpage_assert(fs_info, page, start, len); + + ASSERT(atomic_read(&subpage->readers) == 0); + ret = atomic_add_return(nbits, &subpage->writers); + ASSERT(ret == nbits); +} + +bool btrfs_subpage_end_and_test_writer(const struct btrfs_fs_info *fs_info, + struct page *page, u64 start, u32 len) +{ + struct btrfs_subpage *subpage = (struct btrfs_subpage *)page->private; + int nbits = len >> fs_info->sectorsize_bits; + + btrfs_subpage_assert(fs_info, page, start, len); + + ASSERT(atomic_read(&subpage->writers) >= nbits); + return atomic_sub_and_test(nbits, &subpage->writers); +} + +/* + * To lock a page for delalloc page writeback. + * + * Return -EAGAIN if the page is not properly initialized. + * Return 0 with the page locked, and writer counter updated. + * + * Even with 0 returned, the page still need extra check to make sure + * it's really the correct page, as the caller is using + * find_get_pages_contig(), which can race with page invalidating. + */ +int btrfs_page_start_writer_lock(const struct btrfs_fs_info *fs_info, + struct page *page, u64 start, u32 len) +{ + if (unlikely(!fs_info) || fs_info->sectorsize == PAGE_SIZE) { + lock_page(page); + return 0; + } + lock_page(page); + if (!PagePrivate(page) || !page->private) { + unlock_page(page); + return -EAGAIN; + } + btrfs_subpage_clamp_range(page, &start, &len); + btrfs_subpage_start_writer(fs_info, page, start, len); + return 0; +} + +void btrfs_page_end_writer_lock(const struct btrfs_fs_info *fs_info, + struct page *page, u64 start, u32 len) +{ + if (unlikely(!fs_info) || fs_info->sectorsize == PAGE_SIZE) + return unlock_page(page); + btrfs_subpage_clamp_range(page, &start, &len); + if (btrfs_subpage_end_and_test_writer(fs_info, page, start, len)) + unlock_page(page); +} + /* * Convert the [start, start + len) range into a u16 bitmap * @@ -354,16 +429,6 @@ void btrfs_subpage_clear_writeback(const struct btrfs_fs_info *fs_info, spin_unlock_irqrestore(&subpage->lock, flags); } -static void btrfs_subpage_clamp_range(struct page *page, u64 *start, u32 *len) -{ - u64 orig_start = *start; - u32 orig_len = *len; - - *start = max_t(u64, page_offset(page), orig_start); - *len = min_t(u64, page_offset(page) + PAGE_SIZE, - orig_start + orig_len) - *start; -} - /* * Unlike set/clear which is dependent on each page status, for test all bits * are tested in the same way. diff --git a/fs/btrfs/subpage.h b/fs/btrfs/subpage.h index 291cb1932f27..9d087ab3244e 100644 --- a/fs/btrfs/subpage.h +++ b/fs/btrfs/subpage.h @@ -33,6 +33,7 @@ struct btrfs_subpage { /* Structures only used by data */ struct { atomic_t readers; + atomic_t writers; }; }; }; @@ -63,6 +64,15 @@ void btrfs_subpage_start_reader(const struct btrfs_fs_info *fs_info, void btrfs_subpage_end_reader(const struct btrfs_fs_info *fs_info, struct page *page, u64 start, u32 len); +void btrfs_subpage_start_writer(const struct btrfs_fs_info *fs_info, + struct page *page, u64 start, u32 len); +bool btrfs_subpage_end_and_test_writer(const struct btrfs_fs_info *fs_info, + struct page *page, u64 start, u32 len); +int btrfs_page_start_writer_lock(const struct btrfs_fs_info *fs_info, + struct page *page, u64 start, u32 len); +void btrfs_page_end_writer_lock(const struct btrfs_fs_info *fs_info, + struct page *page, u64 start, u32 len); + /* * Template for subpage related operations. * From patchwork Fri May 21 06:40:28 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Qu Wenruo X-Patchwork-Id: 12272249 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-18.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER, INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED, USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id E3B30C43460 for ; Fri, 21 May 2021 06:43:01 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id C0192611AB for ; Fri, 21 May 2021 06:43:01 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232234AbhEUGoW (ORCPT ); Fri, 21 May 2021 02:44:22 -0400 Received: from mx2.suse.de ([195.135.220.15]:58782 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234749AbhEUGny (ORCPT ); Fri, 21 May 2021 02:43:54 -0400 X-Virus-Scanned: by amavisd-new at test-mx.suse.de DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.com; s=susede1; t=1621579271; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=SDn4VMTjdSk8A06e4ZuQlZmj6wRQRHg6TgqgNAVFEgU=; b=QsyIKZiwIKNnHAngwFy8F83zhER5E9bTezdhjxOG39aq/BxUjeegPYqaXl2TL0iFQ06O6I APMvtQ3kWRWVzjcwCGMmt5RXQQdBzDK7fyq7/cmeNtjwvognggOhnxFFwZrt4COib81gBw a8OyAwp1TIDhMWFT5ZwtQ1Epe/Xj56U= Received: from relay2.suse.de (unknown [195.135.221.27]) by mx2.suse.de (Postfix) with ESMTP id 58577ACF2 for ; Fri, 21 May 2021 06:41:11 +0000 (UTC) From: Qu Wenruo To: linux-btrfs@vger.kernel.org Subject: [PATCH v3 09/31] btrfs: introduce helpers for subpage ordered status Date: Fri, 21 May 2021 14:40:28 +0800 Message-Id: <20210521064050.191164-10-wqu@suse.com> X-Mailer: git-send-email 2.31.1 In-Reply-To: <20210521064050.191164-1-wqu@suse.com> References: <20210521064050.191164-1-wqu@suse.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org This patch introduces the following functions to handle btrfs subpage ordered (private2) status: - btrfs_subpage_set_ordered() - btrfs_subpage_clear_ordered() - btrfs_subpage_test_ordered() Those helpers can only be called when the range is ensured to be inside the page. - btrfs_page_set_ordered() - btrfs_page_clear_ordered() - btrfs_page_test_ordered() Those helpers can handle both regular sector size and subpage without problem. Those functions are here to coordinate btrfs_invalidatepage() with btrfs_writepage_endio_finish_ordered(), to make sure only one of those functions can finish the ordered extent. Signed-off-by: Qu Wenruo --- fs/btrfs/subpage.c | 29 +++++++++++++++++++++++++++++ fs/btrfs/subpage.h | 4 ++++ 2 files changed, 33 insertions(+) diff --git a/fs/btrfs/subpage.c b/fs/btrfs/subpage.c index f728e5009487..516e0b3f2ed9 100644 --- a/fs/btrfs/subpage.c +++ b/fs/btrfs/subpage.c @@ -429,6 +429,32 @@ void btrfs_subpage_clear_writeback(const struct btrfs_fs_info *fs_info, spin_unlock_irqrestore(&subpage->lock, flags); } +void btrfs_subpage_set_ordered(const struct btrfs_fs_info *fs_info, + struct page *page, u64 start, u32 len) +{ + struct btrfs_subpage *subpage = (struct btrfs_subpage *)page->private; + const u16 tmp = btrfs_subpage_calc_bitmap(fs_info, page, start, len); + unsigned long flags; + + spin_lock_irqsave(&subpage->lock, flags); + subpage->ordered_bitmap |= tmp; + SetPageOrdered(page); + spin_unlock_irqrestore(&subpage->lock, flags); +} + +void btrfs_subpage_clear_ordered(const struct btrfs_fs_info *fs_info, + struct page *page, u64 start, u32 len) +{ + struct btrfs_subpage *subpage = (struct btrfs_subpage *)page->private; + const u16 tmp = btrfs_subpage_calc_bitmap(fs_info, page, start, len); + unsigned long flags; + + spin_lock_irqsave(&subpage->lock, flags); + subpage->ordered_bitmap &= ~tmp; + if (subpage->ordered_bitmap == 0) + ClearPageOrdered(page); + spin_unlock_irqrestore(&subpage->lock, flags); +} /* * Unlike set/clear which is dependent on each page status, for test all bits * are tested in the same way. @@ -451,6 +477,7 @@ IMPLEMENT_BTRFS_SUBPAGE_TEST_OP(uptodate); IMPLEMENT_BTRFS_SUBPAGE_TEST_OP(error); IMPLEMENT_BTRFS_SUBPAGE_TEST_OP(dirty); IMPLEMENT_BTRFS_SUBPAGE_TEST_OP(writeback); +IMPLEMENT_BTRFS_SUBPAGE_TEST_OP(ordered); /* * Note that, in selftests (extent-io-tests), we can have empty fs_info passed @@ -519,3 +546,5 @@ IMPLEMENT_BTRFS_PAGE_OPS(dirty, set_page_dirty, clear_page_dirty_for_io, PageDirty); IMPLEMENT_BTRFS_PAGE_OPS(writeback, set_page_writeback, end_page_writeback, PageWriteback); +IMPLEMENT_BTRFS_PAGE_OPS(ordered, SetPageOrdered, ClearPageOrdered, + PageOrdered); diff --git a/fs/btrfs/subpage.h b/fs/btrfs/subpage.h index 9d087ab3244e..3419b152c00f 100644 --- a/fs/btrfs/subpage.h +++ b/fs/btrfs/subpage.h @@ -34,6 +34,9 @@ struct btrfs_subpage { struct { atomic_t readers; atomic_t writers; + + /* If a sector has pending ordered extent */ + u16 ordered_bitmap; }; }; }; @@ -111,6 +114,7 @@ DECLARE_BTRFS_SUBPAGE_OPS(uptodate); DECLARE_BTRFS_SUBPAGE_OPS(error); DECLARE_BTRFS_SUBPAGE_OPS(dirty); DECLARE_BTRFS_SUBPAGE_OPS(writeback); +DECLARE_BTRFS_SUBPAGE_OPS(ordered); bool btrfs_subpage_clear_and_test_dirty(const struct btrfs_fs_info *fs_info, struct page *page, u64 start, u32 len); From patchwork Fri May 21 06:40:29 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Qu Wenruo X-Patchwork-Id: 12272245 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-18.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER, INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED, USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 9283DC433B4 for ; Fri, 21 May 2021 06:43:00 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 796A0611AD for ; Fri, 21 May 2021 06:43:00 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233394AbhEUGoV (ORCPT ); Fri, 21 May 2021 02:44:21 -0400 Received: from mx2.suse.de ([195.135.220.15]:58784 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231924AbhEUGny (ORCPT ); Fri, 21 May 2021 02:43:54 -0400 X-Virus-Scanned: by amavisd-new at test-mx.suse.de DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.com; s=susede1; t=1621579273; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=jcBMRyy175QVKM/4mflTvgb4PX/FFU5suS/GcvnKzjA=; b=DvnFH7t7pfbfcWmN1DJSJ4j+pPm7OphppPPIh/Jamm/GSYz03rAf3iSgFWYhJvZhdAfWu8 8XDm+HUfj/PqMHBp07CDysy5jR0Gz4gBzL9EEI2qPKQSXqozySpgr59e1VLaaOA/BLa6N6 MZO2rWAG3B92zKqQ5Uzwt/0rOgstW90= Received: from relay2.suse.de (unknown [195.135.221.27]) by mx2.suse.de (Postfix) with ESMTP id 03EDAAD10 for ; Fri, 21 May 2021 06:41:13 +0000 (UTC) From: Qu Wenruo To: linux-btrfs@vger.kernel.org Subject: [PATCH v3 10/31] btrfs: make page Ordered bit to be subpage compatible Date: Fri, 21 May 2021 14:40:29 +0800 Message-Id: <20210521064050.191164-11-wqu@suse.com> X-Mailer: git-send-email 2.31.1 In-Reply-To: <20210521064050.191164-1-wqu@suse.com> References: <20210521064050.191164-1-wqu@suse.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org This involves the following modication: - Ordered extent creation This is done in process_one_page(), now PAGE_SET_ORDERED will call subpage helper to do the work. - endio functions This is done in btrfs_mark_ordered_io_finished(). - btrfs_invalidatepage() - btrfs_cleanup_ordered_extents() Use the subpage page helper, and add an extra branch to exit if the locked page have covered the full range. Now the usage of page Ordered flag for ordered extent accounting is fully subpage compatible. Signed-off-by: Qu Wenruo --- fs/btrfs/extent_io.c | 2 +- fs/btrfs/inode.c | 19 ++++++++++++++----- fs/btrfs/ordered-data.c | 5 +++-- 3 files changed, 18 insertions(+), 8 deletions(-) diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c index 8f32c2e64936..4ee0bd03c0d1 100644 --- a/fs/btrfs/extent_io.c +++ b/fs/btrfs/extent_io.c @@ -1827,7 +1827,7 @@ static int process_one_page(struct btrfs_fs_info *fs_info, len = end + 1 - start; if (page_ops & PAGE_SET_ORDERED) - SetPageOrdered(page); + btrfs_page_clamp_set_ordered(fs_info, page, start, len); if (page == locked_page) return 1; diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index 83b014c40d42..a18881c7c916 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -51,6 +51,7 @@ #include "block-group.h" #include "space-info.h" #include "zoned.h" +#include "subpage.h" struct btrfs_iget_args { u64 ino; @@ -191,18 +192,22 @@ static inline void btrfs_cleanup_ordered_extents(struct btrfs_inode *inode, * range, then __endio_write_update_ordered() will handle * the ordered extent accounting for the range. */ - ClearPageOrdered(page); + btrfs_page_clamp_clear_ordered(inode->root->fs_info, page, + offset, bytes); put_page(page); } + /* The locked page covers the full range, nothing needs to be done */ + if (bytes + offset <= page_offset(locked_page) + PAGE_SIZE) + return; /* * In case this page belongs to the delalloc range being instantiated * then skip it, since the first page of a range is going to be * properly cleaned up by the caller of run_delalloc_range */ if (page_start >= offset && page_end <= (offset + bytes - 1)) { - offset += PAGE_SIZE; - bytes -= PAGE_SIZE; + bytes = offset + bytes - page_offset(locked_page) - PAGE_SIZE; + offset = page_offset(locked_page) + PAGE_SIZE; } return __endio_write_update_ordered(inode, offset, bytes, false); @@ -8308,6 +8313,7 @@ static void btrfs_invalidatepage(struct page *page, unsigned int offset, unsigned int length) { struct btrfs_inode *inode = BTRFS_I(page->mapping->host); + struct btrfs_fs_info *fs_info = inode->root->fs_info; struct extent_io_tree *tree = &inode->io_tree; struct extent_state *cached_state = NULL; u64 page_start = page_offset(page); @@ -8343,6 +8349,7 @@ static void btrfs_invalidatepage(struct page *page, unsigned int offset, struct btrfs_ordered_extent *ordered; bool delete_states; u64 range_end; + u32 range_len; ordered = btrfs_lookup_first_ordered_range(inode, cur, page_end + 1 - cur); @@ -8369,7 +8376,9 @@ static void btrfs_invalidatepage(struct page *page, unsigned int offset, range_end = min(ordered->file_offset + ordered->num_bytes - 1, page_end); - if (!PageOrdered(page)) { + ASSERT(range_end + 1 - cur < U32_MAX); + range_len = range_end + 1 - cur; + if (!btrfs_page_test_ordered(fs_info, page, cur, range_len)) { /* * If Ordered (Private2) is cleared, it means endio has * already been executed for the range. @@ -8379,7 +8388,7 @@ static void btrfs_invalidatepage(struct page *page, unsigned int offset, delete_states = false; goto next; } - ClearPageOrdered(page); + btrfs_page_clear_ordered(fs_info, page, cur, range_len); /* * IO on this page will never be started, so we need to account diff --git a/fs/btrfs/ordered-data.c b/fs/btrfs/ordered-data.c index b1b377ad99a0..6eb41b7c0c84 100644 --- a/fs/btrfs/ordered-data.c +++ b/fs/btrfs/ordered-data.c @@ -16,6 +16,7 @@ #include "compression.h" #include "delalloc-space.h" #include "qgroup.h" +#include "subpage.h" static struct kmem_cache *btrfs_ordered_extent_cache; @@ -395,11 +396,11 @@ void btrfs_mark_ordered_io_finished(struct btrfs_inode *inode, * * If there's no such bit, we need to skip to next range. */ - if (!PageOrdered(page)) { + if (!btrfs_page_test_ordered(fs_info, page, cur, len)) { cur += len; continue; } - ClearPageOrdered(page); + btrfs_page_clear_ordered(fs_info, page, cur, len); } /* Now we're fine to update the accounting */ From patchwork Fri May 21 06:40:30 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Qu Wenruo X-Patchwork-Id: 12272241 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-18.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER, INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED, USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 9516EC43461 for ; Fri, 21 May 2021 06:42:58 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 7BDAF611AB for ; Fri, 21 May 2021 06:42:58 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233135AbhEUGoT (ORCPT ); Fri, 21 May 2021 02:44:19 -0400 Received: from mx2.suse.de ([195.135.220.15]:58786 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234753AbhEUGny (ORCPT ); Fri, 21 May 2021 02:43:54 -0400 X-Virus-Scanned: by amavisd-new at test-mx.suse.de DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.com; s=susede1; t=1621579275; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=GDhnZcsLM789/+Fxsdm7GdBOGyq33jlWaivp6KyBbm8=; b=HVMxU7f0wsuVZkJ8SSIdTmtFL/xwpuSED7GCB39r3FkazupPzrmLzVSU7OmMYGYi1C3xUt +/vK04uIg7O5+I8fIvPXmgDYeV5zajcswxHnNgC9c1n9+b8EwgbjatM8KOF2w4sLicmDrP B+yc3XJaXEMwQL97JHRs7qJmXT5PrZk= Received: from relay2.suse.de (unknown [195.135.221.27]) by mx2.suse.de (Postfix) with ESMTP id 02FD4AD12; Fri, 21 May 2021 06:41:15 +0000 (UTC) From: Qu Wenruo To: linux-btrfs@vger.kernel.org Cc: David Sterba Subject: [PATCH v3 11/31] btrfs: update locked page dirty/writeback/error bits in __process_pages_contig Date: Fri, 21 May 2021 14:40:30 +0800 Message-Id: <20210521064050.191164-12-wqu@suse.com> X-Mailer: git-send-email 2.31.1 In-Reply-To: <20210521064050.191164-1-wqu@suse.com> References: <20210521064050.191164-1-wqu@suse.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org When __process_pages_contig() gets called for extent_clear_unlock_delalloc(), if we hit the locked page, only Private2 bit is updated, but dirty/writeback/error bits are all skipped. There are several call sites that call extent_clear_unlock_delalloc() with locked_page and PAGE_CLEAR_DIRTY/PAGE_SET_WRITEBACK/PAGE_END_WRITEBACK - cow_file_range() - run_delalloc_nocow() - cow_file_range_async() All for their error handling branches. For those call sites, since we skip the locked page for dirty/error/writeback bit update, the locked page will still have its subpage dirty bit remaining. Normally it's the call sites which locked the page to handle the locked page, but it won't hurt if we also do the update. Especially there are already other call sites doing the same thing by manually passing NULL as locked_page. Signed-off-by: Qu Wenruo Signed-off-by: David Sterba --- fs/btrfs/extent_io.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c index 4ee0bd03c0d1..8c1f66a9a2c4 100644 --- a/fs/btrfs/extent_io.c +++ b/fs/btrfs/extent_io.c @@ -1828,10 +1828,6 @@ static int process_one_page(struct btrfs_fs_info *fs_info, if (page_ops & PAGE_SET_ORDERED) btrfs_page_clamp_set_ordered(fs_info, page, start, len); - - if (page == locked_page) - return 1; - if (page_ops & PAGE_SET_ERROR) btrfs_page_clamp_set_error(fs_info, page, start, len); if (page_ops & PAGE_START_WRITEBACK) { @@ -1840,6 +1836,10 @@ static int process_one_page(struct btrfs_fs_info *fs_info, } if (page_ops & PAGE_END_WRITEBACK) btrfs_page_clamp_clear_writeback(fs_info, page, start, len); + + if (page == locked_page) + return 1; + if (page_ops & PAGE_LOCK) { int ret; From patchwork Fri May 21 06:40:31 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Qu Wenruo X-Patchwork-Id: 12272251 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-18.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER, INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED, USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 16361C433ED for ; Fri, 21 May 2021 06:43:03 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id F07B4611AB for ; Fri, 21 May 2021 06:43:02 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233963AbhEUGoX (ORCPT ); Fri, 21 May 2021 02:44:23 -0400 Received: from mx2.suse.de ([195.135.220.15]:58788 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234763AbhEUGny (ORCPT ); Fri, 21 May 2021 02:43:54 -0400 X-Virus-Scanned: by amavisd-new at test-mx.suse.de DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.com; s=susede1; t=1621579276; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=j6wwqZ2NHQ98V9BxCk2fJD9t3QEV7jgtGjgySMsAl/o=; b=UJv9cl1r/+c7NHy+G9VBOs++Ylyb09tw6klKqSgVb/YPDUm2ccTSgD/o2TGRqqLmrPLEiP c5TBcsVKknGP2IhfsapQLOHPvVnMfMbSCLkEaduicLLwUJ/CvjSdpQT5avmQS+mpI7t8kV 7NvVfvtf1hwpIhrYs6+LkVaKr0eIF6Q= Received: from relay2.suse.de (unknown [195.135.221.27]) by mx2.suse.de (Postfix) with ESMTP id AC1C6AD1B for ; Fri, 21 May 2021 06:41:16 +0000 (UTC) From: Qu Wenruo To: linux-btrfs@vger.kernel.org Subject: [PATCH v3 12/31] btrfs: prevent extent_clear_unlock_delalloc() to unlock page not locked by __process_pages_contig() Date: Fri, 21 May 2021 14:40:31 +0800 Message-Id: <20210521064050.191164-13-wqu@suse.com> X-Mailer: git-send-email 2.31.1 In-Reply-To: <20210521064050.191164-1-wqu@suse.com> References: <20210521064050.191164-1-wqu@suse.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org In cow_file_range(), after we have succeeded creating an inline extent, we unlock the page with extent_clear_unlock_delalloc() by passing locked_page == NULL. For sectorsize == PAGE_SIZE case, this is just making the page lock and unlock harder to grab. But for incoming subpage case, it can be a big problem. For incoming subpage case, page locking have two entrace: - __process_pages_contig() In that case, we know exactly the range we want to lock (which only requires sector alignment). To handle the subpage requirement, we introduce btrfs_subpage::writers to page::private, and will update it in __process_pages_contig(). - Other directly lock/unlock_page() call sites Those won't touch btrfs_subpage::writers at all. This means, page locked by __process_pages_contig() can only be unlocked by __process_pages_contig(). Thankfully we already have the existing infrastructure in the form of @locked_page in various call sites. Unfortunately, extent_clear_unlock_delalloc() in cow_file_range() after creating an inline extent is the exception. It intentionally call extent_clear_unlock_delalloc() with locked_page == NULL, to also unlock current page (and clear its dirty/writeback bits). To co-operate with incoming subpage modifications, and make the page lock/unlock pair easier to understand, this patch will still call extent_clear_unlock_delalloc() with locked_page, and only unlock the page in __extent_writepage(). Signed-off-by: Qu Wenruo --- fs/btrfs/inode.c | 16 +++++++++++++++- 1 file changed, 15 insertions(+), 1 deletion(-) diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index a18881c7c916..119834aa3587 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -1091,7 +1091,8 @@ static noinline int cow_file_range(struct btrfs_inode *inode, * our outstanding extent for clearing delalloc for this * range. */ - extent_clear_unlock_delalloc(inode, start, end, NULL, + extent_clear_unlock_delalloc(inode, start, end, + locked_page, EXTENT_LOCKED | EXTENT_DELALLOC | EXTENT_DELALLOC_NEW | EXTENT_DEFRAG | EXTENT_DO_ACCOUNTING, PAGE_UNLOCK | @@ -1099,6 +1100,19 @@ static noinline int cow_file_range(struct btrfs_inode *inode, *nr_written = *nr_written + (end - start + PAGE_SIZE) / PAGE_SIZE; *page_started = 1; + /* + * locked_page is locked by the caller of + * writepage_delalloc(), not locked by + * __process_pages_contig(). + * + * We can't let __process_pages_contig() to unlock it, + * as it doesn't have any subpage::writers recorded. + * + * Here we manually unlock the page, since the caller + * can't use page_started to determine if it's an + * inline extent or a compressed extent. + */ + unlock_page(locked_page); goto out; } else if (ret < 0) { goto out_unlock; From patchwork Fri May 21 06:40:32 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Qu Wenruo X-Patchwork-Id: 12272255 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-18.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER, INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED, USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 54D5AC433B4 for ; Fri, 21 May 2021 06:43:08 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 39FA9613B6 for ; Fri, 21 May 2021 06:43:08 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233983AbhEUGo2 (ORCPT ); Fri, 21 May 2021 02:44:28 -0400 Received: from mx2.suse.de ([195.135.220.15]:58800 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234812AbhEUGn4 (ORCPT ); Fri, 21 May 2021 02:43:56 -0400 X-Virus-Scanned: by amavisd-new at test-mx.suse.de DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.com; s=susede1; t=1621579278; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=DTHpyiBBLx/LQ6fe3v/RVlj7Z2A2alkUvIBshMeJ70k=; b=qKicklgSsj6lBbqHifZzG6JXZ4A88HwAQ8kNM/aSJq7b910mf8SPwsFlIdBoCSUJKdNZ5a lHjNc9+FNkRZ002IyunwqD2MYWj08F+Q4EK4ck1bF/ciFdNfz+j172W00M77h9oNEMPBtX OuGGLAVFQmrfCW1HelRq3I8IpApdops= Received: from relay2.suse.de (unknown [195.135.221.27]) by mx2.suse.de (Postfix) with ESMTP id 6814DAD22 for ; Fri, 21 May 2021 06:41:18 +0000 (UTC) From: Qu Wenruo To: linux-btrfs@vger.kernel.org Subject: [PATCH v3 13/31] btrfs: make btrfs_set_range_writeback() subpage compatible Date: Fri, 21 May 2021 14:40:32 +0800 Message-Id: <20210521064050.191164-14-wqu@suse.com> X-Mailer: git-send-email 2.31.1 In-Reply-To: <20210521064050.191164-1-wqu@suse.com> References: <20210521064050.191164-1-wqu@suse.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org Function btrfs_set_range_writeback() currently just set the page writeback unconditionally. Change it to call the subpage helper so that we can handle both cases well. Since the subpage helpers needs btrfs_info, also change the parameter to accept btrfs_inode. Signed-off-by: Qu Wenruo --- fs/btrfs/ctree.h | 2 +- fs/btrfs/extent_io.c | 3 +-- fs/btrfs/inode.c | 12 ++++++++---- 3 files changed, 10 insertions(+), 7 deletions(-) diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h index 42048d051317..37e6b7334c26 100644 --- a/fs/btrfs/ctree.h +++ b/fs/btrfs/ctree.h @@ -3146,7 +3146,7 @@ void btrfs_split_delalloc_extent(struct inode *inode, struct extent_state *orig, u64 split); int btrfs_bio_fits_in_stripe(struct page *page, size_t size, struct bio *bio, unsigned long bio_flags); -void btrfs_set_range_writeback(struct extent_io_tree *tree, u64 start, u64 end); +void btrfs_set_range_writeback(struct btrfs_inode *inode, u64 start, u64 end); vm_fault_t btrfs_page_mkwrite(struct vm_fault *vmf); int btrfs_readpage(struct file *file, struct page *page); void btrfs_evict_inode(struct inode *inode); diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c index 8c1f66a9a2c4..a635c844617c 100644 --- a/fs/btrfs/extent_io.c +++ b/fs/btrfs/extent_io.c @@ -3806,7 +3806,6 @@ static noinline_for_stack int __extent_writepage_io(struct btrfs_inode *inode, int *nr_ret) { struct btrfs_fs_info *fs_info = inode->root->fs_info; - struct extent_io_tree *tree = &inode->io_tree; u64 start = page_offset(page); u64 end = start + PAGE_SIZE - 1; u64 cur = start; @@ -3885,7 +3884,7 @@ static noinline_for_stack int __extent_writepage_io(struct btrfs_inode *inode, continue; } - btrfs_set_range_writeback(tree, cur, cur + iosize - 1); + btrfs_set_range_writeback(inode, cur, cur + iosize - 1); if (!PageWriteback(page)) { btrfs_err(inode->root->fs_info, "page %lu not writeback, cur %llu end %llu", diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index 119834aa3587..298088cdcecd 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -10188,17 +10188,21 @@ static int btrfs_tmpfile(struct user_namespace *mnt_userns, struct inode *dir, return ret; } -void btrfs_set_range_writeback(struct extent_io_tree *tree, u64 start, u64 end) +void btrfs_set_range_writeback(struct btrfs_inode *inode, u64 start, u64 end) { - struct inode *inode = tree->private_data; + struct btrfs_fs_info *fs_info = inode->root->fs_info; unsigned long index = start >> PAGE_SHIFT; unsigned long end_index = end >> PAGE_SHIFT; struct page *page; + u32 len; + ASSERT(end + 1 - start <= U32_MAX); + len = end + 1 - start; while (index <= end_index) { - page = find_get_page(inode->i_mapping, index); + page = find_get_page(inode->vfs_inode.i_mapping, index); ASSERT(page); /* Pages should be in the extent_io_tree */ - set_page_writeback(page); + + btrfs_page_set_writeback(fs_info, page, start, len); put_page(page); index++; } From patchwork Fri May 21 06:40:33 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Qu Wenruo X-Patchwork-Id: 12272253 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-18.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER, INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED, USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 330F5C433B4 for ; Fri, 21 May 2021 06:43:06 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 1565E611AB for ; Fri, 21 May 2021 06:43:06 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233375AbhEUGoZ (ORCPT ); Fri, 21 May 2021 02:44:25 -0400 Received: from mx2.suse.de ([195.135.220.15]:58802 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234796AbhEUGn4 (ORCPT ); Fri, 21 May 2021 02:43:56 -0400 X-Virus-Scanned: by amavisd-new at test-mx.suse.de DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.com; s=susede1; t=1621579280; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=DpxEedIwdLwukFe+ay3XE+HAsPFU06P4A7o7WJoj5iM=; b=IBsW9rFMADBMKoosNLHfffsnOAb5PPJeaCaL8os4V9P6XiCYG2O2baxdSYnufyNWaWN9/L cYUklnIRIAt5EwdxbDTZ9H0J9WTfx8HRljYzL9ti8x+fXWOtxZk8yf6/jtb267WJ0a6MF9 1E+gEAhsXJUPqiNknPP4oEnsgwpJoqQ= Received: from relay2.suse.de (unknown [195.135.221.27]) by mx2.suse.de (Postfix) with ESMTP id 3AAC7AD5C for ; Fri, 21 May 2021 06:41:20 +0000 (UTC) From: Qu Wenruo To: linux-btrfs@vger.kernel.org Subject: [PATCH v3 14/31] btrfs: make __extent_writepage_io() only submit dirty range for subpage Date: Fri, 21 May 2021 14:40:33 +0800 Message-Id: <20210521064050.191164-15-wqu@suse.com> X-Mailer: git-send-email 2.31.1 In-Reply-To: <20210521064050.191164-1-wqu@suse.com> References: <20210521064050.191164-1-wqu@suse.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org __extent_writepage_io() function originally just iterate through all the extent maps of a page, and submit any regular extents. This is fine for sectorsize == PAGE_SIZE case, as if a page is dirty, we need to submit the only sector contained in the page. But for subpage case, one dirty page can contain several clean sectors with at least one dirty sector. If __extent_writepage_io() still submit all regular extent maps, it can submit data which is already written to disk. And since such already written data won't have corresponding ordered extents, it will trigger a BUG_ON() in btrfs_csum_one_bio(). Change the behavior of __extent_writepage_io() by finding the first dirty byte in the page, and only submit the dirty range other than the full extent. Since we're also here, also modify the following calls to be subpage compatible: - SetPageError() - end_page_writeback() Signed-off-by: Qu Wenruo --- fs/btrfs/extent_io.c | 100 ++++++++++++++++++++++++++++++++++++++++--- 1 file changed, 95 insertions(+), 5 deletions(-) diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c index a635c844617c..bb5e9f73ad2b 100644 --- a/fs/btrfs/extent_io.c +++ b/fs/btrfs/extent_io.c @@ -3789,6 +3789,74 @@ static noinline_for_stack int writepage_delalloc(struct btrfs_inode *inode, return 0; } +/* + * To find the first byte we need to write. + * + * For subpage, one page can contain several sectors, and + * __extent_writepage_io() will just grab all extent maps in the page + * range and try to submit all non-inline/non-compressed extents. + * + * This is a big problem for subpage, we shouldn't re-submit already written + * data at all. + * This function will lookup subpage dirty bit to find which range we really + * need to submit. + * + * Return the next dirty range in [@start, @end). + * If no dirty range is found, @start will be page_offset(page) + PAGE_SIZE. + */ +static void find_next_dirty_byte(struct btrfs_fs_info *fs_info, + struct page *page, u64 *start, u64 *end) +{ + struct btrfs_subpage *subpage = (struct btrfs_subpage *)page->private; + u64 orig_start = *start; + u16 dirty_bitmap; + unsigned long flags; + int nbits = (orig_start - page_offset(page)) >> fs_info->sectorsize; + int first_bit_set; + int first_bit_zero; + + /* + * For regular sector size == page size case, since one page only + * contains one sector, we return the page offset directly. + */ + if (fs_info->sectorsize == PAGE_SIZE) { + *start = page_offset(page); + *end = page_offset(page) + PAGE_SIZE; + return; + } + + /* We should have the page locked, but just in case */ + spin_lock_irqsave(&subpage->lock, flags); + dirty_bitmap = subpage->dirty_bitmap; + spin_unlock_irqrestore(&subpage->lock, flags); + + /* Set bits lower than @nbits with 0 */ + dirty_bitmap &= ~((1 << nbits) - 1); + + first_bit_set = ffs(dirty_bitmap); + /* No dirty range found */ + if (first_bit_set == 0) { + *start = page_offset(page) + PAGE_SIZE; + return; + } + + ASSERT(first_bit_set > 0 && first_bit_set <= BTRFS_SUBPAGE_BITMAP_SIZE); + *start = page_offset(page) + (first_bit_set - 1) * fs_info->sectorsize; + + /* Set all bits lower than @nbits to 1 for ffz() */ + dirty_bitmap |= ((1 << nbits) - 1); + + first_bit_zero = ffz(dirty_bitmap); + if (first_bit_zero == 0 || first_bit_zero > BTRFS_SUBPAGE_BITMAP_SIZE) { + *end = page_offset(page) + PAGE_SIZE; + return; + } + ASSERT(first_bit_zero > 0 && + first_bit_zero <= BTRFS_SUBPAGE_BITMAP_SIZE); + *end = page_offset(page) + first_bit_zero * fs_info->sectorsize; + ASSERT(*end > *start); +} + /* * helper for __extent_writepage. This calls the writepage start hooks, * and does the loop to map the page into extents and bios. @@ -3836,6 +3904,8 @@ static noinline_for_stack int __extent_writepage_io(struct btrfs_inode *inode, while (cur <= end) { u64 disk_bytenr; u64 em_end; + u64 dirty_range_start = cur; + u64 dirty_range_end; u32 iosize; if (cur >= i_size) { @@ -3843,9 +3913,17 @@ static noinline_for_stack int __extent_writepage_io(struct btrfs_inode *inode, end, 1); break; } + + find_next_dirty_byte(fs_info, page, &dirty_range_start, + &dirty_range_end); + if (cur < dirty_range_start) { + cur = dirty_range_start; + continue; + } + em = btrfs_get_extent(inode, NULL, 0, cur, end - cur + 1); if (IS_ERR_OR_NULL(em)) { - SetPageError(page); + btrfs_page_set_error(fs_info, page, cur, end - cur + 1); ret = PTR_ERR_OR_ZERO(em); break; } @@ -3860,8 +3938,11 @@ static noinline_for_stack int __extent_writepage_io(struct btrfs_inode *inode, compressed = test_bit(EXTENT_FLAG_COMPRESSED, &em->flags); disk_bytenr = em->block_start + extent_offset; - /* Note that em_end from extent_map_end() is exclusive */ - iosize = min(em_end, end + 1) - cur; + /* + * Note that em_end from extent_map_end() and dirty_range_end from + * find_next_dirty_byte() are all exclusive + */ + iosize = min(min(em_end, end + 1), dirty_range_end) - cur; if (btrfs_use_zone_append(inode, em)) opf = REQ_OP_ZONE_APPEND; @@ -3891,6 +3972,14 @@ static noinline_for_stack int __extent_writepage_io(struct btrfs_inode *inode, page->index, cur, end); } + /* + * Although the PageDirty bit is cleared before entering this + * function, subpage dirty bit is not cleared. + * So clear subpage dirty bit here so next time we won't + * submit page for range already written to disk. + */ + btrfs_page_clear_dirty(fs_info, page, cur, iosize); + ret = submit_extent_page(opf | write_flags, wbc, &epd->bio_ctrl, page, disk_bytenr, iosize, @@ -3898,9 +3987,10 @@ static noinline_for_stack int __extent_writepage_io(struct btrfs_inode *inode, end_bio_extent_writepage, 0, 0, false); if (ret) { - SetPageError(page); + btrfs_page_set_error(fs_info, page, cur, iosize); if (PageWriteback(page)) - end_page_writeback(page); + btrfs_page_clear_writeback(fs_info, page, cur, + iosize); } cur += iosize; From patchwork Fri May 21 06:40:34 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Qu Wenruo X-Patchwork-Id: 12272257 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-18.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER, INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED, USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id CD6EFC433ED for ; Fri, 21 May 2021 06:43:12 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id B3EF36135C for ; Fri, 21 May 2021 06:43:12 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232870AbhEUGod (ORCPT ); Fri, 21 May 2021 02:44:33 -0400 Received: from mx2.suse.de ([195.135.220.15]:58774 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232887AbhEUGoS (ORCPT ); Fri, 21 May 2021 02:44:18 -0400 X-Virus-Scanned: by amavisd-new at test-mx.suse.de DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.com; s=susede1; t=1621579282; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=Coy9tq4scUmmVTjnu14bPNg1nmbhXDAvM+Rd8WO1AQI=; b=KQC8dDnD3szaUQUQic+apYSTpR5MAFsDYlBSUcw+vREFCcoejKY+DYeC/x08AigxpdPZth EHBTA2s8fr80VgTeTwpU4N/TMGgaLZLOsUyJ82DJdQmmJV7ZBRlQOVwNhUosAVZowsYucJ crEdnoMFjM4qXzKXjD5OmhVMYnGpKTo= Received: from relay2.suse.de (unknown [195.135.221.27]) by mx2.suse.de (Postfix) with ESMTP id F0EC6AD6C for ; Fri, 21 May 2021 06:41:21 +0000 (UTC) From: Qu Wenruo To: linux-btrfs@vger.kernel.org Subject: [PATCH v3 15/31] btrfs: make btrfs_truncate_block() to be subpage compatible Date: Fri, 21 May 2021 14:40:34 +0800 Message-Id: <20210521064050.191164-16-wqu@suse.com> X-Mailer: git-send-email 2.31.1 In-Reply-To: <20210521064050.191164-1-wqu@suse.com> References: <20210521064050.191164-1-wqu@suse.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org btrfs_truncate_block() itself is already mostly subpage compatible, the only missing part is the page dirtying code. Currently if we have a sector that needs to be truncated, we set the sector aligned range delalloc, then set the full page dirty. The problem is, current subpage code requires subpage dirty bit to be set, or __extent_writepage_io() won't submit bio, thus leads to ordered extent never to finish. So this patch will make btrfs_truncate_block() to call btrfs_page_set_dirty() helper to replace set_page_dirty() to fix the problem. Signed-off-by: Qu Wenruo --- fs/btrfs/inode.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index 298088cdcecd..5011f407c85c 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -4934,7 +4934,8 @@ int btrfs_truncate_block(struct btrfs_inode *inode, loff_t from, loff_t len, flush_dcache_page(page); } ClearPageChecked(page); - set_page_dirty(page); + btrfs_page_set_dirty(fs_info, page, block_start, + block_end + 1 - block_start); unlock_extent_cached(io_tree, block_start, block_end, &cached_state); if (only_release_metadata) From patchwork Fri May 21 06:40:35 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Qu Wenruo X-Patchwork-Id: 12272263 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-18.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER, INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED, USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id B263FC433ED for ; Fri, 21 May 2021 06:43:24 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 9864C611AD for ; Fri, 21 May 2021 06:43:24 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233332AbhEUGog (ORCPT ); Fri, 21 May 2021 02:44:36 -0400 Received: from mx2.suse.de ([195.135.220.15]:59006 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233227AbhEUGoT (ORCPT ); Fri, 21 May 2021 02:44:19 -0400 X-Virus-Scanned: by amavisd-new at test-mx.suse.de DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.com; s=susede1; t=1621579283; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=+VhEooSYURuPgH/s7NuRUfjgR5o3bQkchmng0ZLW6hU=; b=ZUlVqtyK2uvywLM9omQmq7NS08htosZJYYLnuLtdhr/XEewI4hl11E0/JRo+XZHMsWhrvL BREqlswnZyYmp1ombUTQx3MEtwLWOT8R0wju1ip656+fHpT8YFi9a3hfItwwjK0pVdAUqg 6mqBOVG5+ZcV0p6K52PgSIa+hUTLAf8= Received: from relay2.suse.de (unknown [195.135.221.27]) by mx2.suse.de (Postfix) with ESMTP id A2D07AD71 for ; Fri, 21 May 2021 06:41:23 +0000 (UTC) From: Qu Wenruo To: linux-btrfs@vger.kernel.org Subject: [PATCH v3 16/31] btrfs: make btrfs_page_mkwrite() to be subpage compatible Date: Fri, 21 May 2021 14:40:35 +0800 Message-Id: <20210521064050.191164-17-wqu@suse.com> X-Mailer: git-send-email 2.31.1 In-Reply-To: <20210521064050.191164-1-wqu@suse.com> References: <20210521064050.191164-1-wqu@suse.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org Only set_page_dirty() and SetPageUptodate() is not subpage compatible. Convert them to subpage helpers, so that __extent_writepage_io() can submit page content correctly. Signed-off-by: Qu Wenruo --- fs/btrfs/inode.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index 5011f407c85c..39049a614b7c 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -8621,8 +8621,9 @@ vm_fault_t btrfs_page_mkwrite(struct vm_fault *vmf) flush_dcache_page(page); } ClearPageChecked(page); - set_page_dirty(page); - SetPageUptodate(page); + btrfs_page_set_dirty(fs_info, page, page_start, end + 1 - page_start); + btrfs_page_set_uptodate(fs_info, page, page_start, + end + 1 - page_start); btrfs_set_inode_last_sub_trans(BTRFS_I(inode)); From patchwork Fri May 21 06:40:36 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Qu Wenruo X-Patchwork-Id: 12272261 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-18.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER, INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED, USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 5291AC433ED for ; Fri, 21 May 2021 06:43:15 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 3BE166135C for ; Fri, 21 May 2021 06:43:15 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233674AbhEUGog (ORCPT ); Fri, 21 May 2021 02:44:36 -0400 Received: from mx2.suse.de ([195.135.220.15]:59008 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233332AbhEUGoT (ORCPT ); Fri, 21 May 2021 02:44:19 -0400 X-Virus-Scanned: by amavisd-new at test-mx.suse.de DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.com; s=susede1; t=1621579285; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=ClZNYoBOMXtCrWCRQM9mHsvGsls0DNr+VpZ8ptGHrgY=; b=Pi0MVhpwtNmF4+4KEK31AH72Y8Hj8FX3iglde3vRvyoSjselzldq2//UFVWNGmcPg1lARL 9nUUPiDw2PZtTgyejf9hFUd71QPnE0A5KvmUDRBh+3c0eN/MciTiQ+m3CYujSDo1SVyqgo JaaO7Os+qdQ6/DyK+KChElou97Mq/q4= Received: from relay2.suse.de (unknown [195.135.221.27]) by mx2.suse.de (Postfix) with ESMTP id 57E12AD80 for ; Fri, 21 May 2021 06:41:25 +0000 (UTC) From: Qu Wenruo To: linux-btrfs@vger.kernel.org Subject: [PATCH v3 17/31] btrfs: reflink: make copy_inline_to_page() to be subpage compatible Date: Fri, 21 May 2021 14:40:36 +0800 Message-Id: <20210521064050.191164-18-wqu@suse.com> X-Mailer: git-send-email 2.31.1 In-Reply-To: <20210521064050.191164-1-wqu@suse.com> References: <20210521064050.191164-1-wqu@suse.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org The modifications are: - Page copy destination For subpage case, one page can contain multiple sectors, thus we can no longer expect the memcpy_to_page()/btrfs_decompress() to copy data into page offset 0. The correct offset is offset_in_page(file_offset) now, which should handle both regular sectorsize and subpage cases well. - Page status update Now we need to use subpage helper to handle the page status update. Signed-off-by: Qu Wenruo --- fs/btrfs/reflink.c | 14 +++++++++----- 1 file changed, 9 insertions(+), 5 deletions(-) diff --git a/fs/btrfs/reflink.c b/fs/btrfs/reflink.c index d434dc78dadf..a6e8b8e67c80 100644 --- a/fs/btrfs/reflink.c +++ b/fs/btrfs/reflink.c @@ -7,6 +7,7 @@ #include "delalloc-space.h" #include "reflink.h" #include "transaction.h" +#include "subpage.h" #define BTRFS_MAX_DEDUPE_LEN SZ_16M @@ -52,7 +53,8 @@ static int copy_inline_to_page(struct btrfs_inode *inode, const u64 datal, const u8 comp_type) { - const u64 block_size = btrfs_inode_sectorsize(inode); + struct btrfs_fs_info *fs_info = inode->root->fs_info; + const u32 block_size = fs_info->sectorsize; const u64 range_end = file_offset + block_size - 1; const size_t inline_size = size - btrfs_file_extent_calc_inline_size(0); char *data_start = inline_data + btrfs_file_extent_calc_inline_size(0); @@ -106,10 +108,12 @@ static int copy_inline_to_page(struct btrfs_inode *inode, set_bit(BTRFS_INODE_NO_DELALLOC_FLUSH, &inode->runtime_flags); if (comp_type == BTRFS_COMPRESS_NONE) { - memcpy_to_page(page, 0, data_start, datal); + memcpy_to_page(page, offset_in_page(file_offset), data_start, + datal); flush_dcache_page(page); } else { - ret = btrfs_decompress(comp_type, data_start, page, 0, + ret = btrfs_decompress(comp_type, data_start, page, + offset_in_page(file_offset), inline_size, datal); if (ret) goto out_unlock; @@ -133,9 +137,9 @@ static int copy_inline_to_page(struct btrfs_inode *inode, flush_dcache_page(page); } - SetPageUptodate(page); + btrfs_page_set_uptodate(fs_info, page, file_offset, block_size); ClearPageChecked(page); - set_page_dirty(page); + btrfs_page_set_dirty(fs_info, page, file_offset, block_size); out_unlock: if (page) { unlock_page(page); From patchwork Fri May 21 06:40:37 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Qu Wenruo X-Patchwork-Id: 12272259 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-18.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER, INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED, USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id EEB8CC433B4 for ; Fri, 21 May 2021 06:43:13 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id D4061611AD for ; Fri, 21 May 2021 06:43:13 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234236AbhEUGof (ORCPT ); Fri, 21 May 2021 02:44:35 -0400 Received: from mx2.suse.de ([195.135.220.15]:58776 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233126AbhEUGoT (ORCPT ); Fri, 21 May 2021 02:44:19 -0400 X-Virus-Scanned: by amavisd-new at test-mx.suse.de DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.com; s=susede1; t=1621579287; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=wgfYSTk2/YHgwwGTP2QulYQgNhi2O01PKfNWXU3VuTI=; b=B4em99QITkqjI8n07zspsgotVuiKZu8btSScdy7r7EyRrWdpkLagzUpZSpJ1IFf2lxzY5n QrLm1I+8y55tmAVJzYd/GtpWEyfUpmNZfelPPDUrZNuSzM/fajAnUKtmwLz9g8iIX6sKaX PUWmueavTKa8QcKRHxL+5O5U5UZg7Hc= Received: from relay2.suse.de (unknown [195.135.221.27]) by mx2.suse.de (Postfix) with ESMTP id E8F00AE22 for ; Fri, 21 May 2021 06:41:26 +0000 (UTC) From: Qu Wenruo To: linux-btrfs@vger.kernel.org Subject: [PATCH v3 18/31] btrfs: fix the filemap_range_has_page() call in btrfs_punch_hole_lock_range() Date: Fri, 21 May 2021 14:40:37 +0800 Message-Id: <20210521064050.191164-19-wqu@suse.com> X-Mailer: git-send-email 2.31.1 In-Reply-To: <20210521064050.191164-1-wqu@suse.com> References: <20210521064050.191164-1-wqu@suse.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org [BUG] With current subpage RW support, the following script can hang the fs on with 64K page size. # mkfs.btrfs -f -s 4k $dev # mount $dev -o nospace_cache $mnt # fsstress -w -n 50 -p 1 -s 1607749395 -d $mnt The kernel will do an infinite loop in btrfs_punch_hole_lock_range(). [CAUSE] In btrfs_punch_hole_lock_range() we: - Truncate page cache range - Lock extent io tree - Wait any ordered extents in the range. We exit the loop until we meet all the following conditions: - No ordered extent in the lock range - No page is in the lock range The latter condition has a pitfall, it only works for sector size == PAGE_SIZE case. While can't handle the following subpage case: 0 32K 64K 96K 128K | |///////||//////| || lockstart=32K lockend=96K - 1 In this case, although the range cross 2 pages, truncate_pagecache_range() will invalidate no page at all, but only zero the [32K, 96K) range of the two pages. Thus filemap_range_has_page(32K, 96K-1) will always return true, thus we will never meet the loop exit condition. [FIX] Fix the problem by doing page alignment for the lock range. Function filemap_range_has_page() has already handled lend < lstart case, we only need to round up @lockstart, and round_down @lockend for truncate_pagecache_range(). This modification should not change any thing for sector size == PAGE_SIZE case, as in that case our range is already page aligned. Signed-off-by: Qu Wenruo --- fs/btrfs/file.c | 12 +++++++++++- 1 file changed, 11 insertions(+), 1 deletion(-) diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c index 4a89697ae3a7..6ef44afa939c 100644 --- a/fs/btrfs/file.c +++ b/fs/btrfs/file.c @@ -2486,6 +2486,16 @@ static int btrfs_punch_hole_lock_range(struct inode *inode, const u64 lockend, struct extent_state **cached_state) { + /* + * For subpage case, if the range is not at page boundary, we could + * have pages at the leading/tailing part of the range. + * This could lead to dead loop since filemap_range_has_page() + * will always return true. + * So here we need to do extra page alignment for + * filemap_range_has_page(). + */ + u64 page_lockstart = round_up(lockstart, PAGE_SIZE); + u64 page_lockend = round_down(lockend + 1, PAGE_SIZE) - 1; while (1) { struct btrfs_ordered_extent *ordered; int ret; @@ -2506,7 +2516,7 @@ static int btrfs_punch_hole_lock_range(struct inode *inode, (ordered->file_offset + ordered->num_bytes <= lockstart || ordered->file_offset > lockend)) && !filemap_range_has_page(inode->i_mapping, - lockstart, lockend)) { + page_lockstart, page_lockend)) { if (ordered) btrfs_put_ordered_extent(ordered); break; From patchwork Fri May 21 06:40:38 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Qu Wenruo X-Patchwork-Id: 12272273 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-18.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER, INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED, USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 0305CC433B4 for ; Fri, 21 May 2021 06:43:34 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id DFB646135C for ; Fri, 21 May 2021 06:43:33 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233698AbhEUGoz (ORCPT ); Fri, 21 May 2021 02:44:55 -0400 Received: from mx2.suse.de ([195.135.220.15]:59010 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233651AbhEUGoT (ORCPT ); Fri, 21 May 2021 02:44:19 -0400 X-Virus-Scanned: by amavisd-new at test-mx.suse.de DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.com; s=susede1; t=1621579288; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=UFDZa95E5N4vSGBOtLPmqCESeN/Vw++PjBgEYXF2Jms=; b=seP1kXam/MAR7KCna2IISsaNt775Dm1+c7Xv87y2qsAbrcF5SXeQvgkNQygw3j1HgKVxEZ sKeo0wJKg17yzMQeXp7Dl3nZZJ7oAAvZH+xpk6RQeJ99xSAGE6Qli/oXnG7iVWWgedFoM5 jSKljhLOLMcRqyZi5lDliBwJn/Ld9IA= Received: from relay2.suse.de (unknown [195.135.221.27]) by mx2.suse.de (Postfix) with ESMTP id 7D723AE4B for ; Fri, 21 May 2021 06:41:28 +0000 (UTC) From: Qu Wenruo To: linux-btrfs@vger.kernel.org Subject: [PATCH v3 19/31] btrfs: don't clear page extent mapped if we're not invalidating the full page Date: Fri, 21 May 2021 14:40:38 +0800 Message-Id: <20210521064050.191164-20-wqu@suse.com> X-Mailer: git-send-email 2.31.1 In-Reply-To: <20210521064050.191164-1-wqu@suse.com> References: <20210521064050.191164-1-wqu@suse.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org [BUG] With current btrfs subpage rw support, the following script can lead to fs hang: mkfs.btrfs -f -s 4k $dev mount $dev -o nospace_cache $mnt fsstress -w -n 100 -p 1 -s 1608140256 -v -d $mnt The fs will hang at btrfs_start_ordered_extent(). [CAUSE] In above test case, btrfs_invalidate() will be called with the following parameters: offset = 0 length = 53248 page dirty = 1 subpage dirty bitmap = 0x2000 Since @offset is 0, btrfs_invalidate() will try to invalidate the full page, and finally call clear_page_extent_mapped() which will detach btrfs subpage structure from the page. And since the page no longer has btrfs subpage structure, the subpage dirty bitmap will be cleared, preventing the dirty range from written back, thus no way to wake up the ordered extent. [FIX] Just follow other fses, only to invalidate the page if the range covers the full page. There are cases like truncate_setsize() which can call btrfs_invalidatepage() with offset == 0 and length != 0 for the last page of an inode. Although the old code will still try to invalidate the full page, we are still safe to just wait for ordered extent to finish. So it shouldn't cause extra problems. Signed-off-by: Qu Wenruo --- fs/btrfs/inode.c | 14 +++++++++++++- 1 file changed, 13 insertions(+), 1 deletion(-) diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index 39049a614b7c..f82055ba09c8 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -8351,7 +8351,19 @@ static void btrfs_invalidatepage(struct page *page, unsigned int offset, */ wait_on_page_writeback(page); - if (offset) { + /* + * For subpage case, we have call sites like + * btrfs_punch_hole_lock_range() which passes range not aligned to + * sectorsize. + * If the range doesn't cover the full page, we don't need to and + * shouldn't clear page extent mapped, as page->private can still + * record subpage dirty bits for other part of the range. + * + * For cases where can invalidate the full even the range doesn't + * cover the full page, like invalidating the last page, we're + * still safe to wait for ordered extent to finish. + */ + if (!(offset == 0 && length == PAGE_SIZE)) { btrfs_releasepage(page, GFP_NOFS); return; } From patchwork Fri May 21 06:40:39 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Qu Wenruo X-Patchwork-Id: 12272269 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-18.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER, INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED, USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id A686BC433B4 for ; Fri, 21 May 2021 06:43:31 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 8B304611AB for ; Fri, 21 May 2021 06:43:31 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233010AbhEUGow (ORCPT ); Fri, 21 May 2021 02:44:52 -0400 Received: from mx2.suse.de ([195.135.220.15]:58778 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233482AbhEUGoT (ORCPT ); Fri, 21 May 2021 02:44:19 -0400 X-Virus-Scanned: by amavisd-new at test-mx.suse.de DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.com; s=susede1; t=1621579290; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=hYwd8yKPZq1mQY1tZcf68ClNgr2CaN+mR1mny4d3Gzc=; b=uGGSkYiVP8bisceQWcbfWBwC4J5D1amcxG2Rvo+v20wA1sJGmEWCO6hlvuY8HFslMFuQSn 6QpliOJVqRIiwkCVSOczR9IB2/gc0qZoeNb40S6pmDIuor0ksOAvUgEIIQNIfWnmiu0GuB 4WpLwP5O0O6g72s2P4xNrECI4UCQZ0A= Received: from relay2.suse.de (unknown [195.135.221.27]) by mx2.suse.de (Postfix) with ESMTP id 27A1FAE78 for ; Fri, 21 May 2021 06:41:30 +0000 (UTC) From: Qu Wenruo To: linux-btrfs@vger.kernel.org Subject: [PATCH v3 20/31] btrfs: extract relocation page read and dirty part into its own function Date: Fri, 21 May 2021 14:40:39 +0800 Message-Id: <20210521064050.191164-21-wqu@suse.com> X-Mailer: git-send-email 2.31.1 In-Reply-To: <20210521064050.191164-1-wqu@suse.com> References: <20210521064050.191164-1-wqu@suse.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org In function relocate_file_extent_cluster(), we have a big loop for marking all involved page delalloc. That part is long enough to be contained in one function, so this patch will move that code chunk into a new function, relocate_one_page(). This also provides enough space for later subpage work. Signed-off-by: Qu Wenruo --- fs/btrfs/relocation.c | 199 ++++++++++++++++++++---------------------- 1 file changed, 94 insertions(+), 105 deletions(-) diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c index b70be2ac2e9e..862fe5247c76 100644 --- a/fs/btrfs/relocation.c +++ b/fs/btrfs/relocation.c @@ -2885,19 +2885,102 @@ noinline int btrfs_should_cancel_balance(struct btrfs_fs_info *fs_info) } ALLOW_ERROR_INJECTION(btrfs_should_cancel_balance, TRUE); -static int relocate_file_extent_cluster(struct inode *inode, - struct file_extent_cluster *cluster) +static int relocate_one_page(struct inode *inode, struct file_ra_state *ra, + struct file_extent_cluster *cluster, + int *cluster_nr, unsigned long page_index) { struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb); + u64 offset = BTRFS_I(inode)->index_cnt; + const unsigned long last_index = (cluster->end - offset) >> PAGE_SHIFT; + gfp_t mask = btrfs_alloc_write_mask(inode->i_mapping); + struct page *page; u64 page_start; u64 page_end; + int ret; + + ASSERT(page_index <= last_index); + ret = btrfs_delalloc_reserve_metadata(BTRFS_I(inode), PAGE_SIZE); + if (ret) + return ret; + + page = find_lock_page(inode->i_mapping, page_index); + if (!page) { + page_cache_sync_readahead(inode->i_mapping, ra, NULL, + page_index, last_index + 1 - page_index); + page = find_or_create_page(inode->i_mapping, page_index, mask); + if (!page) { + ret = -ENOMEM; + goto release_delalloc; + } + } + ret = set_page_extent_mapped(page); + if (ret < 0) + goto release_page; + + if (PageReadahead(page)) + page_cache_async_readahead(inode->i_mapping, ra, NULL, page, + page_index, last_index + 1 - page_index); + + if (!PageUptodate(page)) { + btrfs_readpage(NULL, page); + lock_page(page); + if (!PageUptodate(page)) { + ret = -EIO; + goto release_page; + } + } + + page_start = page_offset(page); + page_end = page_start + PAGE_SIZE - 1; + + lock_extent(&BTRFS_I(inode)->io_tree, page_start, page_end); + + if (*cluster_nr < cluster->nr && + page_start + offset == cluster->boundary[*cluster_nr]) { + set_extent_bits(&BTRFS_I(inode)->io_tree, page_start, page_end, + EXTENT_BOUNDARY); + (*cluster_nr)++; + } + + ret = btrfs_set_extent_delalloc(BTRFS_I(inode), page_start, page_end, + 0, NULL); + if (ret) { + clear_extent_bits(&BTRFS_I(inode)->io_tree, page_start, + page_end, EXTENT_LOCKED | EXTENT_BOUNDARY); + goto release_page; + + } + set_page_dirty(page); + + unlock_extent(&BTRFS_I(inode)->io_tree, page_start, page_end); + unlock_page(page); + put_page(page); + + btrfs_delalloc_release_extents(BTRFS_I(inode), PAGE_SIZE); + balance_dirty_pages_ratelimited(inode->i_mapping); + btrfs_throttle(fs_info); + if (btrfs_should_cancel_balance(fs_info)) + ret = -ECANCELED; + return ret; + +release_page: + unlock_page(page); + put_page(page); +release_delalloc: + btrfs_delalloc_release_metadata(BTRFS_I(inode), PAGE_SIZE, true); + btrfs_delalloc_release_extents(BTRFS_I(inode), PAGE_SIZE); + return ret; +} + +static int relocate_file_extent_cluster(struct inode *inode, + struct file_extent_cluster *cluster) +{ + struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb); u64 offset = BTRFS_I(inode)->index_cnt; unsigned long index; unsigned long last_index; - struct page *page; struct file_ra_state *ra; - gfp_t mask = btrfs_alloc_write_mask(inode->i_mapping); - int nr = 0; + int cluster_nr = 0; int ret = 0; if (!cluster->nr) @@ -2918,109 +3001,15 @@ static int relocate_file_extent_cluster(struct inode *inode, if (ret) goto out; - index = (cluster->start - offset) >> PAGE_SHIFT; last_index = (cluster->end - offset) >> PAGE_SHIFT; - while (index <= last_index) { - ret = btrfs_delalloc_reserve_metadata(BTRFS_I(inode), - PAGE_SIZE); - if (ret) - goto out; - - page = find_lock_page(inode->i_mapping, index); - if (!page) { - page_cache_sync_readahead(inode->i_mapping, - ra, NULL, index, - last_index + 1 - index); - page = find_or_create_page(inode->i_mapping, index, - mask); - if (!page) { - btrfs_delalloc_release_metadata(BTRFS_I(inode), - PAGE_SIZE, true); - btrfs_delalloc_release_extents(BTRFS_I(inode), - PAGE_SIZE); - ret = -ENOMEM; - goto out; - } - } - ret = set_page_extent_mapped(page); - if (ret < 0) { - btrfs_delalloc_release_metadata(BTRFS_I(inode), - PAGE_SIZE, true); - btrfs_delalloc_release_extents(BTRFS_I(inode), PAGE_SIZE); - unlock_page(page); - put_page(page); - goto out; - } - - if (PageReadahead(page)) { - page_cache_async_readahead(inode->i_mapping, - ra, NULL, page, index, - last_index + 1 - index); - } - - if (!PageUptodate(page)) { - btrfs_readpage(NULL, page); - lock_page(page); - if (!PageUptodate(page)) { - unlock_page(page); - put_page(page); - btrfs_delalloc_release_metadata(BTRFS_I(inode), - PAGE_SIZE, true); - btrfs_delalloc_release_extents(BTRFS_I(inode), - PAGE_SIZE); - ret = -EIO; - goto out; - } - } - - page_start = page_offset(page); - page_end = page_start + PAGE_SIZE - 1; - - lock_extent(&BTRFS_I(inode)->io_tree, page_start, page_end); - - if (nr < cluster->nr && - page_start + offset == cluster->boundary[nr]) { - set_extent_bits(&BTRFS_I(inode)->io_tree, - page_start, page_end, - EXTENT_BOUNDARY); - nr++; - } - - ret = btrfs_set_extent_delalloc(BTRFS_I(inode), page_start, - page_end, 0, NULL); - if (ret) { - unlock_page(page); - put_page(page); - btrfs_delalloc_release_metadata(BTRFS_I(inode), - PAGE_SIZE, true); - btrfs_delalloc_release_extents(BTRFS_I(inode), - PAGE_SIZE); - - clear_extent_bits(&BTRFS_I(inode)->io_tree, - page_start, page_end, - EXTENT_LOCKED | EXTENT_BOUNDARY); - goto out; - - } - set_page_dirty(page); - - unlock_extent(&BTRFS_I(inode)->io_tree, - page_start, page_end); - unlock_page(page); - put_page(page); - - index++; - btrfs_delalloc_release_extents(BTRFS_I(inode), PAGE_SIZE); - balance_dirty_pages_ratelimited(inode->i_mapping); - btrfs_throttle(fs_info); - if (btrfs_should_cancel_balance(fs_info)) { - ret = -ECANCELED; - goto out; - } - } - WARN_ON(nr != cluster->nr); + for (index = (cluster->start - offset) >> PAGE_SHIFT; + index <= last_index && !ret; index++) + ret = relocate_one_page(inode, ra, cluster, &cluster_nr, + index); if (btrfs_is_zoned(fs_info) && !ret) ret = btrfs_wait_ordered_range(inode, 0, (u64)-1); + if (ret == 0) + WARN_ON(cluster_nr != cluster->nr); out: kfree(ra); return ret; From patchwork Fri May 21 06:40:40 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Qu Wenruo X-Patchwork-Id: 12272265 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-18.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER, INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED, USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 8BFF8C433ED for ; Fri, 21 May 2021 06:43:28 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 6EC3F611AB for ; Fri, 21 May 2021 06:43:28 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234292AbhEUGom (ORCPT ); Fri, 21 May 2021 02:44:42 -0400 Received: from mx2.suse.de ([195.135.220.15]:58786 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233532AbhEUGoT (ORCPT ); Fri, 21 May 2021 02:44:19 -0400 X-Virus-Scanned: by amavisd-new at test-mx.suse.de DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.com; s=susede1; t=1621579291; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=zBaOJgPHO4sBkLqccgKDuFVp5DqG3NBbJGQS+eVyN0E=; b=rjMIUDcQmfRIEr+roJjWjkXVY2KD9vS1XRUxFqtoQY0kIglUsXZfbn6GkTGe3uhx4/JNHg p+Qzi8fw0uQEqA+HKYuzsHRPNv8JRBnemtwe5Qo8o9jbLLcgKbMd96TsbNHAjOpL1fUduw gYsi2k+cj/xAtqdOJXIbxCssACUUZ4I= Received: from relay2.suse.de (unknown [195.135.221.27]) by mx2.suse.de (Postfix) with ESMTP id D2E7DADD7 for ; Fri, 21 May 2021 06:41:31 +0000 (UTC) From: Qu Wenruo To: linux-btrfs@vger.kernel.org Subject: [PATCH v3 21/31] btrfs: make relocate_one_page() to handle subpage case Date: Fri, 21 May 2021 14:40:40 +0800 Message-Id: <20210521064050.191164-22-wqu@suse.com> X-Mailer: git-send-email 2.31.1 In-Reply-To: <20210521064050.191164-1-wqu@suse.com> References: <20210521064050.191164-1-wqu@suse.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org For subpage case, one page of data reloc inode can contain several file extents, like this: |<--- File extent A --->| FE B | FE C |<--- File extent D -->| |<--------- Page --------->| We can no longer use PAGE_SIZE directly for various operations. This patch will relocate_one_page() to handle subpage case by: - Iterating through all extents of a cluster when marking pages When marking pages dirty and delalloc, we need to check the cluster extent boundary. Now we introduce a loop to go extent by extent of a page, until we either finished the last extent, or reach the page end. By this, regular sectorsize == PAGE_SIZE can still work as usual, since we will do that loop only once. - Iteration start from max(page_start, extent_start) Since we can have the following case: | FE B | FE C |<--- File extent D -->| |<--------- Page --------->| Thus we can't always start from page_start, but do a max(page_start, extent_start) - Iteration end when the cluster is exhausted Similar to previous case, the last file extent can end before the page end: |<--- File extent A --->| FE B | FE C | |<--------- Page --------->| In this case, we need to manually exit the loop after we have finished the last extent of the cluster. - Reserve metadata space for each extent range Since now we can hit multiple ranges in one page, we should reserve metadata for each range, not simply PAGE_SIZE. Signed-off-by: Qu Wenruo --- fs/btrfs/relocation.c | 108 ++++++++++++++++++++++++++++++------------ 1 file changed, 79 insertions(+), 29 deletions(-) diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c index 862fe5247c76..cd50559c6d17 100644 --- a/fs/btrfs/relocation.c +++ b/fs/btrfs/relocation.c @@ -24,6 +24,7 @@ #include "block-group.h" #include "backref.h" #include "misc.h" +#include "subpage.h" /* * Relocation overview @@ -2885,6 +2886,17 @@ noinline int btrfs_should_cancel_balance(struct btrfs_fs_info *fs_info) } ALLOW_ERROR_INJECTION(btrfs_should_cancel_balance, TRUE); +static u64 get_cluster_boundary_end(struct file_extent_cluster *cluster, + int cluster_nr) +{ + /* Last extent, use cluster end directly */ + if (cluster_nr >= cluster->nr - 1) + return cluster->end; + + /* Use next boundary start*/ + return cluster->boundary[cluster_nr + 1] - 1; +} + static int relocate_one_page(struct inode *inode, struct file_ra_state *ra, struct file_extent_cluster *cluster, int *cluster_nr, unsigned long page_index) @@ -2896,22 +2908,17 @@ static int relocate_one_page(struct inode *inode, struct file_ra_state *ra, struct page *page; u64 page_start; u64 page_end; + u64 cur; int ret; ASSERT(page_index <= last_index); - ret = btrfs_delalloc_reserve_metadata(BTRFS_I(inode), PAGE_SIZE); - if (ret) - return ret; - page = find_lock_page(inode->i_mapping, page_index); if (!page) { page_cache_sync_readahead(inode->i_mapping, ra, NULL, page_index, last_index + 1 - page_index); page = find_or_create_page(inode->i_mapping, page_index, mask); - if (!page) { - ret = -ENOMEM; - goto release_delalloc; - } + if (!page) + return -ENOMEM; } ret = set_page_extent_mapped(page); if (ret < 0) @@ -2933,30 +2940,76 @@ static int relocate_one_page(struct inode *inode, struct file_ra_state *ra, page_start = page_offset(page); page_end = page_start + PAGE_SIZE - 1; - lock_extent(&BTRFS_I(inode)->io_tree, page_start, page_end); - - if (*cluster_nr < cluster->nr && - page_start + offset == cluster->boundary[*cluster_nr]) { - set_extent_bits(&BTRFS_I(inode)->io_tree, page_start, page_end, - EXTENT_BOUNDARY); - (*cluster_nr)++; - } + /* + * Start from the cluster, as for subpage case, the cluster can start + * inside the page. + */ + cur = max(page_start, cluster->boundary[*cluster_nr] - offset); + while (cur <= page_end) { + u64 extent_start = cluster->boundary[*cluster_nr] - offset; + u64 extent_end = get_cluster_boundary_end(cluster, + *cluster_nr) - offset; + u64 clamped_start = max(page_start, extent_start); + u64 clamped_end = min(page_end, extent_end); + u32 clamped_len = clamped_end + 1 - clamped_start; + + /* Reserve metadata for this range */ + ret = btrfs_delalloc_reserve_metadata(BTRFS_I(inode), + clamped_len); + if (ret) + goto release_page; - ret = btrfs_set_extent_delalloc(BTRFS_I(inode), page_start, page_end, - 0, NULL); - if (ret) { - clear_extent_bits(&BTRFS_I(inode)->io_tree, page_start, - page_end, EXTENT_LOCKED | EXTENT_BOUNDARY); - goto release_page; + /* Mark the range delalloc and dirty for later writeback */ + lock_extent(&BTRFS_I(inode)->io_tree, clamped_start, + clamped_end); + ret = btrfs_set_extent_delalloc(BTRFS_I(inode), clamped_start, + clamped_end, 0, NULL); + if (ret) { + clear_extent_bits(&BTRFS_I(inode)->io_tree, + clamped_start, clamped_end, + EXTENT_LOCKED | EXTENT_BOUNDARY); + btrfs_delalloc_release_metadata(BTRFS_I(inode), + clamped_len, true); + btrfs_delalloc_release_extents(BTRFS_I(inode), + clamped_len); + goto release_page; + } + btrfs_page_set_dirty(fs_info, page, clamped_start, clamped_len); + /* + * Set the boundary if it's inside the page. + * Data relocation requires the destination extents have the + * same size as the source. + * EXTENT_BOUNDARY bit prevent current extent from being merged + * with previous extent. + */ + if (in_range(cluster->boundary[*cluster_nr] - offset, + page_start, PAGE_SIZE)) { + u64 boundary_start = cluster->boundary[*cluster_nr] - + offset; + u64 boundary_end = boundary_start + + fs_info->sectorsize - 1; + + set_extent_bits(&BTRFS_I(inode)->io_tree, + boundary_start, boundary_end, + EXTENT_BOUNDARY); + } + unlock_extent(&BTRFS_I(inode)->io_tree, clamped_start, + clamped_end); + btrfs_delalloc_release_extents(BTRFS_I(inode), clamped_len); + cur += clamped_len; + + /* Crossed extent end, go to next extent */ + if (cur >= extent_end) { + (*cluster_nr)++; + /* Just finished the last extent of the cluster, exit. */ + if (*cluster_nr >= cluster->nr) + break; + } } - set_page_dirty(page); - - unlock_extent(&BTRFS_I(inode)->io_tree, page_start, page_end); unlock_page(page); put_page(page); - btrfs_delalloc_release_extents(BTRFS_I(inode), PAGE_SIZE); balance_dirty_pages_ratelimited(inode->i_mapping); btrfs_throttle(fs_info); if (btrfs_should_cancel_balance(fs_info)) @@ -2966,9 +3019,6 @@ static int relocate_one_page(struct inode *inode, struct file_ra_state *ra, release_page: unlock_page(page); put_page(page); -release_delalloc: - btrfs_delalloc_release_metadata(BTRFS_I(inode), PAGE_SIZE, true); - btrfs_delalloc_release_extents(BTRFS_I(inode), PAGE_SIZE); return ret; } From patchwork Fri May 21 06:40:41 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Qu Wenruo X-Patchwork-Id: 12272271 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-18.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER, INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED, USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id C4D8BC433ED for ; Fri, 21 May 2021 06:43:32 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id A7D14611AB for ; Fri, 21 May 2021 06:43:32 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233150AbhEUGox (ORCPT ); Fri, 21 May 2021 02:44:53 -0400 Received: from mx2.suse.de ([195.135.220.15]:58784 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233541AbhEUGoT (ORCPT ); Fri, 21 May 2021 02:44:19 -0400 X-Virus-Scanned: by amavisd-new at test-mx.suse.de DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.com; s=susede1; t=1621579293; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=hpdVwZXTFmjkdgZS1it/AgFByL/YQi57ngs1ps2Gp8E=; b=EQhS24LLd/b5kStQw18+2ZkCq4LGl1uLecpspSXOv7FVEa7td87usKX5ATYtamAtqh4gX8 Pas2NVUfwP5AxaJGbH99V9SlYhbzoxchLqMJd+dOt6qiB31X0P4aM31bNYImOGC1ty0syv Is0B/hWcUlJbkdM5fbGtGpal8l8efTw= Received: from relay2.suse.de (unknown [195.135.221.27]) by mx2.suse.de (Postfix) with ESMTP id 6672CAE81 for ; Fri, 21 May 2021 06:41:33 +0000 (UTC) From: Qu Wenruo To: linux-btrfs@vger.kernel.org Subject: [PATCH v3 22/31] btrfs: fix wild subpage writeback which does not have ordered extent. Date: Fri, 21 May 2021 14:40:41 +0800 Message-Id: <20210521064050.191164-23-wqu@suse.com> X-Mailer: git-send-email 2.31.1 In-Reply-To: <20210521064050.191164-1-wqu@suse.com> References: <20210521064050.191164-1-wqu@suse.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org [BUG] When running fsstress with subpage RW support, there are random BUG_ON()s triggered with the following trace: kernel BUG at fs/btrfs/file-item.c:667! Internal error: Oops - BUG: 0 [#1] SMP CPU: 1 PID: 3486 Comm: kworker/u13:2 Tainted: G WC O 5.11.0-rc4-custom+ #43 Hardware name: Radxa ROCK Pi 4B (DT) Workqueue: btrfs-worker-high btrfs_work_helper [btrfs] pstate: 60000005 (nZCv daif -PAN -UAO -TCO BTYPE=--) pc : btrfs_csum_one_bio+0x420/0x4e0 [btrfs] lr : btrfs_csum_one_bio+0x400/0x4e0 [btrfs] Call trace: btrfs_csum_one_bio+0x420/0x4e0 [btrfs] btrfs_submit_bio_start+0x20/0x30 [btrfs] run_one_async_start+0x28/0x44 [btrfs] btrfs_work_helper+0x128/0x1b4 [btrfs] process_one_work+0x22c/0x430 worker_thread+0x70/0x3a0 kthread+0x13c/0x140 ret_from_fork+0x10/0x30 [CAUSE] Above BUG_ON() means there are some bio range which doesn't have ordered extent, which indeed is worthy a BUG_ON(). Unlike regular sectorsize == PAGE_SIZE case, in subpage we have extra subpage dirty bitmap to record which range is dirty and should be written back. This means, if we submit bio for a subpage range, we do not only need to clear page dirty, but also need to clear subpage dirty bits. In __extent_writepage_io(), we will call btrfs_page_clear_dirty() for any range we submit a bio. But there is loophole, if we hit a range which is beyond isize, we just call btrfs_writepage_endio_finish_ordered() to finish the ordered io, then break out, without clearing the subpage dirty. This means, if we hit above branch, the subpage dirty bits are still there, if other range of the page get dirtied and we need to writeback that page again, we will submit bio for the old range, leaving a wild bio range which doesn't have ordered extent. [FIX] Fix it by always calling btrfs_page_clear_dirty() in __extent_writepage_io(). Also to avoid such problem from happening again, add a new assert, btrfs_page_assert_not_dirty(), to make sure both page dirty and subpage dirty bits are cleared before exiting __extent_writepage_io(). Signed-off-by: Qu Wenruo --- fs/btrfs/extent_io.c | 17 +++++++++++++++++ fs/btrfs/subpage.c | 16 ++++++++++++++++ fs/btrfs/subpage.h | 7 +++++++ 3 files changed, 40 insertions(+) diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c index bb5e9f73ad2b..37661cfeeee2 100644 --- a/fs/btrfs/extent_io.c +++ b/fs/btrfs/extent_io.c @@ -3911,6 +3911,16 @@ static noinline_for_stack int __extent_writepage_io(struct btrfs_inode *inode, if (cur >= i_size) { btrfs_writepage_endio_finish_ordered(inode, page, cur, end, 1); + /* + * This range is beyond isize, thus we don't need to + * bother writing back. + * But we still need to clear the dirty subpage bit, or + * the next time the page get dirtied, we will try to + * writeback the sectors with subpage diryt bits, + * causing writeback without ordered extent. + */ + btrfs_page_clear_dirty(fs_info, page, cur, + end + 1 - cur); break; } @@ -3961,6 +3971,7 @@ static noinline_for_stack int __extent_writepage_io(struct btrfs_inode *inode, else btrfs_writepage_endio_finish_ordered(inode, page, cur, cur + iosize - 1, 1); + btrfs_page_clear_dirty(fs_info, page, cur, iosize); cur += iosize; continue; } @@ -3996,6 +4007,12 @@ static noinline_for_stack int __extent_writepage_io(struct btrfs_inode *inode, cur += iosize; nr++; } + /* + * If we finishes without problem, we should not only clear page dirty, + * but also emptied subpage dirty bits + */ + if (!ret) + btrfs_page_assert_not_dirty(fs_info, page); *nr_ret = nr; return ret; } diff --git a/fs/btrfs/subpage.c b/fs/btrfs/subpage.c index 516e0b3f2ed9..696485ab68a2 100644 --- a/fs/btrfs/subpage.c +++ b/fs/btrfs/subpage.c @@ -548,3 +548,19 @@ IMPLEMENT_BTRFS_PAGE_OPS(writeback, set_page_writeback, end_page_writeback, PageWriteback); IMPLEMENT_BTRFS_PAGE_OPS(ordered, SetPageOrdered, ClearPageOrdered, PageOrdered); + +void btrfs_page_assert_not_dirty(const struct btrfs_fs_info *fs_info, + struct page *page) +{ + struct btrfs_subpage *subpage = (struct btrfs_subpage *)page->private; + + if (!IS_ENABLED(CONFIG_BTRFS_ASSERT)) + return; + + ASSERT(!PageDirty(page)); + if (fs_info->sectorsize == PAGE_SIZE) + return; + + ASSERT(PagePrivate(page) && page->private); + ASSERT(subpage->dirty_bitmap == 0); +} diff --git a/fs/btrfs/subpage.h b/fs/btrfs/subpage.h index 3419b152c00f..7188e9d2fbea 100644 --- a/fs/btrfs/subpage.h +++ b/fs/btrfs/subpage.h @@ -119,4 +119,11 @@ DECLARE_BTRFS_SUBPAGE_OPS(ordered); bool btrfs_subpage_clear_and_test_dirty(const struct btrfs_fs_info *fs_info, struct page *page, u64 start, u32 len); +/* + * Extra assert to make sure not only the page dirty bit is cleared, but also + * subpage dirty bit is cleared. + */ +void btrfs_page_assert_not_dirty(const struct btrfs_fs_info *fs_info, + struct page *page); + #endif From patchwork Fri May 21 06:40:42 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Qu Wenruo X-Patchwork-Id: 12272281 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-18.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER, INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED, USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 771B1C433B4 for ; Fri, 21 May 2021 06:43:41 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 5D1FC613B6 for ; Fri, 21 May 2021 06:43:41 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234099AbhEUGpC (ORCPT ); Fri, 21 May 2021 02:45:02 -0400 Received: from mx2.suse.de ([195.135.220.15]:59012 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233598AbhEUGoU (ORCPT ); Fri, 21 May 2021 02:44:20 -0400 X-Virus-Scanned: by amavisd-new at test-mx.suse.de DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.com; s=susede1; t=1621579295; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=agmVZVuRjitiJ9WCPRyT+gVidq/i3wil1BmRpT4i7qs=; b=dos6466WEQU2uRRMW1BfVeaLRJWNvbnMOP0V5dU/xjVFV0KfNogwUpSrt122Ic5e5riPVx SWxn09HX4F5Xhnr+NDBHGiG61/fI/ZbqdzxO43RCx85BSKN43ZZ+KZ0Tei+76bc1URqYjK FT+AWXxyqB+nGPh2i8BpCesoVyRTM+4= Received: from relay2.suse.de (unknown [195.135.221.27]) by mx2.suse.de (Postfix) with ESMTP id EE728AE86 for ; Fri, 21 May 2021 06:41:34 +0000 (UTC) From: Qu Wenruo To: linux-btrfs@vger.kernel.org Subject: [PATCH v3 23/31] btrfs: disable inline extent creation for subpage Date: Fri, 21 May 2021 14:40:42 +0800 Message-Id: <20210521064050.191164-24-wqu@suse.com> X-Mailer: git-send-email 2.31.1 In-Reply-To: <20210521064050.191164-1-wqu@suse.com> References: <20210521064050.191164-1-wqu@suse.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org [BUG] When running the following fsx command (extracted from generic/127) on subpage btrfs, it can create inline extent with regular extents: fsx -q -l 262144 -o 65536 -S 191110531 -N 9057 -R -W $mnt/file > /tmp/fsx The offending extent would look like: item 9 key (257 INODE_REF 256) itemoff 15703 itemsize 14 index 2 namelen 4 name: file item 10 key (257 EXTENT_DATA 0) itemoff 14975 itemsize 728 generation 7 type 0 (inline) inline extent data size 707 ram_bytes 707 compression 0 (none) item 11 key (257 EXTENT_DATA 4096) itemoff 14922 itemsize 53 generation 7 type 2 (prealloc) prealloc data disk byte 102346752 nr 4096 prealloc data offset 0 nr 4096 [CAUSE] For subpage btrfs, the writeback is triggered in page unit, which means, even if we just want to writeback range [16K, 20K) for 64K page system, we will still try to writeback any dirty sector of range [0, 64K). This is never a problem if sectorsize == PAGE_SIZE, but for subpage, this can cause unexpected problems. For above test case, the last several operations from fsx are: 9055 trunc from 0x40000 to 0x2c3 9057 falloc from 0x164c to 0x19d2 (0x386 bytes) In operation 9055, we dirtied sector [0, 4096), then in falloc, we call btrfs_wait_ordered_range(inode, start=4096, len=4096), only expecting to writeback any dirty data in [4096, 8192), but nothing else. Unfortunately, in subpage case, above btrfs_wait_ordered_range() will trigger writeback of the range [0, 64K), which includes the data at [0, 4096). And since at the call site, we haven't yet increased i_size, which is still 707, this means cow_file_range() can insert an inline extent. Resulting above inline + regular extent. [WORKAROUND] I don't really have any good short-term solution yet, as this means all operations that would trigger writeback need to be reviewed for any isize change. So here I choose to disable inline extent creation for subpage case as a workaround. We have done tons of work just to avoid such extent, so I don't to create an exception just for subpage. This only affects inline extent creation, btrfs subpage support has no problem reading existing inline extents at all. Signed-off-by: Qu Wenruo --- fs/btrfs/inode.c | 18 ++++++++++++++++-- 1 file changed, 16 insertions(+), 2 deletions(-) diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index f82055ba09c8..f90cb8676e36 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -682,7 +682,11 @@ static noinline int compress_file_range(struct async_chunk *async_chunk) } } cont: - if (start == 0) { + /* + * Check cow_file_range() for why we don't even try to create + * inline extent for subpage case. + */ + if (start == 0 && fs_info->sectorsize == PAGE_SIZE) { /* lets try to make an inline extent */ if (ret || total_in < actual_end) { /* we didn't compress the entire range, try @@ -1080,7 +1084,17 @@ static noinline int cow_file_range(struct btrfs_inode *inode, inode_should_defrag(inode, start, end, num_bytes, SZ_64K); - if (start == 0) { + /* + * Due to the page size limit, for subpage we can only trigger the + * writeback for the dirty sectors of page, that means data writeback + * is doing more writeback than what we want. + * + * This is especially unexpected for some call sites like fallocate, + * where we only increase isize after everything is done. + * This means we can trigger inline extent even we didn't want. + * So here we skip inline extent creation completely. + */ + if (start == 0 && fs_info->sectorsize == PAGE_SIZE) { /* lets try to make an inline extent */ ret = cow_file_range_inline(inode, start, end, 0, BTRFS_COMPRESS_NONE, NULL); From patchwork Fri May 21 06:40:43 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Qu Wenruo X-Patchwork-Id: 12272283 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-18.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER, INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED, USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 51C67C433ED for ; Fri, 21 May 2021 06:43:43 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 382DF611AB for ; Fri, 21 May 2021 06:43:43 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233673AbhEUGpD (ORCPT ); Fri, 21 May 2021 02:45:03 -0400 Received: from mx2.suse.de ([195.135.220.15]:59014 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233669AbhEUGoU (ORCPT ); Fri, 21 May 2021 02:44:20 -0400 X-Virus-Scanned: by amavisd-new at test-mx.suse.de DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.com; s=susede1; t=1621579296; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=LTlSxZrTZK8mVPK/Jak4KXUG4D620sLKAhSJvyTr7rY=; b=VuY55lq0E8RHv8Q3jYuIEm+uoSR0El1RtJL4Tp/zYcO/s71WOvGqZvaS4jEALtQOLc3pHB UfAJLFkPMYpICn4jBLUbzpIaNM0kCAY2g0ohZSfnqUxvMiQQ6tOBj685fZ+8ePupzuTmYD QVB3YTXewytoFGbuOetq/riHPtEpn3E= Received: from relay2.suse.de (unknown [195.135.221.27]) by mx2.suse.de (Postfix) with ESMTP id B63B4AE89 for ; Fri, 21 May 2021 06:41:36 +0000 (UTC) From: Qu Wenruo To: linux-btrfs@vger.kernel.org Subject: [PATCH v3 24/31] btrfs: allow submit_extent_page() to do bio split for subpage Date: Fri, 21 May 2021 14:40:43 +0800 Message-Id: <20210521064050.191164-25-wqu@suse.com> X-Mailer: git-send-email 2.31.1 In-Reply-To: <20210521064050.191164-1-wqu@suse.com> References: <20210521064050.191164-1-wqu@suse.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org Current submit_extent_page() just if the current page range can fit into the current bio, and if not, submit then re-add. But this behavior has a problem, it can't handle subpage cases. For subpage case, the problem is in the page size, 64K, which is also the same size as stripe size. This means, if we can't fit a full 64K into a bio, due to stripe limit, then it won't fit into next bio without crossing stripe either. The proper way to handle it is: - Check how many bytes we can put into current bio - Put as many bytes as possible into current bio first - Submit current bio - Create new bio - Add the remaining bytes into the new bio Refactor submit_extent_page() so that it does the above iteration. The main loop inside submit_extent_page() will look like this: cur = pg_offset; while (cur < pg_offset + size) { u32 offset = cur - pg_offset; int added; if (!bio_ctrl->bio) { /* Allocate new bio if needed */ } /* Add as many bytes into the bio */ if (added < size - offset) { /* The current bio is full, submit it */ } cur += added; } Also, since we're doing new bio allocation deep inside the main loop, extra that code into a new function, alloc_new_bio(). Signed-off-by: Qu Wenruo --- fs/btrfs/extent_io.c | 182 ++++++++++++++++++++++++++++--------------- 1 file changed, 121 insertions(+), 61 deletions(-) diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c index 37661cfeeee2..2c1849e7976d 100644 --- a/fs/btrfs/extent_io.c +++ b/fs/btrfs/extent_io.c @@ -172,6 +172,7 @@ int __must_check submit_one_bio(struct bio *bio, int mirror_num, bio->bi_private = NULL; + ASSERT(bio->bi_iter.bi_size); if (is_data_inode(tree->private_data)) ret = btrfs_submit_data_bio(tree->private_data, bio, mirror_num, bio_flags); @@ -3200,13 +3201,13 @@ struct bio *btrfs_bio_clone_partial(struct bio *orig, int offset, int size) * @size: portion of page that we want to write * @prev_bio_flags: flags of previous bio to see if we can merge the current one * @bio_flags: flags of the current bio to see if we can merge them - * @return: true if page was added, false otherwise * * Attempt to add a page to bio considering stripe alignment etc. * - * Return true if successfully page added. Otherwise, return false. + * Return >= 0 for the number of bytes added to the bio. + * Return <0 for error. */ -static bool btrfs_bio_add_page(struct btrfs_bio_ctrl *bio_ctrl, +static int btrfs_bio_add_page(struct btrfs_bio_ctrl *bio_ctrl, struct page *page, u64 disk_bytenr, unsigned int size, unsigned int pg_offset, @@ -3214,6 +3215,7 @@ static bool btrfs_bio_add_page(struct btrfs_bio_ctrl *bio_ctrl, { struct bio *bio = bio_ctrl->bio; u32 bio_size = bio->bi_iter.bi_size; + u32 real_size; const sector_t sector = disk_bytenr >> SECTOR_SHIFT; bool contig; int ret; @@ -3222,25 +3224,31 @@ static bool btrfs_bio_add_page(struct btrfs_bio_ctrl *bio_ctrl, /* The limit should be calculated when bio_ctrl->bio is allocated */ ASSERT(bio_ctrl->len_to_oe_boundary && bio_ctrl->len_to_stripe_boundary); if (bio_ctrl->bio_flags != bio_flags) - return false; + return 0; if (bio_ctrl->bio_flags & EXTENT_BIO_COMPRESSED) contig = bio->bi_iter.bi_sector == sector; else contig = bio_end_sector(bio) == sector; if (!contig) - return false; + return 0; - if (bio_size + size > bio_ctrl->len_to_oe_boundary || - bio_size + size > bio_ctrl->len_to_stripe_boundary) - return false; + real_size = min(bio_ctrl->len_to_oe_boundary, + bio_ctrl->len_to_stripe_boundary) - bio_size; + real_size = min(real_size, size); + /* + * If real_size is 0, never call bio_add_*_page(), as even size is 0, + * bio will still execute its endio function on the page! + */ + if (real_size == 0) + return 0; if (bio_op(bio) == REQ_OP_ZONE_APPEND) - ret = bio_add_zone_append_page(bio, page, size, pg_offset); + ret = bio_add_zone_append_page(bio, page, real_size, pg_offset); else - ret = bio_add_page(bio, page, size, pg_offset); + ret = bio_add_page(bio, page, real_size, pg_offset); - return ret == size; + return ret; } static int calc_bio_boundaries(struct btrfs_bio_ctrl *bio_ctrl, @@ -3299,6 +3307,61 @@ static int calc_bio_boundaries(struct btrfs_bio_ctrl *bio_ctrl, return 0; } +static int alloc_new_bio(struct btrfs_inode *inode, + struct btrfs_bio_ctrl *bio_ctrl, + unsigned int opf, + bio_end_io_t end_io_func, + u64 disk_bytenr, u32 offset, + unsigned long bio_flags) +{ + struct btrfs_fs_info *fs_info = inode->root->fs_info; + struct bio *bio; + int ret; + + /* + * For compressed page range, its disk_bytenr is always + * @disk_bytenr passed in, no matter if we have added + * any range into previous bio. + */ + if (bio_flags & EXTENT_BIO_COMPRESSED) + bio = btrfs_bio_alloc(disk_bytenr); + else + bio = btrfs_bio_alloc(disk_bytenr + offset); + bio_ctrl->bio = bio; + bio_ctrl->bio_flags = bio_flags; + ret = calc_bio_boundaries(bio_ctrl, inode); + if (ret < 0) { + bio_ctrl->bio = NULL; + bio->bi_status = errno_to_blk_status(ret); + bio_endio(bio); + return ret; + } + bio->bi_end_io = end_io_func; + bio->bi_private = &inode->io_tree; + bio->bi_write_hint = inode->vfs_inode.i_write_hint; + bio->bi_opf = opf; + if (btrfs_is_zoned(fs_info) && bio_op(bio) == REQ_OP_ZONE_APPEND) { + struct extent_map *em; + struct map_lookup *map; + + em = btrfs_get_chunk_map(fs_info, disk_bytenr, + fs_info->sectorsize); + if (IS_ERR(em)) { + bio_ctrl->bio = NULL; + bio->bi_status = errno_to_blk_status(ret); + bio_endio(bio); + return ret; + } + + map = em->map_lookup; + /* We only support single profile for now */ + ASSERT(map->num_stripes == 1); + btrfs_io_bio(bio)->device = map->stripes[0].dev; + + free_extent_map(em); + } + return 0; +} /* * @opf: bio REQ_OP_* and REQ_* flags as one value * @wbc: optional writeback control for io accounting @@ -3324,67 +3387,64 @@ static int submit_extent_page(unsigned int opf, bool force_bio_submit) { int ret = 0; - struct bio *bio; - size_t io_size = min_t(size_t, size, PAGE_SIZE); struct btrfs_inode *inode = BTRFS_I(page->mapping->host); - struct extent_io_tree *tree = &inode->io_tree; - struct btrfs_fs_info *fs_info = inode->root->fs_info; + unsigned int cur = pg_offset; ASSERT(bio_ctrl); ASSERT(pg_offset < PAGE_SIZE && size <= PAGE_SIZE && pg_offset + size <= PAGE_SIZE); - if (bio_ctrl->bio) { - bio = bio_ctrl->bio; - if (force_bio_submit || - !btrfs_bio_add_page(bio_ctrl, page, disk_bytenr, io_size, - pg_offset, bio_flags)) { - ret = submit_one_bio(bio, mirror_num, bio_ctrl->bio_flags); + if (force_bio_submit && bio_ctrl->bio) { + ret = submit_one_bio(bio_ctrl->bio, mirror_num, + bio_ctrl->bio_flags); + bio_ctrl->bio = NULL; + if (ret < 0) + return ret; + } + while (cur < pg_offset + size) { + u32 offset = cur - pg_offset; + int added; + /* Allocate new bio if needed */ + if (!bio_ctrl->bio) { + ret = alloc_new_bio(inode, bio_ctrl, opf, end_io_func, + disk_bytenr, offset, bio_flags); + if (ret < 0) + return ret; + } + /* + * We must go through btrfs_bio_add_page() to ensure each + * page range won't cross various boundaries. + */ + if (bio_flags & EXTENT_BIO_COMPRESSED) + added = btrfs_bio_add_page(bio_ctrl, page, disk_bytenr, + size - offset, pg_offset + offset, + bio_flags); + else + added = btrfs_bio_add_page(bio_ctrl, page, + disk_bytenr + offset, size - offset, + pg_offset + offset, bio_flags); + + /* Metadata page range should never be split */ + if (!is_data_inode(&inode->vfs_inode)) + ASSERT(added == 0 || added == size); + + /* At least we added some page, update the account */ + if (wbc && added) + wbc_account_cgroup_owner(wbc, page, added); + + /* We have reached boundary, submit right now */ + if (added < size - offset) { + /* The bio should contain some page(s) */ + ASSERT(bio_ctrl->bio->bi_iter.bi_size); + ret = submit_one_bio(bio_ctrl->bio, mirror_num, + bio_ctrl->bio_flags); bio_ctrl->bio = NULL; if (ret < 0) return ret; - } else { - if (wbc) - wbc_account_cgroup_owner(wbc, page, io_size); - return 0; } + cur += added; } - - bio = btrfs_bio_alloc(disk_bytenr); - bio_add_page(bio, page, io_size, pg_offset); - bio->bi_end_io = end_io_func; - bio->bi_private = tree; - bio->bi_write_hint = page->mapping->host->i_write_hint; - bio->bi_opf = opf; - if (wbc) { - struct block_device *bdev; - - bdev = fs_info->fs_devices->latest_bdev; - bio_set_dev(bio, bdev); - wbc_init_bio(wbc, bio); - wbc_account_cgroup_owner(wbc, page, io_size); - } - if (btrfs_is_zoned(fs_info) && bio_op(bio) == REQ_OP_ZONE_APPEND) { - struct extent_map *em; - struct map_lookup *map; - - em = btrfs_get_chunk_map(fs_info, disk_bytenr, io_size); - if (IS_ERR(em)) - return PTR_ERR(em); - - map = em->map_lookup; - /* We only support single profile for now */ - ASSERT(map->num_stripes == 1); - btrfs_io_bio(bio)->device = map->stripes[0].dev; - - free_extent_map(em); - } - - bio_ctrl->bio = bio; - bio_ctrl->bio_flags = bio_flags; - ret = calc_bio_boundaries(bio_ctrl, inode); - - return ret; + return 0; } static int attach_extent_buffer_page(struct extent_buffer *eb, From patchwork Fri May 21 06:40:44 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Qu Wenruo X-Patchwork-Id: 12272267 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-18.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER, INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED, USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 304D2C43462 for ; Fri, 21 May 2021 06:43:30 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 0B9F3611AD for ; Fri, 21 May 2021 06:43:30 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233346AbhEUGot (ORCPT ); Fri, 21 May 2021 02:44:49 -0400 Received: from mx2.suse.de ([195.135.220.15]:58780 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233613AbhEUGoT (ORCPT ); Fri, 21 May 2021 02:44:19 -0400 X-Virus-Scanned: by amavisd-new at test-mx.suse.de DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.com; s=susede1; t=1621579298; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=t7uiT8k8UkHpE0BxV8PFyBHvTkexqnU/zm9FYZH1XjE=; b=NFGxX0/v7gIYJeHXSQcGECwGZWVOmcEGMgWSAmikB5Ak7Pl4PxwGdTG/E9CwIFfdcH/TiS XKZt4Cki1jiycWtXoXa32ZjPRdrj0Z8LV/vzJjFdkfuLI6wL8s6Y3j2Uevfhi+m5PI2Q+r 3Jr7yk9qUCYPNolRNyp1X8PT/Hx9dr8= Received: from relay2.suse.de (unknown [195.135.221.27]) by mx2.suse.de (Postfix) with ESMTP id 6C255AE8D for ; Fri, 21 May 2021 06:41:38 +0000 (UTC) From: Qu Wenruo To: linux-btrfs@vger.kernel.org Subject: [PATCH v3 25/31] btrfs: make defrag to be semi subpage compatible Date: Fri, 21 May 2021 14:40:44 +0800 Message-Id: <20210521064050.191164-26-wqu@suse.com> X-Mailer: git-send-email 2.31.1 In-Reply-To: <20210521064050.191164-1-wqu@suse.com> References: <20210521064050.191164-1-wqu@suse.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org Btrfs defrag is mostly just read the page contents and mark them dirty so later writeback can try to merge them into a larger extent. Currently only two parts are blocking subpage support in defrag: - The clear_page_dirty_for_io() and set_page_dirty() calls They don't updated the subpage dirty bit, will make subpage writeback to skip the page completely, causing ordered extent never to finish. - The check for whether a page is released Needs extra check on page Private flag. With above calls sites modified, now defrag is semi subpage compatible. The remaining problem is, we set the full page dirty, which means if we have a page which only has one sector on-disk, all the other 15 sectors are just hole, then after defrag the whole 64K page will be re-written as a new extent. This is just a place holder to make most btrfs defrag tests happy, a proper refactor will follow to make btrfs defrag to be completely subpage compatible later. Signed-off-by: Qu Wenruo --- fs/btrfs/ioctl.c | 12 ++++++++---- 1 file changed, 8 insertions(+), 4 deletions(-) diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c index a7739461533d..0cd6abc88c55 100644 --- a/fs/btrfs/ioctl.c +++ b/fs/btrfs/ioctl.c @@ -47,6 +47,7 @@ #include "space-info.h" #include "delalloc-space.h" #include "block-group.h" +#include "subpage.h" #ifdef CONFIG_64BIT /* If we have a 32-bit userspace and 64-bit kernel, then the UAPI @@ -1160,6 +1161,7 @@ static int cluster_pages_for_defrag(struct inode *inode, unsigned long start_index, unsigned long num_pages) { + struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb); unsigned long file_end; u64 isize = i_size_read(inode); u64 page_start; @@ -1225,7 +1227,7 @@ static int cluster_pages_for_defrag(struct inode *inode, * we unlocked the page above, so we need check if * it was released or not. */ - if (page->mapping != inode->i_mapping) { + if (page->mapping != inode->i_mapping || !PagePrivate(page)) { unlock_page(page); put_page(page); goto again; @@ -1243,7 +1245,7 @@ static int cluster_pages_for_defrag(struct inode *inode, } } - if (page->mapping != inode->i_mapping) { + if (page->mapping != inode->i_mapping || !PagePrivate(page)) { unlock_page(page); put_page(page); goto again; @@ -1324,9 +1326,11 @@ static int cluster_pages_for_defrag(struct inode *inode, page_start, page_end - 1, &cached_state); for (i = 0; i < i_done; i++) { - clear_page_dirty_for_io(pages[i]); + btrfs_page_clear_dirty(fs_info, pages[i], page_offset(pages[i]), + PAGE_SIZE); ClearPageChecked(pages[i]); - set_page_dirty(pages[i]); + btrfs_page_set_dirty(fs_info, pages[i], page_offset(pages[i]), + PAGE_SIZE); unlock_page(pages[i]); put_page(pages[i]); } From patchwork Fri May 21 06:40:45 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Qu Wenruo X-Patchwork-Id: 12272277 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-18.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER, INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED, USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 61B82C43460 for ; Fri, 21 May 2021 06:43:35 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 4398A613BB for ; Fri, 21 May 2021 06:43:35 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233952AbhEUGo4 (ORCPT ); Fri, 21 May 2021 02:44:56 -0400 Received: from mx2.suse.de ([195.135.220.15]:59016 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233658AbhEUGoT (ORCPT ); Fri, 21 May 2021 02:44:19 -0400 X-Virus-Scanned: by amavisd-new at test-mx.suse.de DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.com; s=susede1; t=1621579300; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=WLHcUwwjfYxfLurPu42tf5So5XuGiiZl4dK1bK6l8dM=; b=d25hnYUPo6QwwywSbiw9406F9sVNin4jjqX0iINWZLb3cC25TN9Gwf13qfV+K9A9KL2NBZ YYs2yKcxhNRpa1qAXic6MRNeVdb8DK8IoB6UJ9eMKSuRHpSrZuR6Y7KX9xCXHzC5s39Xjx aaRPNURrzBG2heDTnN/Tmp92g6MejwQ= Received: from relay2.suse.de (unknown [195.135.221.27]) by mx2.suse.de (Postfix) with ESMTP id 08260AE93 for ; Fri, 21 May 2021 06:41:40 +0000 (UTC) From: Qu Wenruo To: linux-btrfs@vger.kernel.org Subject: [PATCH v3 26/31] btrfs: reject raid5/6 fs for subpage Date: Fri, 21 May 2021 14:40:45 +0800 Message-Id: <20210521064050.191164-27-wqu@suse.com> X-Mailer: git-send-email 2.31.1 In-Reply-To: <20210521064050.191164-1-wqu@suse.com> References: <20210521064050.191164-1-wqu@suse.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org Raid5/6 is not only unsafe due to its write-hole problem, but also has tons of hardcoded PAGE_SIZE. So disable it for subpage support for now. Signed-off-by: Qu Wenruo --- fs/btrfs/disk-io.c | 10 ++++++++++ fs/btrfs/volumes.c | 8 ++++++++ 2 files changed, 18 insertions(+) diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index 8c3db9076988..2dd48f4bec8f 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -3406,6 +3406,16 @@ int __cold open_ctree(struct super_block *sb, struct btrfs_fs_devices *fs_device goto fail_alloc; } } + if (sectorsize != PAGE_SIZE) { + if (btrfs_super_incompat_flags(fs_info->super_copy) & + BTRFS_FEATURE_INCOMPAT_RAID56) { + btrfs_err(fs_info, + "raid5/6 is not yet supported for sector size %u with page size %lu", + sectorsize, PAGE_SIZE); + err = -EINVAL; + goto fail_alloc; + } + } ret = btrfs_init_workqueues(fs_info, fs_devices); if (ret) { diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index 80e962788396..5a0a0f23184e 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -3937,11 +3937,19 @@ static inline int validate_convert_profile(struct btrfs_fs_info *fs_info, if (!(bargs->flags & BTRFS_BALANCE_ARGS_CONVERT)) return true; + if (fs_info->sectorsize < PAGE_SIZE && + bargs->target & BTRFS_BLOCK_GROUP_RAID56_MASK) { + btrfs_err(fs_info, + "RAID5/6 is not supported yet for sectorsize %u with page size %lu", + fs_info->sectorsize, PAGE_SIZE); + goto invalid; + } /* Profile is valid and does not have bits outside of the allowed set */ if (alloc_profile_is_valid(bargs->target, 1) && (bargs->target & ~allowed) == 0) return true; +invalid: btrfs_err(fs_info, "balance: invalid convert %s profile %s", type, btrfs_bg_type_to_raid_name(bargs->target)); return false; From patchwork Fri May 21 06:40:46 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Qu Wenruo X-Patchwork-Id: 12272275 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-18.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER, INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED, USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id D2C08C43470 for ; Fri, 21 May 2021 06:43:34 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id B77D5611AD for ; Fri, 21 May 2021 06:43:34 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233443AbhEUGoz (ORCPT ); Fri, 21 May 2021 02:44:55 -0400 Received: from mx2.suse.de ([195.135.220.15]:58782 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233646AbhEUGoT (ORCPT ); Fri, 21 May 2021 02:44:19 -0400 X-Virus-Scanned: by amavisd-new at test-mx.suse.de DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.com; s=susede1; t=1621579302; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=31hiDlu2m0+PuxWAdKVpkI5nIDb/NFr953Ga8n0PrRg=; b=re3EmjYYHygKtP3pnMDoUrAp5VJKWt9YLRJyuRwiWsip2ZX0pnSRbJ+2yPbujyK/r57oTA zKaZHUXQZYmzNiUZuv6+3tHeAT9FFYwEFu1z33ETfiFWegoV3d7LM1fgGssroAuCHPVRIE ocPgMRh0mO9S35W1eEqHQz4TjRbofws= Received: from relay2.suse.de (unknown [195.135.221.27]) by mx2.suse.de (Postfix) with ESMTP id E37EEAE92; Fri, 21 May 2021 06:41:41 +0000 (UTC) From: Qu Wenruo To: linux-btrfs@vger.kernel.org Cc: Ritesh Harjani Subject: [PATCH v3 27/31] btrfs: fix a crash caused by race between prepare_pages() and btrfs_releasepage() Date: Fri, 21 May 2021 14:40:46 +0800 Message-Id: <20210521064050.191164-28-wqu@suse.com> X-Mailer: git-send-email 2.31.1 In-Reply-To: <20210521064050.191164-1-wqu@suse.com> References: <20210521064050.191164-1-wqu@suse.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org [BUG] When running generic/095, there is a high chance to crash with subpage data RW support: assertion failed: PagePrivate(page) && page->private, in fs/btrfs/subpage.c:171 ------------[ cut here ]------------ kernel BUG at fs/btrfs/ctree.h:3403! Internal error: Oops - BUG: 0 [#1] SMP CPU: 1 PID: 3567 Comm: fio Tainted: G C O 5.12.0-rc7-custom+ #17 Hardware name: Khadas VIM3 (DT) Call trace: assertfail.constprop.0+0x28/0x2c [btrfs] btrfs_subpage_assert+0x80/0xa0 [btrfs] btrfs_subpage_set_uptodate+0x34/0xec [btrfs] btrfs_page_clamp_set_uptodate+0x74/0xa4 [btrfs] btrfs_dirty_pages+0x160/0x270 [btrfs] btrfs_buffered_write+0x444/0x630 [btrfs] btrfs_direct_write+0x1cc/0x2d0 [btrfs] btrfs_file_write_iter+0xc0/0x160 [btrfs] new_sync_write+0xe8/0x180 vfs_write+0x1b4/0x210 ksys_pwrite64+0x7c/0xc0 __arm64_sys_pwrite64+0x24/0x30 el0_svc_common.constprop.0+0x70/0x140 do_el0_svc+0x28/0x90 el0_svc+0x2c/0x54 el0_sync_handler+0x1a8/0x1ac el0_sync+0x170/0x180 Code: f0000160 913be042 913c4000 955444bc (d4210000) ---[ end trace 3fdd39f4cccedd68 ]--- [CAUSE] Although prepare_pages() calls find_or_create_page(), which returns the page locked, but in later prepare_uptodate_page() calls, we may call btrfs_readpage() which unlocked the page. This leaves a window where btrfs_releasepage() can sneak in and release the page. This can be proven by the dying ftrace dump: fio-3567 : prepare_pages: r/i=5/257 page_offset=262144 private=1 after set extent map fio-3536 : __btrfs_releasepage.part.0: r/i=5/257 page_offset=262144 private=1 clear extent map fio-3567 : prepare_uptodate_page.part.0: r/i=5/257 page_offset=262144 private=0 after readpage fio-3567 : btrfs_dirty_pages: r/i=5/257 page_offset=262144 private=0 NOT PRIVATE [FIX] In prepare_uptodate_page(), we should not only check page->mapping, but also PagePrivate() to ensure we are still hold a correct page which has proper fs context setup. Reported-by: Ritesh Harjani Tested-by: Ritesh Harjani Signed-off-by: Qu Wenruo Reviewed-by: Filipe Manana --- fs/btrfs/file.c | 12 +++++++++++- 1 file changed, 11 insertions(+), 1 deletion(-) diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c index 6ef44afa939c..a4c092028bb6 100644 --- a/fs/btrfs/file.c +++ b/fs/btrfs/file.c @@ -1341,7 +1341,17 @@ static int prepare_uptodate_page(struct inode *inode, unlock_page(page); return -EIO; } - if (page->mapping != inode->i_mapping) { + + /* + * Since btrfs_readpage() will get the page unlocked, we have + * a window where fadvice() can try to release the page. + * Here we check both inode mapping and PagePrivate() to + * make sure the page is not released. + * + * The priavte flag check is essential for subpage as we need + * to store extra bitmap using page->private. + */ + if (page->mapping != inode->i_mapping || !PagePrivate(page)) { unlock_page(page); return -EAGAIN; } From patchwork Fri May 21 06:40:47 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Qu Wenruo X-Patchwork-Id: 12272287 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-18.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER, INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED, USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 2EEFEC433ED for ; Fri, 21 May 2021 06:43:47 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 16B87611AD for ; Fri, 21 May 2021 06:43:47 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234181AbhEUGpI (ORCPT ); Fri, 21 May 2021 02:45:08 -0400 Received: from mx2.suse.de ([195.135.220.15]:59018 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233199AbhEUGoU (ORCPT ); Fri, 21 May 2021 02:44:20 -0400 X-Virus-Scanned: by amavisd-new at test-mx.suse.de DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.com; s=susede1; t=1621579303; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=xek06VCcXJeGA0FPJENhSogIfByHmK7Igl9C5gGJW2M=; b=ipe8lxW2JsjJQGSSi4AC5e82uRG0Msivf5fnYLrYLYGjhxPdccBCLpduj/Yfiky9j94QZr 6Jv8khf1uyarPiQn59yCQUQEJCOQXdyAXuWtWTYjUYIUfwDoukjiGJP9yp5ZfPW+ea4iAr M2L/iqJKz+/q25WyYFNx5X57SpyFIQ8= Received: from relay2.suse.de (unknown [195.135.221.27]) by mx2.suse.de (Postfix) with ESMTP id D1E8AAE95; Fri, 21 May 2021 06:41:43 +0000 (UTC) From: Qu Wenruo To: linux-btrfs@vger.kernel.org Cc: Ritesh Harjani Subject: [PATCH v3 28/31] btrfs: fix a use-after-free bug in writeback subpage helper Date: Fri, 21 May 2021 14:40:47 +0800 Message-Id: <20210521064050.191164-29-wqu@suse.com> X-Mailer: git-send-email 2.31.1 In-Reply-To: <20210521064050.191164-1-wqu@suse.com> References: <20210521064050.191164-1-wqu@suse.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org [BUG] There is a possible use-after-free bug when running generic/095. BUG: Unable to handle kernel data access on write at 0x6b6b6b6b6b6b725b Faulting instruction address: 0xc000000000283654 c000000000283078 do_raw_spin_unlock+0x88/0x230 c0000000012b1e14 _raw_spin_unlock_irqrestore+0x44/0x90 c000000000a918dc btrfs_subpage_clear_writeback+0xac/0xe0 c0000000009e0458 end_bio_extent_writepage+0x158/0x270 c000000000b6fd14 bio_endio+0x254/0x270 c0000000009fc0f0 btrfs_end_bio+0x1a0/0x200 c000000000b6fd14 bio_endio+0x254/0x270 c000000000b781fc blk_update_request+0x46c/0x670 c000000000b8b394 blk_mq_end_request+0x34/0x1d0 c000000000d82d1c lo_complete_rq+0x11c/0x140 c000000000b880a4 blk_complete_reqs+0x84/0xb0 c0000000012b2ca4 __do_softirq+0x334/0x680 c0000000001dd878 irq_exit+0x148/0x1d0 c000000000016f4c do_IRQ+0x20c/0x240 c000000000009240 hardware_interrupt_common_virt+0x1b0/0x1c0 [CAUSE] There is very small race window like the following in generic/095. Thread 1 | Thread 2 --------------------------------+------------------------------------ end_bio_extent_writepage() | btrfs_releasepage() |- spin_lock_irqsave() | | |- end_page_writeback() | | | | |- if (PageWriteback() ||...) | | |- clear_page_extent_mapped() | | |- kfree(subpage); |- spin_unlock_irqrestore(). The race can also happen between writeback and btrfs_invalidatepage(), although that would be much harder as btrfs_invalidatepage() has much more work to do before the clear_page_extent_mapped() call. [FIX] Here we "wait" for the subapge spinlock to be released before we detach subpage structure. So this patch will introduce a new function, wait_subpage_spinlock(), to do the "wait" by acquiring the spinlock and release it. Since the caller has ensured the page is not dirty nor writeback, and page is already locked, the only way to hold the subpage spinlock is from endio function. Thus we only need to acquire the spinlock to wait for any existing holder. Reported-by: Ritesh Harjani Tested-by: Ritesh Harjani Signed-off-by: Qu Wenruo --- fs/btrfs/inode.c | 40 +++++++++++++++++++++++++++++++++++++++- fs/btrfs/subpage.c | 4 +++- 2 files changed, 42 insertions(+), 2 deletions(-) diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index f90cb8676e36..53b21a8228c7 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -8296,11 +8296,48 @@ static void btrfs_readahead(struct readahead_control *rac) extent_readahead(rac); } +/* + * For releasepage() and invalidatepage() we have a race window where + * end_page_writeback() is called but the subpage spinlock is not yet + * released. + * If we continue to release/invalidate the page, we could cause + * use-after-free for subpage spinlock. + * So this function is to spin wait for subpage spinlock. + */ +static void wait_subpage_spinlock(struct page *page) +{ + struct btrfs_fs_info *fs_info = btrfs_sb(page->mapping->host->i_sb); + struct btrfs_subpage *subpage; + + if (fs_info->sectorsize == PAGE_SIZE) + return; + + ASSERT(PagePrivate(page) && page->private); + subpage = (struct btrfs_subpage *)page->private; + + /* + * This may look insane as we just acquire the spinlock and release it, + * without doing anything. + * But we just want to make sure no one is still holding the subpage + * spinlock. + * And since the page is not dirty nor writeback, and we have page + * locked, the only possible way to hold a spinlock is from the endio + * function to clear page writeback. + * + * Here we just acquire the spinlock so that all existing callers + * should exit and we're safe to release/invalidate the page. + */ + spin_lock_irq(&subpage->lock); + spin_unlock_irq(&subpage->lock); +} + static int __btrfs_releasepage(struct page *page, gfp_t gfp_flags) { int ret = try_release_extent_mapping(page, gfp_flags); - if (ret == 1) + if (ret == 1) { + wait_subpage_spinlock(page); clear_page_extent_mapped(page); + } return ret; } @@ -8364,6 +8401,7 @@ static void btrfs_invalidatepage(struct page *page, unsigned int offset, * do double ordered extent accounting on the same page. */ wait_on_page_writeback(page); + wait_subpage_spinlock(page); /* * For subpage case, we have call sites like diff --git a/fs/btrfs/subpage.c b/fs/btrfs/subpage.c index 696485ab68a2..552410fba0bd 100644 --- a/fs/btrfs/subpage.c +++ b/fs/btrfs/subpage.c @@ -424,8 +424,10 @@ void btrfs_subpage_clear_writeback(const struct btrfs_fs_info *fs_info, spin_lock_irqsave(&subpage->lock, flags); subpage->writeback_bitmap &= ~tmp; - if (subpage->writeback_bitmap == 0) + if (subpage->writeback_bitmap == 0) { + ASSERT(PageWriteback(page)); end_page_writeback(page); + } spin_unlock_irqrestore(&subpage->lock, flags); } From patchwork Fri May 21 06:40:48 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Qu Wenruo X-Patchwork-Id: 12272279 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-18.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER, INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED, USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id B77A9C433ED for ; Fri, 21 May 2021 06:43:39 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 9C616611AB for ; Fri, 21 May 2021 06:43:39 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233747AbhEUGpA (ORCPT ); Fri, 21 May 2021 02:45:00 -0400 Received: from mx2.suse.de ([195.135.220.15]:58788 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233652AbhEUGoU (ORCPT ); Fri, 21 May 2021 02:44:20 -0400 X-Virus-Scanned: by amavisd-new at test-mx.suse.de DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.com; s=susede1; t=1621579305; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=niiS2122snVOgNpQjHKCdlHtK2vHKjkwYFUob0o3Zgo=; b=CfNkA+VPjhUgpO7Pcxr3h/GiP/pMlnh+eBUtZH6eqydDZcqYRL/rPD4Lg046F5H2ILIYY2 B2RDnJen6o2njf4Gw3VhqqNMeXLqeCWM7UBW0ZPkRhu0l6AFCwEgh1Xr/ADElO++83pPX8 cMrtUd+N4wi0Anf+htPBvfmuAi/tGbw= Received: from relay2.suse.de (unknown [195.135.221.27]) by mx2.suse.de (Postfix) with ESMTP id 91582AE99 for ; Fri, 21 May 2021 06:41:45 +0000 (UTC) From: Qu Wenruo To: linux-btrfs@vger.kernel.org Subject: [PATCH v3 29/31] btrfs: fix a subpage false alert for relocating partial preallocated data extents Date: Fri, 21 May 2021 14:40:48 +0800 Message-Id: <20210521064050.191164-30-wqu@suse.com> X-Mailer: git-send-email 2.31.1 In-Reply-To: <20210521064050.191164-1-wqu@suse.com> References: <20210521064050.191164-1-wqu@suse.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org [BUG] When relocating partial preallocated data extents (part of the preallocated extent is written) for subpage, it can cause the following false alert and make the relocation to fail: BTRFS info (device dm-3): balance: start -d BTRFS info (device dm-3): relocating block group 13631488 flags data BTRFS warning (device dm-3): csum failed root -9 ino 257 off 4096 csum 0x98757625 expected csum 0x00000000 mirror 1 BTRFS error (device dm-3): bdev /dev/mapper/arm_nvme-test errs: wr 0, rd 0, flush 0, corrupt 1, gen 0 BTRFS warning (device dm-3): csum failed root -9 ino 257 off 4096 csum 0x98757625 expected csum 0x00000000 mirror 1 BTRFS error (device dm-3): bdev /dev/mapper/arm_nvme-test errs: wr 0, rd 0, flush 0, corrupt 2, gen 0 BTRFS info (device dm-3): balance: ended with status: -5 The minimal script to reproduce looks like this: mkfs.btrfs -f -s 4k $dev mount $dev -o nospace_cache $mnt xfs_io -f -c "falloc 0 8k" $mnt/file xfs_io -f -c "pwrite 0 4k" $mnt/file btrfs balance start -d $mnt [CAUSE] Function btrfs_verify_data_csum() checks if the full range has EXTENT_NODATASUM bit for data reloc inode, if *all* bytes of the range has EXTENT_NODATASUM bit, then it skip the range. This works pretty well for regular sectorsize, as in that case btrfs_verify_data_csum() is called for each sector, thus no problem at all. But for subpage case, btrfs_verify_data_csum() is called on each bvec, which can contain several sectors, and since it checks *all* bytes for EXTENT_NODATASUM bit, if we have some range with csum, then we will continue checking all the sectors. For the preallocated sectors, it doesn't have any csum, thus obviously the csum won't match and cause the false alert. [FIX] Move the EXTENT_NODATASUM check into the main loop, so that we can check each sector for EXTENT_NODATASUM bit for subpage case. Signed-off-by: Qu Wenruo --- fs/btrfs/inode.c | 17 +++++++++++------ 1 file changed, 11 insertions(+), 6 deletions(-) diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index 53b21a8228c7..45baae1ad4db 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -3176,19 +3176,24 @@ unsigned int btrfs_verify_data_csum(struct btrfs_io_bio *io_bio, u32 bio_offset, if (!root->fs_info->csum_root) return 0; - if (root->root_key.objectid == BTRFS_DATA_RELOC_TREE_OBJECTID && - test_range_bit(io_tree, start, end, EXTENT_NODATASUM, 1, NULL)) { - clear_extent_bits(io_tree, start, end, EXTENT_NODATASUM); - return 0; - } - ASSERT(page_offset(page) <= start && end <= page_offset(page) + PAGE_SIZE - 1); for (pg_off = offset_in_page(start); pg_off < offset_in_page(end); pg_off += sectorsize, bio_offset += sectorsize) { + u64 file_offset = pg_off + page_offset(page); int ret; + if (root->root_key.objectid == BTRFS_DATA_RELOC_TREE_OBJECTID && + test_range_bit(io_tree, file_offset, + file_offset + sectorsize - 1, + EXTENT_NODATASUM, 1, NULL)) { + /* Skip the range without csum for data reloc inode */ + clear_extent_bits(io_tree, file_offset, + file_offset + sectorsize - 1, + EXTENT_NODATASUM); + continue; + } ret = check_data_csum(inode, io_bio, bio_offset, page, pg_off, page_offset(page) + pg_off); if (ret < 0) { From patchwork Fri May 21 06:40:49 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Qu Wenruo X-Patchwork-Id: 12272285 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-18.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER, INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED, USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 172CBC433B4 for ; Fri, 21 May 2021 06:43:46 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id F073F611AB for ; Fri, 21 May 2021 06:43:45 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233868AbhEUGpF (ORCPT ); Fri, 21 May 2021 02:45:05 -0400 Received: from mx2.suse.de ([195.135.220.15]:59020 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233671AbhEUGoU (ORCPT ); Fri, 21 May 2021 02:44:20 -0400 X-Virus-Scanned: by amavisd-new at test-mx.suse.de DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.com; s=susede1; t=1621579307; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=dWgfRwbtwCkoqW9q0cP9ufVBzZzsz90Zwg/X0XQ+wbE=; b=VMFdO/G0eXD42xMxIU3fS2z08xyyg2g1Fvwl6mh4GkVt2MeVmSaYIbi/BgjULI3ePm5Xup BonlA810iXMMG4PMdckLHggwv3rAX4ApupufHTaW0/5/SSEXlPWtRzfl1FyLfUgnvExEwW 1M2S5n4kiM6tePdeDn55sseHgx6I5Ag= Received: from relay2.suse.de (unknown [195.135.221.27]) by mx2.suse.de (Postfix) with ESMTP id 6CF27AEA3; Fri, 21 May 2021 06:41:47 +0000 (UTC) From: Qu Wenruo To: linux-btrfs@vger.kernel.org Cc: Ritesh Harjani Subject: [PATCH v3 30/31] btrfs: fix a subpage relocation data corruption Date: Fri, 21 May 2021 14:40:49 +0800 Message-Id: <20210521064050.191164-31-wqu@suse.com> X-Mailer: git-send-email 2.31.1 In-Reply-To: <20210521064050.191164-1-wqu@suse.com> References: <20210521064050.191164-1-wqu@suse.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org [BUG] When using the following script, btrfs will report data corruption after one data balance with subpage support: mkfs.btrfs -f -s 4k $dev mount $dev -o nospace_cache $mnt $fsstress -w -n 8 -s 1620948986 -d $mnt/ -v > /tmp/fsstress sync btrfs balance start -d $mnt btrfs scrub start -B $mnt Similar problem can be easily observed in btrfs/028 test case, there will be tons of balance failure with -EIO. [CAUSE] Above fsstress will result the following data extents layout in extent tree: item 10 key (13631488 EXTENT_ITEM 98304) itemoff 15889 itemsize 82 refs 2 gen 7 flags DATA extent data backref root FS_TREE objectid 259 offset 1339392 count 1 extent data backref root FS_TREE objectid 259 offset 647168 count 1 item 11 key (13631488 BLOCK_GROUP_ITEM 8388608) itemoff 15865 itemsize 24 block group used 102400 chunk_objectid 256 flags DATA item 12 key (13733888 EXTENT_ITEM 4096) itemoff 15812 itemsize 53 refs 1 gen 7 flags DATA extent data backref root FS_TREE objectid 259 offset 729088 count 1 Then when creating the data reloc inode, the data reloc inode will look like this: 0 32K 64K 96K 100K 104K |<------ Extent A ----->| |<- Ext B ->| Then when we first try to relocate extent A, we setup the data reloc inode with iszie 96K, then read both page [0, 64K) and page [64K, 128K). For page 64K, since the isize is just 96K, we fill range [96K, 128K) with 0 and set it uptodate. Then when we come to extent B, we update isize to 104K, then try to read page [64K, 128K). Then we find the page is already uptodate, so we skip the read. But range [96K, 128K) is filled with 0, not the real data. Then we writeback the data reloc inode to disk, with 0 filling range [96K, 128K), corrupting the content of extent B. The behavior is caused by the fact that we still do full page read for subpage case. The bug won't really happen for regular sectorsize, as one page only contains one sector. [FIX] This patch will fix the problem by invalidating range [isize, PAGE_END] in prealloc_file_extent_cluster(). So that if above example happens, when we preallocate the file extent for extent B, we will clear the uptodate bits for range [96K, 128K), allowing later relocate_one_page() to re-read the needed range. Reported-by: Ritesh Harjani Signed-off-by: Qu Wenruo --- fs/btrfs/relocation.c | 38 ++++++++++++++++++++++++++++++++++++++ 1 file changed, 38 insertions(+) diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c index cd50559c6d17..b50ee800993d 100644 --- a/fs/btrfs/relocation.c +++ b/fs/btrfs/relocation.c @@ -2782,10 +2782,48 @@ static noinline_for_stack int prealloc_file_extent_cluster( u64 num_bytes; int nr; int ret = 0; + u64 isize = i_size_read(&inode->vfs_inode); u64 prealloc_start = cluster->start - offset; u64 prealloc_end = cluster->end - offset; u64 cur_offset = prealloc_start; + /* + * For subpage case, previous isize may not be aligned to PAGE_SIZE. + * This means the range [isize, PAGE_END + 1) is filled with 0 by + * btrfs_do_readpage() call of previously relocated file cluster. + * + * If the current cluster starts in above range, btrfs_do_readpage() + * will skip the read, and relocate_one_page() will later writeback + * the padding 0 as new data, causing data corruption. + * + * Here we have to manually invalidate the range (isize, PAGE_END + 1). + */ + if (!IS_ALIGNED(isize, PAGE_SIZE)) { + struct btrfs_fs_info *fs_info = inode->root->fs_info; + const u32 sectorsize = fs_info->sectorsize; + struct page *page; + + ASSERT(sectorsize < PAGE_SIZE); + ASSERT(IS_ALIGNED(isize, sectorsize)); + + page = find_lock_page(inode->vfs_inode.i_mapping, + isize >> PAGE_SHIFT); + /* + * If page is freed we don't need to do anything then, as + * we will re-read the whole page anyway. + */ + if (page) { + u64 page_end = page_offset(page) + PAGE_SIZE - 1; + + clear_extent_bits(&inode->io_tree, isize, page_end, + EXTENT_UPTODATE); + btrfs_subpage_clear_uptodate(fs_info, page, isize, + page_end + 1 - isize); + unlock_page(page); + put_page(page); + } + } + BUG_ON(cluster->start != cluster->boundary[0]); ret = btrfs_alloc_data_chunk_ondemand(inode, prealloc_end + 1 - prealloc_start); From patchwork Fri May 21 06:40:50 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Qu Wenruo X-Patchwork-Id: 12272289 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-18.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER, INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED, USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 408C1C433B4 for ; Fri, 21 May 2021 06:43:48 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 2708A613B6 for ; Fri, 21 May 2021 06:43:48 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232861AbhEUGpJ (ORCPT ); Fri, 21 May 2021 02:45:09 -0400 Received: from mx2.suse.de ([195.135.220.15]:58802 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233688AbhEUGoU (ORCPT ); Fri, 21 May 2021 02:44:20 -0400 X-Virus-Scanned: by amavisd-new at test-mx.suse.de DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.com; s=susede1; t=1621579309; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=Q7F97P4AFH+GI3DD0plLLQzQQ1+RPtCjOjgsprU0GSs=; b=hQHHMZpSDzhDEzacoZNXqN1sWNLX8kTT32oSYMrN/8ajG8WC/y3eL2C3Y0wpiamw2wan6I nk5nTuURliEtYCdalBDjv4b4fNa9TKnkYzGh33i528Gaat5zw9vDaPzE2vAnVmgQ9DtXx2 99CgLaOGhImC1/h3Nz/5kaseQIAhnaM= Received: from relay2.suse.de (unknown [195.135.221.27]) by mx2.suse.de (Postfix) with ESMTP id 1C548AEA6 for ; Fri, 21 May 2021 06:41:49 +0000 (UTC) From: Qu Wenruo To: linux-btrfs@vger.kernel.org Subject: [PATCH v3 31/31] btrfs: allow read-write for 4K sectorsize on 64K page size systems Date: Fri, 21 May 2021 14:40:50 +0800 Message-Id: <20210521064050.191164-32-wqu@suse.com> X-Mailer: git-send-email 2.31.1 In-Reply-To: <20210521064050.191164-1-wqu@suse.com> References: <20210521064050.191164-1-wqu@suse.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org Since now we support data and metadata read-write for subpage, remove the RO requirement for subpage mount. There are some extra limits though: - For now, subpage RW mount is still considered experimental Thus that mount warning will still be there. - No compression support There are still quite some PAGE_SIZE hard coded and quite some call sites use extent_clear_unlock_delalloc() to unlock locked_page. This will screw up subpage helpers Now for subpage RW mount, no matter whatever mount option or inode attr is set, all write will not be compressed. Although reading compressed data has no problem. - No sectorsize defrag The problem here is, defrag is still done in full page size (64K). This means, if a page only has 4K data while the remaining 60K is all hole, after defrag it will be full 64K. This should not cause any kernel warning/hang nor data corruption, but it's still a behavior difference. - No inline extent will be created This is mostly due to the fact that filemap_fdatawrite_range() will trigger more write than the range specified. In fallocate calls, this behavior can make us to writeback which can be inlined, before we enlarge the isize. This is a very special corner case, and even current btrfs check won't report error on such inline extent + regular extent. But considering how much effort has been put to prevent such inline + regular, I'd prefer to cut off inline extent completely until we have a good solution. - Read-time data repair is in bvec size This is different from original sector size repair. Bvec size is a floating number between 4K to 64K (page size). If the extent is only 4K sized then we can do the repair in 4K size. But if the extent is larger, our repair unit grows follows the extent size, until it reaches PAGE_SIZE. This is mostly due to the design of the repair code, it can be enhanced later. Signed-off-by: Qu Wenruo --- fs/btrfs/disk-io.c | 13 ++++--------- fs/btrfs/inode.c | 3 +++ fs/btrfs/super.c | 7 ------- fs/btrfs/sysfs.c | 5 +++++ 4 files changed, 12 insertions(+), 16 deletions(-) diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index 2dd48f4bec8f..7c17cb7cf4fe 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -3396,15 +3396,10 @@ int __cold open_ctree(struct super_block *sb, struct btrfs_fs_devices *fs_device goto fail_alloc; } - /* For 4K sector size support, it's only read-only */ - if (PAGE_SIZE == SZ_64K && sectorsize == SZ_4K) { - if (!sb_rdonly(sb) || btrfs_super_log_root(disk_super)) { - btrfs_err(fs_info, - "subpage sectorsize %u only supported read-only for page size %lu", - sectorsize, PAGE_SIZE); - err = -EINVAL; - goto fail_alloc; - } + if (sectorsize != PAGE_SIZE) { + btrfs_warn(fs_info, + "read-write for sector size %u with page size %lu is experimental", + sectorsize, PAGE_SIZE); } if (sectorsize != PAGE_SIZE) { if (btrfs_super_incompat_flags(fs_info->super_copy) & diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index 45baae1ad4db..c6a5db2ebe16 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -490,6 +490,9 @@ static noinline int add_async_extent(struct async_chunk *cow, */ static inline bool inode_can_compress(struct btrfs_inode *inode) { + /* Subpage doesn't support compress yet */ + if (inode->root->fs_info->sectorsize < PAGE_SIZE) + return false; if (inode->flags & BTRFS_INODE_NODATACOW || inode->flags & BTRFS_INODE_NODATASUM) return false; diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c index 4a396c1147f1..b18d268abfbb 100644 --- a/fs/btrfs/super.c +++ b/fs/btrfs/super.c @@ -2053,13 +2053,6 @@ static int btrfs_remount(struct super_block *sb, int *flags, char *data) ret = -EINVAL; goto restore; } - if (fs_info->sectorsize < PAGE_SIZE) { - btrfs_warn(fs_info, - "read-write mount is not yet allowed for sectorsize %u page size %lu", - fs_info->sectorsize, PAGE_SIZE); - ret = -EINVAL; - goto restore; - } /* * NOTE: when remounting with a change that does writes, don't diff --git a/fs/btrfs/sysfs.c b/fs/btrfs/sysfs.c index 436ac7b4b334..752461a79364 100644 --- a/fs/btrfs/sysfs.c +++ b/fs/btrfs/sysfs.c @@ -366,6 +366,11 @@ static ssize_t supported_sectorsizes_show(struct kobject *kobj, { ssize_t ret = 0; + /* 4K sector size is also support with 64K page size */ + if (PAGE_SIZE == SZ_64K) + ret += scnprintf(buf + ret, PAGE_SIZE - ret, "%u ", + SZ_4K); + /* Only sectorsize == PAGE_SIZE is now supported */ ret += scnprintf(buf + ret, PAGE_SIZE - ret, "%lu\n", PAGE_SIZE);