From patchwork Mon May 31 08:50:37 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Qu Wenruo X-Patchwork-Id: 12288919 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-18.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER, INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED, USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 448E6C47082 for ; Mon, 31 May 2021 08:51:17 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 23EA261186 for ; Mon, 31 May 2021 08:51:17 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230475AbhEaIwy (ORCPT ); Mon, 31 May 2021 04:52:54 -0400 Received: from mx2.suse.de ([195.135.220.15]:40634 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230399AbhEaIww (ORCPT ); Mon, 31 May 2021 04:52:52 -0400 X-Virus-Scanned: by amavisd-new at test-mx.suse.de DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.com; s=susede1; t=1622451072; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=teKEE91dAPdJgQQQK5G7UzIq3aJvkKc+dD0j/wCML4Q=; b=NNw72mXNZOt3rrboPI0pa23aYO2ns7zGdGdqcbwgOjnZ487GaKqhNaXnxN5TOMwJ1kSZ5W Gi//F8SWxH2oH14fiO0+85mb30vLoSVWn641JTgJOdLcZz1nXWtoGoXTYBUb4fFaXY947V r2iFizgYaCM/03wXpWxi8jgwxc/zLQ0= Received: from relay2.suse.de (unknown [195.135.221.27]) by mx2.suse.de (Postfix) with ESMTP id 682CBB31C for ; Mon, 31 May 2021 08:51:12 +0000 (UTC) From: Qu Wenruo To: linux-btrfs@vger.kernel.org Subject: [PATCH v4 01/30] btrfs: pass bytenr directly to __process_pages_contig() Date: Mon, 31 May 2021 16:50:37 +0800 Message-Id: <20210531085106.259490-2-wqu@suse.com> X-Mailer: git-send-email 2.31.1 In-Reply-To: <20210531085106.259490-1-wqu@suse.com> References: <20210531085106.259490-1-wqu@suse.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org As a preparation for incoming subpage support, we need bytenr passed to __process_pages_contig() directly, not the current page index. So change the parameter and all callers to pass bytenr in. With the modification, here we need to replace the old @index_ret with @processed_end for __process_pages_contig(), but this brings a small problem. Normally we follow the inclusive return value, meaning @processed_end should be the last byte we processed. If parameter @start is 0, and we failed to lock any page, then we would return @processed_end as -1, causing more problems for __unlock_for_delalloc(). So here for @processed_end, we use two different return value patterns. If we have locked any page, @processed_end will be the last byte of locked page. Or it will be @start otherwise. This change will impact lock_delalloc_pages(), so it needs to check @processed_end to only unlock the range if we have locked any. Signed-off-by: Qu Wenruo --- fs/btrfs/extent_io.c | 57 ++++++++++++++++++++++++++++---------------- 1 file changed, 37 insertions(+), 20 deletions(-) diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c index 6f023f800bd6..95d61e03f33f 100644 --- a/fs/btrfs/extent_io.c +++ b/fs/btrfs/extent_io.c @@ -1810,8 +1810,8 @@ bool btrfs_find_delalloc_range(struct extent_io_tree *tree, u64 *start, static int __process_pages_contig(struct address_space *mapping, struct page *locked_page, - pgoff_t start_index, pgoff_t end_index, - unsigned long page_ops, pgoff_t *index_ret); + u64 start, u64 end, unsigned long page_ops, + u64 *processed_end); static noinline void __unlock_for_delalloc(struct inode *inode, struct page *locked_page, @@ -1824,7 +1824,7 @@ static noinline void __unlock_for_delalloc(struct inode *inode, if (index == locked_page->index && end_index == index) return; - __process_pages_contig(inode->i_mapping, locked_page, index, end_index, + __process_pages_contig(inode->i_mapping, locked_page, start, end, PAGE_UNLOCK, NULL); } @@ -1834,19 +1834,19 @@ static noinline int lock_delalloc_pages(struct inode *inode, u64 delalloc_end) { unsigned long index = delalloc_start >> PAGE_SHIFT; - unsigned long index_ret = index; unsigned long end_index = delalloc_end >> PAGE_SHIFT; + u64 processed_end = delalloc_start; int ret; ASSERT(locked_page); if (index == locked_page->index && index == end_index) return 0; - ret = __process_pages_contig(inode->i_mapping, locked_page, index, - end_index, PAGE_LOCK, &index_ret); - if (ret == -EAGAIN) + ret = __process_pages_contig(inode->i_mapping, locked_page, delalloc_start, + delalloc_end, PAGE_LOCK, &processed_end); + if (ret == -EAGAIN && processed_end > delalloc_start) __unlock_for_delalloc(inode, locked_page, delalloc_start, - (u64)index_ret << PAGE_SHIFT); + processed_end); return ret; } @@ -1941,12 +1941,14 @@ noinline_for_stack bool find_lock_delalloc_range(struct inode *inode, static int __process_pages_contig(struct address_space *mapping, struct page *locked_page, - pgoff_t start_index, pgoff_t end_index, - unsigned long page_ops, pgoff_t *index_ret) + u64 start, u64 end, unsigned long page_ops, + u64 *processed_end) { + pgoff_t start_index = start >> PAGE_SHIFT; + pgoff_t end_index = end >> PAGE_SHIFT; + pgoff_t index = start_index; unsigned long nr_pages = end_index - start_index + 1; unsigned long pages_processed = 0; - pgoff_t index = start_index; struct page *pages[16]; unsigned ret; int err = 0; @@ -1954,17 +1956,19 @@ static int __process_pages_contig(struct address_space *mapping, if (page_ops & PAGE_LOCK) { ASSERT(page_ops == PAGE_LOCK); - ASSERT(index_ret && *index_ret == start_index); + ASSERT(processed_end && *processed_end == start); } if ((page_ops & PAGE_SET_ERROR) && nr_pages > 0) mapping_set_error(mapping, -EIO); while (nr_pages > 0) { - ret = find_get_pages_contig(mapping, index, + int found_pages; + + found_pages = find_get_pages_contig(mapping, index, min_t(unsigned long, nr_pages, ARRAY_SIZE(pages)), pages); - if (ret == 0) { + if (found_pages == 0) { /* * Only if we're going to lock these pages, * can we find nothing at @index. @@ -2007,13 +2011,27 @@ static int __process_pages_contig(struct address_space *mapping, put_page(pages[i]); pages_processed++; } - nr_pages -= ret; - index += ret; + nr_pages -= found_pages; + index += found_pages; cond_resched(); } out: - if (err && index_ret) - *index_ret = start_index + pages_processed - 1; + if (err && processed_end) { + /* + * Update @processed_end. I know this is awful since it has + * two different return value patterns (inclusive vs exclusive). + * + * But the exclusive pattern is necessary if @start is 0, or we + * underflow and check against processed_end won't work as + * expected. + */ + if (pages_processed) + *processed_end = min(end, + ((u64)(start_index + pages_processed) << PAGE_SHIFT) - 1); + else + *processed_end = start; + + } return err; } @@ -2024,8 +2042,7 @@ void extent_clear_unlock_delalloc(struct btrfs_inode *inode, u64 start, u64 end, clear_extent_bit(&inode->io_tree, start, end, clear_bits, 1, 0, NULL); __process_pages_contig(inode->vfs_inode.i_mapping, locked_page, - start >> PAGE_SHIFT, end >> PAGE_SHIFT, - page_ops, NULL); + start, end, page_ops, NULL); } /* From patchwork Mon May 31 08:50:38 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Qu Wenruo X-Patchwork-Id: 12288921 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-18.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER, INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED, USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 8D2C6C47082 for ; Mon, 31 May 2021 08:51:19 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 6C0F8611AC for ; Mon, 31 May 2021 08:51:19 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230441AbhEaIw5 (ORCPT ); Mon, 31 May 2021 04:52:57 -0400 Received: from mx2.suse.de ([195.135.220.15]:40644 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230426AbhEaIwy (ORCPT ); Mon, 31 May 2021 04:52:54 -0400 X-Virus-Scanned: by amavisd-new at test-mx.suse.de DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.com; s=susede1; t=1622451074; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=c0WpGs8/k18dY/oPkynQ6M6Zu31PgHV9SdHa/awzAT8=; b=CEm8W/Oj03KWlTudGl9GfqgMVNZ9oFiac0fcpDqd61bRngitd7zfNaTVJiYky3rkKffzbF YmAcILFDOLG76VG3w1lONNDI38ZkLXF883XK09+Td72FM2lVPeYT9jbrzB2oaeMNxayWqI tg1kp9PkyaVGCYLg8n8qBFsB5SyhQTY= Received: from relay2.suse.de (unknown [195.135.221.27]) by mx2.suse.de (Postfix) with ESMTP id 5E662B2E9; Mon, 31 May 2021 08:51:14 +0000 (UTC) From: Qu Wenruo To: linux-btrfs@vger.kernel.org Cc: Josef Bacik Subject: [PATCH v4 02/30] btrfs: refactor the page status update into process_one_page() Date: Mon, 31 May 2021 16:50:38 +0800 Message-Id: <20210531085106.259490-3-wqu@suse.com> X-Mailer: git-send-email 2.31.1 In-Reply-To: <20210531085106.259490-1-wqu@suse.com> References: <20210531085106.259490-1-wqu@suse.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org In __process_pages_contig() we update page status according to page_ops. That update process is a bunch of if () {} branches, which lies inside two loops, this makes it pretty hard to expand for later subpage operations. So this patch will extract this operations into its own function, process_one_pages(). Also since we're refactoring __process_pages_contig(), also move the new helper and __process_pages_contig() before the first caller of them, to remove the forward declaration. Signed-off-by: Qu Wenruo Reviewed-by: Josef Bacik --- fs/btrfs/extent_io.c | 206 +++++++++++++++++++++++-------------------- 1 file changed, 109 insertions(+), 97 deletions(-) diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c index 95d61e03f33f..201e68d858ea 100644 --- a/fs/btrfs/extent_io.c +++ b/fs/btrfs/extent_io.c @@ -1808,10 +1808,118 @@ bool btrfs_find_delalloc_range(struct extent_io_tree *tree, u64 *start, return found; } +/* + * Process one page for __process_pages_contig(). + * + * Return >0 if we hit @page == @locked_page. + * Return 0 if we updated the page status. + * Return -EGAIN if the we need to try again. + * (For PAGE_LOCK case but got dirty page or page not belong to mapping) + */ +static int process_one_page(struct address_space *mapping, + struct page *page, struct page *locked_page, + unsigned long page_ops) +{ + if (page_ops & PAGE_SET_ORDERED) + SetPageOrdered(page); + + if (page == locked_page) + return 1; + + if (page_ops & PAGE_SET_ERROR) + SetPageError(page); + if (page_ops & PAGE_START_WRITEBACK) { + clear_page_dirty_for_io(page); + set_page_writeback(page); + } + if (page_ops & PAGE_END_WRITEBACK) + end_page_writeback(page); + if (page_ops & PAGE_LOCK) { + lock_page(page); + if (!PageDirty(page) || page->mapping != mapping) { + unlock_page(page); + return -EAGAIN; + } + } + if (page_ops & PAGE_UNLOCK) + unlock_page(page); + return 0; +} + static int __process_pages_contig(struct address_space *mapping, struct page *locked_page, u64 start, u64 end, unsigned long page_ops, - u64 *processed_end); + u64 *processed_end) +{ + pgoff_t start_index = start >> PAGE_SHIFT; + pgoff_t end_index = end >> PAGE_SHIFT; + pgoff_t index = start_index; + unsigned long nr_pages = end_index - start_index + 1; + unsigned long pages_processed = 0; + struct page *pages[16]; + int err = 0; + int i; + + if (page_ops & PAGE_LOCK) { + ASSERT(page_ops == PAGE_LOCK); + ASSERT(processed_end && *processed_end == start); + } + + if ((page_ops & PAGE_SET_ERROR) && nr_pages > 0) + mapping_set_error(mapping, -EIO); + + while (nr_pages > 0) { + int found_pages; + + found_pages = find_get_pages_contig(mapping, index, + min_t(unsigned long, + nr_pages, ARRAY_SIZE(pages)), pages); + if (found_pages == 0) { + /* + * Only if we're going to lock these pages, + * can we find nothing at @index. + */ + ASSERT(page_ops & PAGE_LOCK); + err = -EAGAIN; + goto out; + } + + for (i = 0; i < found_pages; i++) { + int process_ret; + + process_ret = process_one_page(mapping, pages[i], + locked_page, page_ops); + if (process_ret < 0) { + for (; i < found_pages; i++) + put_page(pages[i]); + err = -EAGAIN; + goto out; + } + put_page(pages[i]); + pages_processed++; + } + nr_pages -= found_pages; + index += found_pages; + cond_resched(); + } +out: + if (err && processed_end) { + /* + * Update @processed_end. I know this is awful since it has + * two different return value patterns (inclusive vs exclusive). + * + * But the exclusive pattern is necessary if @start is 0, or we + * underflow and check against processed_end won't work as + * expected. + */ + if (pages_processed) + *processed_end = min(end, + ((u64)(start_index + pages_processed) << PAGE_SHIFT) - 1); + else + *processed_end = start; + } + return err; +} static noinline void __unlock_for_delalloc(struct inode *inode, struct page *locked_page, @@ -1939,102 +2047,6 @@ noinline_for_stack bool find_lock_delalloc_range(struct inode *inode, return found; } -static int __process_pages_contig(struct address_space *mapping, - struct page *locked_page, - u64 start, u64 end, unsigned long page_ops, - u64 *processed_end) -{ - pgoff_t start_index = start >> PAGE_SHIFT; - pgoff_t end_index = end >> PAGE_SHIFT; - pgoff_t index = start_index; - unsigned long nr_pages = end_index - start_index + 1; - unsigned long pages_processed = 0; - struct page *pages[16]; - unsigned ret; - int err = 0; - int i; - - if (page_ops & PAGE_LOCK) { - ASSERT(page_ops == PAGE_LOCK); - ASSERT(processed_end && *processed_end == start); - } - - if ((page_ops & PAGE_SET_ERROR) && nr_pages > 0) - mapping_set_error(mapping, -EIO); - - while (nr_pages > 0) { - int found_pages; - - found_pages = find_get_pages_contig(mapping, index, - min_t(unsigned long, - nr_pages, ARRAY_SIZE(pages)), pages); - if (found_pages == 0) { - /* - * Only if we're going to lock these pages, - * can we find nothing at @index. - */ - ASSERT(page_ops & PAGE_LOCK); - err = -EAGAIN; - goto out; - } - - for (i = 0; i < ret; i++) { - if (page_ops & PAGE_SET_ORDERED) - SetPageOrdered(pages[i]); - - if (locked_page && pages[i] == locked_page) { - put_page(pages[i]); - pages_processed++; - continue; - } - if (page_ops & PAGE_START_WRITEBACK) { - clear_page_dirty_for_io(pages[i]); - set_page_writeback(pages[i]); - } - if (page_ops & PAGE_SET_ERROR) - SetPageError(pages[i]); - if (page_ops & PAGE_END_WRITEBACK) - end_page_writeback(pages[i]); - if (page_ops & PAGE_UNLOCK) - unlock_page(pages[i]); - if (page_ops & PAGE_LOCK) { - lock_page(pages[i]); - if (!PageDirty(pages[i]) || - pages[i]->mapping != mapping) { - unlock_page(pages[i]); - for (; i < ret; i++) - put_page(pages[i]); - err = -EAGAIN; - goto out; - } - } - put_page(pages[i]); - pages_processed++; - } - nr_pages -= found_pages; - index += found_pages; - cond_resched(); - } -out: - if (err && processed_end) { - /* - * Update @processed_end. I know this is awful since it has - * two different return value patterns (inclusive vs exclusive). - * - * But the exclusive pattern is necessary if @start is 0, or we - * underflow and check against processed_end won't work as - * expected. - */ - if (pages_processed) - *processed_end = min(end, - ((u64)(start_index + pages_processed) << PAGE_SHIFT) - 1); - else - *processed_end = start; - - } - return err; -} - void extent_clear_unlock_delalloc(struct btrfs_inode *inode, u64 start, u64 end, struct page *locked_page, u32 clear_bits, unsigned long page_ops) From patchwork Mon May 31 08:50:39 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Qu Wenruo X-Patchwork-Id: 12288923 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-18.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER, INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED, USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id CB22BC4708F for ; Mon, 31 May 2021 08:51:20 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id B07E361186 for ; Mon, 31 May 2021 08:51:20 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230480AbhEaIw6 (ORCPT ); Mon, 31 May 2021 04:52:58 -0400 Received: from mx2.suse.de ([195.135.220.15]:40666 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230347AbhEaIw4 (ORCPT ); Mon, 31 May 2021 04:52:56 -0400 X-Virus-Scanned: by amavisd-new at test-mx.suse.de DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.com; s=susede1; t=1622451076; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=JP9j8HnUfIRsWheW/HlFksCo1hCsZvfEoApzqE6MwyY=; b=Kn6c66r7jw1u1tjuqFKkU2UdtpXsVvWCil6GAdX36SON/k4Dx8Sb5NQfI1a01cfhSUvUHa +QkV7PcA3h7hkn8yRbc8Fmg0NHLOD/QuXSAqXhwOZd/bgt0+bOqv5xvorMgboUOXHPsaWd eY2FzeMS8w2GWz5JkVmGh/Z6EZBp2NY= Received: from relay2.suse.de (unknown [195.135.221.27]) by mx2.suse.de (Postfix) with ESMTP id 5328AB2E9; Mon, 31 May 2021 08:51:16 +0000 (UTC) From: Qu Wenruo To: linux-btrfs@vger.kernel.org Cc: Josef Bacik Subject: [PATCH v4 03/30] btrfs: provide btrfs_page_clamp_*() helpers Date: Mon, 31 May 2021 16:50:39 +0800 Message-Id: <20210531085106.259490-4-wqu@suse.com> X-Mailer: git-send-email 2.31.1 In-Reply-To: <20210531085106.259490-1-wqu@suse.com> References: <20210531085106.259490-1-wqu@suse.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org In the coming subpage RW supports, there are a lot of page status update calls which need to be converted to subpage compatible version, which needs @start and @len. Some call sites already have such @start/@len and are already in page range, like various endio functions. But there are also call sites which need to clamp the range for subpage case, like btrfs_dirty_pagse() and __process_contig_pages(). Here we introduce new helpers, btrfs_page_clamp_*(), to do and only do the clamp for subpage version. Although in theory all existing btrfs_page_*() calls can be converted to use btrfs_page_clamp_*() directly, but that would make us to do unnecessary clamp operations. Signed-off-by: Qu Wenruo Reviewed-by: Josef Bacik --- fs/btrfs/subpage.c | 38 ++++++++++++++++++++++++++++++++++++++ fs/btrfs/subpage.h | 10 ++++++++++ 2 files changed, 48 insertions(+) diff --git a/fs/btrfs/subpage.c b/fs/btrfs/subpage.c index 2d19089ab625..a6cf1776f3f9 100644 --- a/fs/btrfs/subpage.c +++ b/fs/btrfs/subpage.c @@ -354,6 +354,16 @@ void btrfs_subpage_clear_writeback(const struct btrfs_fs_info *fs_info, spin_unlock_irqrestore(&subpage->lock, flags); } +static void btrfs_subpage_clamp_range(struct page *page, u64 *start, u32 *len) +{ + u64 orig_start = *start; + u32 orig_len = *len; + + *start = max_t(u64, page_offset(page), orig_start); + *len = min_t(u64, page_offset(page) + PAGE_SIZE, + orig_start + orig_len) - *start; +} + /* * Unlike set/clear which is dependent on each page status, for test all bits * are tested in the same way. @@ -408,6 +418,34 @@ bool btrfs_page_test_##name(const struct btrfs_fs_info *fs_info, \ if (unlikely(!fs_info) || fs_info->sectorsize == PAGE_SIZE) \ return test_page_func(page); \ return btrfs_subpage_test_##name(fs_info, page, start, len); \ +} \ +void btrfs_page_clamp_set_##name(const struct btrfs_fs_info *fs_info, \ + struct page *page, u64 start, u32 len) \ +{ \ + if (unlikely(!fs_info) || fs_info->sectorsize == PAGE_SIZE) { \ + set_page_func(page); \ + return; \ + } \ + btrfs_subpage_clamp_range(page, &start, &len); \ + btrfs_subpage_set_##name(fs_info, page, start, len); \ +} \ +void btrfs_page_clamp_clear_##name(const struct btrfs_fs_info *fs_info, \ + struct page *page, u64 start, u32 len) \ +{ \ + if (unlikely(!fs_info) || fs_info->sectorsize == PAGE_SIZE) { \ + clear_page_func(page); \ + return; \ + } \ + btrfs_subpage_clamp_range(page, &start, &len); \ + btrfs_subpage_clear_##name(fs_info, page, start, len); \ +} \ +bool btrfs_page_clamp_test_##name(const struct btrfs_fs_info *fs_info, \ + struct page *page, u64 start, u32 len) \ +{ \ + if (unlikely(!fs_info) || fs_info->sectorsize == PAGE_SIZE) \ + return test_page_func(page); \ + btrfs_subpage_clamp_range(page, &start, &len); \ + return btrfs_subpage_test_##name(fs_info, page, start, len); \ } IMPLEMENT_BTRFS_PAGE_OPS(uptodate, SetPageUptodate, ClearPageUptodate, PageUptodate); diff --git a/fs/btrfs/subpage.h b/fs/btrfs/subpage.h index bfd626e955be..291cb1932f27 100644 --- a/fs/btrfs/subpage.h +++ b/fs/btrfs/subpage.h @@ -72,6 +72,10 @@ void btrfs_subpage_end_reader(const struct btrfs_fs_info *fs_info, * btrfs_page_*() are for call sites where the page can either be subpage * specific or regular page. The function will handle both cases. * But the range still needs to be inside the page. + * + * btrfs_page_clamp_*() are similar to btrfs_page_*(), except the range doesn't + * need to be inside the page. Those functions will truncate the range + * automatically. */ #define DECLARE_BTRFS_SUBPAGE_OPS(name) \ void btrfs_subpage_set_##name(const struct btrfs_fs_info *fs_info, \ @@ -85,6 +89,12 @@ void btrfs_page_set_##name(const struct btrfs_fs_info *fs_info, \ void btrfs_page_clear_##name(const struct btrfs_fs_info *fs_info, \ struct page *page, u64 start, u32 len); \ bool btrfs_page_test_##name(const struct btrfs_fs_info *fs_info, \ + struct page *page, u64 start, u32 len); \ +void btrfs_page_clamp_set_##name(const struct btrfs_fs_info *fs_info, \ + struct page *page, u64 start, u32 len); \ +void btrfs_page_clamp_clear_##name(const struct btrfs_fs_info *fs_info, \ + struct page *page, u64 start, u32 len); \ +bool btrfs_page_clamp_test_##name(const struct btrfs_fs_info *fs_info, \ struct page *page, u64 start, u32 len); DECLARE_BTRFS_SUBPAGE_OPS(uptodate); From patchwork Mon May 31 08:50:40 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Qu Wenruo X-Patchwork-Id: 12288925 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-18.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER, INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED, USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 5D3ECC47094 for ; Mon, 31 May 2021 08:51:21 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 3EB0A60FEE for ; Mon, 31 May 2021 08:51:21 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230487AbhEaIw7 (ORCPT ); Mon, 31 May 2021 04:52:59 -0400 Received: from mx2.suse.de ([195.135.220.15]:40690 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230478AbhEaIw6 (ORCPT ); Mon, 31 May 2021 04:52:58 -0400 X-Virus-Scanned: by amavisd-new at test-mx.suse.de DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.com; s=susede1; t=1622451078; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=LEomdVu04kBShrSjhMnNnBFZj1xhRDJa8qxyzsQKZdo=; b=mUgLHzWP+6LMrFvdMjIQUBScOTQKJLX1cpk1qhrlSrZT6JNIAt6Z/SwvhPeP54vuCKMlAO 47hEozGrsbn5gcbpEZ3ds/T32KqfODYzYuETzx8BtsSyR4FFxOlUq+v2X8bQlxrOxJ9+YM 5And4hu8d/pnS30i4Ueu6G8IZK2ciOE= Received: from relay2.suse.de (unknown [195.135.221.27]) by mx2.suse.de (Postfix) with ESMTP id 08C00B31C for ; Mon, 31 May 2021 08:51:18 +0000 (UTC) From: Qu Wenruo To: linux-btrfs@vger.kernel.org Subject: [PATCH v4 04/30] btrfs: only require sector size alignment for end_bio_extent_writepage() Date: Mon, 31 May 2021 16:50:40 +0800 Message-Id: <20210531085106.259490-5-wqu@suse.com> X-Mailer: git-send-email 2.31.1 In-Reply-To: <20210531085106.259490-1-wqu@suse.com> References: <20210531085106.259490-1-wqu@suse.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org Just like read page, for subpage support we only require sector size alignment. So change the error message condition to only require sector alignment. This should not affect existing code, as for regular sectorsize == PAGE_SIZE case, we are still requiring page alignment. Signed-off-by: Qu Wenruo --- fs/btrfs/extent_io.c | 29 ++++++++++++----------------- 1 file changed, 12 insertions(+), 17 deletions(-) diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c index 201e68d858ea..9ad8bc454107 100644 --- a/fs/btrfs/extent_io.c +++ b/fs/btrfs/extent_io.c @@ -2824,25 +2824,20 @@ static void end_bio_extent_writepage(struct bio *bio) struct page *page = bvec->bv_page; struct inode *inode = page->mapping->host; struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb); + const u32 sectorsize = fs_info->sectorsize; - /* We always issue full-page reads, but if some block - * in a page fails to read, blk_update_request() will - * advance bv_offset and adjust bv_len to compensate. - * Print a warning for nonzero offsets, and an error - * if they don't add up to a full page. */ - if (bvec->bv_offset || bvec->bv_len != PAGE_SIZE) { - if (bvec->bv_offset + bvec->bv_len != PAGE_SIZE) - btrfs_err(fs_info, - "partial page write in btrfs with offset %u and length %u", - bvec->bv_offset, bvec->bv_len); - else - btrfs_info(fs_info, - "incomplete page write in btrfs with offset %u and length %u", - bvec->bv_offset, bvec->bv_len); - } + /* Btrfs read write should always be sector aligned. */ + if (!IS_ALIGNED(bvec->bv_offset, sectorsize)) + btrfs_err(fs_info, + "partial page write in btrfs with offset %u and length %u", + bvec->bv_offset, bvec->bv_len); + else if (!IS_ALIGNED(bvec->bv_len, sectorsize)) + btrfs_info(fs_info, + "incomplete page write with offset %u and length %u", + bvec->bv_offset, bvec->bv_len); - start = page_offset(page); - end = start + bvec->bv_offset + bvec->bv_len - 1; + start = page_offset(page) + bvec->bv_offset; + end = start + bvec->bv_len - 1; if (first_bvec) { btrfs_record_physical_zoned(inode, start, bio); From patchwork Mon May 31 08:50:41 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Qu Wenruo X-Patchwork-Id: 12288927 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-18.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER, INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED, USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id C42BBC47095 for ; Mon, 31 May 2021 08:51:22 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id A0E6161186 for ; Mon, 31 May 2021 08:51:22 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230498AbhEaIxA (ORCPT ); Mon, 31 May 2021 04:53:00 -0400 Received: from mx2.suse.de ([195.135.220.15]:40710 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230494AbhEaIxA (ORCPT ); Mon, 31 May 2021 04:53:00 -0400 X-Virus-Scanned: by amavisd-new at test-mx.suse.de DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.com; s=susede1; t=1622451080; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=hYVeus6owo8iEUG8RcD8NEjpyF6B+CbULJO+ll+Mzhw=; b=rf1yBMkMy2XQBvndU/u9RlNkDlgUYWVCyjxaDhWLTRHW2c3nz2yHlW6QlCTGWw7biwp4Cj m32IabPk0/K+6foa/BXDNDcI7XJJ3XHNttbIef3YOyGLfbLNdwwees9WiHY84nBTm7fRjd xWymsAr1viygeECd4dhj1k5C4wLBy3k= Received: from relay2.suse.de (unknown [195.135.221.27]) by mx2.suse.de (Postfix) with ESMTP id 098B5B2E9; Mon, 31 May 2021 08:51:20 +0000 (UTC) From: Qu Wenruo To: linux-btrfs@vger.kernel.org Cc: Josef Bacik Subject: [PATCH v4 05/30] btrfs: make btrfs_dirty_pages() to be subpage compatible Date: Mon, 31 May 2021 16:50:41 +0800 Message-Id: <20210531085106.259490-6-wqu@suse.com> X-Mailer: git-send-email 2.31.1 In-Reply-To: <20210531085106.259490-1-wqu@suse.com> References: <20210531085106.259490-1-wqu@suse.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org Since the extent io tree operations in btrfs_dirty_pages() are already subpage compatible, we only need to make the page status update to use subpage helpers. Signed-off-by: Qu Wenruo Reviewed-by: Josef Bacik --- fs/btrfs/file.c | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c index 3b10d98b4ebb..4a89697ae3a7 100644 --- a/fs/btrfs/file.c +++ b/fs/btrfs/file.c @@ -28,6 +28,7 @@ #include "compression.h" #include "delalloc-space.h" #include "reflink.h" +#include "subpage.h" static struct kmem_cache *btrfs_inode_defrag_cachep; /* @@ -482,6 +483,7 @@ int btrfs_dirty_pages(struct btrfs_inode *inode, struct page **pages, start_pos = round_down(pos, fs_info->sectorsize); num_bytes = round_up(write_bytes + pos - start_pos, fs_info->sectorsize); + ASSERT(num_bytes <= U32_MAX); end_of_last_block = start_pos + num_bytes - 1; @@ -500,9 +502,10 @@ int btrfs_dirty_pages(struct btrfs_inode *inode, struct page **pages, for (i = 0; i < num_pages; i++) { struct page *p = pages[i]; - SetPageUptodate(p); + + btrfs_page_clamp_set_uptodate(fs_info, p, start_pos, num_bytes); ClearPageChecked(p); - set_page_dirty(p); + btrfs_page_clamp_set_dirty(fs_info, p, start_pos, num_bytes); } /* From patchwork Mon May 31 08:50:42 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Qu Wenruo X-Patchwork-Id: 12288929 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-18.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER, INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED, USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 03A5EC4708F for ; Mon, 31 May 2021 08:51:25 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id DDE8060FEE for ; Mon, 31 May 2021 08:51:24 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230494AbhEaIxC (ORCPT ); Mon, 31 May 2021 04:53:02 -0400 Received: from mx2.suse.de ([195.135.220.15]:40722 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230500AbhEaIxC (ORCPT ); Mon, 31 May 2021 04:53:02 -0400 X-Virus-Scanned: by amavisd-new at test-mx.suse.de DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.com; s=susede1; t=1622451082; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=BODKMvpNVgs9/NKlZJWqkafQBInpZYWws+jesmpBPZE=; b=bK3KiodkVzzCS3bXQ59UJextu2vfcldc7JFGcG1mOnQVZ/ZT14AWT1D4WetY5BTxcJFx0d H614o6N2QswnzVHwYTaET4S8DWkoi59pvTamWjBRhwIr7NRq1ODgYzo/aG1saREcOhCIge 1BsCXzrRblS6JasBw7W1irAosXz0jaw= Received: from relay2.suse.de (unknown [195.135.221.27]) by mx2.suse.de (Postfix) with ESMTP id 0976AB2E9; Mon, 31 May 2021 08:51:22 +0000 (UTC) From: Qu Wenruo To: linux-btrfs@vger.kernel.org Cc: Josef Bacik Subject: [PATCH v4 06/30] btrfs: make __process_pages_contig() to handle subpage dirty/error/writeback status Date: Mon, 31 May 2021 16:50:42 +0800 Message-Id: <20210531085106.259490-7-wqu@suse.com> X-Mailer: git-send-email 2.31.1 In-Reply-To: <20210531085106.259490-1-wqu@suse.com> References: <20210531085106.259490-1-wqu@suse.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org For __process_pages_contig() and process_one_page(), to handle subpage we only need to pass bytenr in and call subpage helpers to handle dirty/error/writeback status. Signed-off-by: Qu Wenruo Reviewed-by: Josef Bacik --- fs/btrfs/extent_io.c | 24 ++++++++++++++++-------- 1 file changed, 16 insertions(+), 8 deletions(-) diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c index 9ad8bc454107..64456146ee8e 100644 --- a/fs/btrfs/extent_io.c +++ b/fs/btrfs/extent_io.c @@ -1816,10 +1816,16 @@ bool btrfs_find_delalloc_range(struct extent_io_tree *tree, u64 *start, * Return -EGAIN if the we need to try again. * (For PAGE_LOCK case but got dirty page or page not belong to mapping) */ -static int process_one_page(struct address_space *mapping, +static int process_one_page(struct btrfs_fs_info *fs_info, + struct address_space *mapping, struct page *page, struct page *locked_page, - unsigned long page_ops) + unsigned long page_ops, u64 start, u64 end) { + u32 len; + + ASSERT(end + 1 - start != 0 && end + 1 - start < U32_MAX); + len = end + 1 - start; + if (page_ops & PAGE_SET_ORDERED) SetPageOrdered(page); @@ -1827,13 +1833,13 @@ static int process_one_page(struct address_space *mapping, return 1; if (page_ops & PAGE_SET_ERROR) - SetPageError(page); + btrfs_page_clamp_set_error(fs_info, page, start, len); if (page_ops & PAGE_START_WRITEBACK) { - clear_page_dirty_for_io(page); - set_page_writeback(page); + btrfs_page_clamp_clear_dirty(fs_info, page, start, len); + btrfs_page_clamp_set_writeback(fs_info, page, start, len); } if (page_ops & PAGE_END_WRITEBACK) - end_page_writeback(page); + btrfs_page_clamp_clear_writeback(fs_info, page, start, len); if (page_ops & PAGE_LOCK) { lock_page(page); if (!PageDirty(page) || page->mapping != mapping) { @@ -1851,6 +1857,7 @@ static int __process_pages_contig(struct address_space *mapping, u64 start, u64 end, unsigned long page_ops, u64 *processed_end) { + struct btrfs_fs_info *fs_info = btrfs_sb(mapping->host->i_sb); pgoff_t start_index = start >> PAGE_SHIFT; pgoff_t end_index = end >> PAGE_SHIFT; pgoff_t index = start_index; @@ -1887,8 +1894,9 @@ static int __process_pages_contig(struct address_space *mapping, for (i = 0; i < found_pages; i++) { int process_ret; - process_ret = process_one_page(mapping, pages[i], - locked_page, page_ops); + process_ret = process_one_page(fs_info, mapping, + pages[i], locked_page, page_ops, + start, end); if (process_ret < 0) { for (; i < found_pages; i++) put_page(pages[i]); From patchwork Mon May 31 08:50:43 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Qu Wenruo X-Patchwork-Id: 12288931 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-18.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER, INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED, USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 646DFC4708F for ; Mon, 31 May 2021 08:51:29 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 469BD611CA for ; Mon, 31 May 2021 08:51:29 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230501AbhEaIxG (ORCPT ); Mon, 31 May 2021 04:53:06 -0400 Received: from mx2.suse.de ([195.135.220.15]:40754 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230500AbhEaIxE (ORCPT ); Mon, 31 May 2021 04:53:04 -0400 X-Virus-Scanned: by amavisd-new at test-mx.suse.de DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.com; s=susede1; t=1622451084; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=1mrLhEj35Tu+YsGYcNp98T02yzKKOtTMJWQmX4tqlhg=; b=tSFGnxU1Ee626pIMrrhuJYWJlwaZfUl0/I1ApMoz48cMQV0gIYs3uGS8qKhjfdHrK1Fyqk GjMdGBUBk1Cip/U3H5Sz+wb0Z/lk067xbSt0TtX5dCF1DA10wt6uIW7LZvfBkF6XmKj7Jf TGo4Ypb1d6T+k0I1uNGipQ3aYmMOgHU= Received: from relay2.suse.de (unknown [195.135.221.27]) by mx2.suse.de (Postfix) with ESMTP id 0BE28B3E8; Mon, 31 May 2021 08:51:24 +0000 (UTC) From: Qu Wenruo To: linux-btrfs@vger.kernel.org Cc: Josef Bacik Subject: [PATCH v4 07/30] btrfs: make end_bio_extent_writepage() to be subpage compatible Date: Mon, 31 May 2021 16:50:43 +0800 Message-Id: <20210531085106.259490-8-wqu@suse.com> X-Mailer: git-send-email 2.31.1 In-Reply-To: <20210531085106.259490-1-wqu@suse.com> References: <20210531085106.259490-1-wqu@suse.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org Now in end_bio_extent_writepage(), the only subpage incompatible code is the end_page_writeback(). Just call the subpage helpers. Signed-off-by: Qu Wenruo Reviewed-by: Josef Bacik --- fs/btrfs/extent_io.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c index 64456146ee8e..82979cb86876 100644 --- a/fs/btrfs/extent_io.c +++ b/fs/btrfs/extent_io.c @@ -2853,7 +2853,8 @@ static void end_bio_extent_writepage(struct bio *bio) } end_extent_writepage(page, error, start, end); - end_page_writeback(page); + + btrfs_page_clear_writeback(fs_info, page, start, bvec->bv_len); } bio_put(bio); From patchwork Mon May 31 08:50:44 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Qu Wenruo X-Patchwork-Id: 12288933 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-18.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER, INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED, USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 3E0FBC47082 for ; Mon, 31 May 2021 08:51:32 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 1CB8461186 for ; Mon, 31 May 2021 08:51:32 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230511AbhEaIxJ (ORCPT ); Mon, 31 May 2021 04:53:09 -0400 Received: from mx2.suse.de ([195.135.220.15]:40782 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230462AbhEaIxH (ORCPT ); Mon, 31 May 2021 04:53:07 -0400 X-Virus-Scanned: by amavisd-new at test-mx.suse.de DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.com; s=susede1; t=1622451086; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=yGt9vAwZSV/R3uK8kT/gG86fwHSK+btsXeDW5an68xo=; b=R/Em8/Halr0phe31XAeI/d0Or8IS0rX0Fibe3xZH+oIHZKtCxGfvymOH1l3zM2QrPGtbEa PEx+Dlm7Ogc4NmrDRnYknUjtRGyZ22ySyKfb6CMvcrJwO3Rug+br+R2P+RzpX6mzbP45VD dK0QHrjRa6AHn8scan1gamiGKePqqnU= Received: from relay2.suse.de (unknown [195.135.221.27]) by mx2.suse.de (Postfix) with ESMTP id F25E2B31C; Mon, 31 May 2021 08:51:25 +0000 (UTC) From: Qu Wenruo To: linux-btrfs@vger.kernel.org Cc: Josef Bacik Subject: [PATCH v4 08/30] btrfs: make process_one_page() to handle subpage locking Date: Mon, 31 May 2021 16:50:44 +0800 Message-Id: <20210531085106.259490-9-wqu@suse.com> X-Mailer: git-send-email 2.31.1 In-Reply-To: <20210531085106.259490-1-wqu@suse.com> References: <20210531085106.259490-1-wqu@suse.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org Introduce a new data inodes specific subpage member, writers, to record how many sectors are under page lock for delalloc writing. This member acts pretty much the same as readers, except it's only for delalloc writes. This is important for delalloc code to trace which page can really be freed, as we have cases like run_delalloc_nocow() where we may exit processing nocow range inside a page, but need to exit to do cow half way. In that case, we need a way to determine if we can really unlock a full page. With the new btrfs_subpage::writers, there is a new requirement: - Page locked by process_one_page() must be unlocked by process_one_page() There are still tons of call sites manually lock and unlock a page, without updating btrfs_subpage::writers. So if we lock a page through process_one_page() then it must be unlocked by process_one_page() to keep btrfs_subpage::writers consistent. This will be handled in next patch. Signed-off-by: Qu Wenruo Reviewed-by: Josef Bacik --- fs/btrfs/extent_io.c | 10 +++-- fs/btrfs/subpage.c | 89 ++++++++++++++++++++++++++++++++++++++------ fs/btrfs/subpage.h | 10 +++++ 3 files changed, 94 insertions(+), 15 deletions(-) diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c index 82979cb86876..531edee6e988 100644 --- a/fs/btrfs/extent_io.c +++ b/fs/btrfs/extent_io.c @@ -1841,14 +1841,18 @@ static int process_one_page(struct btrfs_fs_info *fs_info, if (page_ops & PAGE_END_WRITEBACK) btrfs_page_clamp_clear_writeback(fs_info, page, start, len); if (page_ops & PAGE_LOCK) { - lock_page(page); + int ret; + + ret = btrfs_page_start_writer_lock(fs_info, page, start, len); + if (ret) + return ret; if (!PageDirty(page) || page->mapping != mapping) { - unlock_page(page); + btrfs_page_end_writer_lock(fs_info, page, start, len); return -EAGAIN; } } if (page_ops & PAGE_UNLOCK) - unlock_page(page); + btrfs_page_end_writer_lock(fs_info, page, start, len); return 0; } diff --git a/fs/btrfs/subpage.c b/fs/btrfs/subpage.c index a6cf1776f3f9..f728e5009487 100644 --- a/fs/btrfs/subpage.c +++ b/fs/btrfs/subpage.c @@ -110,10 +110,12 @@ int btrfs_alloc_subpage(const struct btrfs_fs_info *fs_info, if (!*ret) return -ENOMEM; spin_lock_init(&(*ret)->lock); - if (type == BTRFS_SUBPAGE_METADATA) + if (type == BTRFS_SUBPAGE_METADATA) { atomic_set(&(*ret)->eb_refs, 0); - else + } else { atomic_set(&(*ret)->readers, 0); + atomic_set(&(*ret)->writers, 0); + } return 0; } @@ -203,6 +205,79 @@ void btrfs_subpage_end_reader(const struct btrfs_fs_info *fs_info, unlock_page(page); } +static void btrfs_subpage_clamp_range(struct page *page, u64 *start, u32 *len) +{ + u64 orig_start = *start; + u32 orig_len = *len; + + *start = max_t(u64, page_offset(page), orig_start); + *len = min_t(u64, page_offset(page) + PAGE_SIZE, + orig_start + orig_len) - *start; +} + +void btrfs_subpage_start_writer(const struct btrfs_fs_info *fs_info, + struct page *page, u64 start, u32 len) +{ + struct btrfs_subpage *subpage = (struct btrfs_subpage *)page->private; + int nbits = len >> fs_info->sectorsize_bits; + int ret; + + btrfs_subpage_assert(fs_info, page, start, len); + + ASSERT(atomic_read(&subpage->readers) == 0); + ret = atomic_add_return(nbits, &subpage->writers); + ASSERT(ret == nbits); +} + +bool btrfs_subpage_end_and_test_writer(const struct btrfs_fs_info *fs_info, + struct page *page, u64 start, u32 len) +{ + struct btrfs_subpage *subpage = (struct btrfs_subpage *)page->private; + int nbits = len >> fs_info->sectorsize_bits; + + btrfs_subpage_assert(fs_info, page, start, len); + + ASSERT(atomic_read(&subpage->writers) >= nbits); + return atomic_sub_and_test(nbits, &subpage->writers); +} + +/* + * To lock a page for delalloc page writeback. + * + * Return -EAGAIN if the page is not properly initialized. + * Return 0 with the page locked, and writer counter updated. + * + * Even with 0 returned, the page still need extra check to make sure + * it's really the correct page, as the caller is using + * find_get_pages_contig(), which can race with page invalidating. + */ +int btrfs_page_start_writer_lock(const struct btrfs_fs_info *fs_info, + struct page *page, u64 start, u32 len) +{ + if (unlikely(!fs_info) || fs_info->sectorsize == PAGE_SIZE) { + lock_page(page); + return 0; + } + lock_page(page); + if (!PagePrivate(page) || !page->private) { + unlock_page(page); + return -EAGAIN; + } + btrfs_subpage_clamp_range(page, &start, &len); + btrfs_subpage_start_writer(fs_info, page, start, len); + return 0; +} + +void btrfs_page_end_writer_lock(const struct btrfs_fs_info *fs_info, + struct page *page, u64 start, u32 len) +{ + if (unlikely(!fs_info) || fs_info->sectorsize == PAGE_SIZE) + return unlock_page(page); + btrfs_subpage_clamp_range(page, &start, &len); + if (btrfs_subpage_end_and_test_writer(fs_info, page, start, len)) + unlock_page(page); +} + /* * Convert the [start, start + len) range into a u16 bitmap * @@ -354,16 +429,6 @@ void btrfs_subpage_clear_writeback(const struct btrfs_fs_info *fs_info, spin_unlock_irqrestore(&subpage->lock, flags); } -static void btrfs_subpage_clamp_range(struct page *page, u64 *start, u32 *len) -{ - u64 orig_start = *start; - u32 orig_len = *len; - - *start = max_t(u64, page_offset(page), orig_start); - *len = min_t(u64, page_offset(page) + PAGE_SIZE, - orig_start + orig_len) - *start; -} - /* * Unlike set/clear which is dependent on each page status, for test all bits * are tested in the same way. diff --git a/fs/btrfs/subpage.h b/fs/btrfs/subpage.h index 291cb1932f27..9d087ab3244e 100644 --- a/fs/btrfs/subpage.h +++ b/fs/btrfs/subpage.h @@ -33,6 +33,7 @@ struct btrfs_subpage { /* Structures only used by data */ struct { atomic_t readers; + atomic_t writers; }; }; }; @@ -63,6 +64,15 @@ void btrfs_subpage_start_reader(const struct btrfs_fs_info *fs_info, void btrfs_subpage_end_reader(const struct btrfs_fs_info *fs_info, struct page *page, u64 start, u32 len); +void btrfs_subpage_start_writer(const struct btrfs_fs_info *fs_info, + struct page *page, u64 start, u32 len); +bool btrfs_subpage_end_and_test_writer(const struct btrfs_fs_info *fs_info, + struct page *page, u64 start, u32 len); +int btrfs_page_start_writer_lock(const struct btrfs_fs_info *fs_info, + struct page *page, u64 start, u32 len); +void btrfs_page_end_writer_lock(const struct btrfs_fs_info *fs_info, + struct page *page, u64 start, u32 len); + /* * Template for subpage related operations. * From patchwork Mon May 31 08:50:45 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Qu Wenruo X-Patchwork-Id: 12288935 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-18.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER, INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED, USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 4ECE9C47082 for ; Mon, 31 May 2021 08:51:38 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 2DD5760FEE for ; Mon, 31 May 2021 08:51:38 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230476AbhEaIxM (ORCPT ); Mon, 31 May 2021 04:53:12 -0400 Received: from mx2.suse.de ([195.135.220.15]:40842 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230400AbhEaIxI (ORCPT ); Mon, 31 May 2021 04:53:08 -0400 X-Virus-Scanned: by amavisd-new at test-mx.suse.de DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.com; s=susede1; t=1622451087; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=SDn4VMTjdSk8A06e4ZuQlZmj6wRQRHg6TgqgNAVFEgU=; b=rnnMF5oED03q++EqS7e/ZAYEWsJQIpH63zaAMonuV2d8+jtZXrVQ+oeCS+FauWzxGZVRGq ip2Em3g0sIP/3m345Ns+j+ZmVdQ+9HwFtZJ9nrphD7TvZ+WohTWsZr5BpUF4sZWRP5spr9 eOT1mTC17ssBSZh5S6AgUCtBRy7lXwE= Received: from relay2.suse.de (unknown [195.135.221.27]) by mx2.suse.de (Postfix) with ESMTP id A868AB3E8 for ; Mon, 31 May 2021 08:51:27 +0000 (UTC) From: Qu Wenruo To: linux-btrfs@vger.kernel.org Subject: [PATCH v4 09/30] btrfs: introduce helpers for subpage ordered status Date: Mon, 31 May 2021 16:50:45 +0800 Message-Id: <20210531085106.259490-10-wqu@suse.com> X-Mailer: git-send-email 2.31.1 In-Reply-To: <20210531085106.259490-1-wqu@suse.com> References: <20210531085106.259490-1-wqu@suse.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org This patch introduces the following functions to handle btrfs subpage ordered (private2) status: - btrfs_subpage_set_ordered() - btrfs_subpage_clear_ordered() - btrfs_subpage_test_ordered() Those helpers can only be called when the range is ensured to be inside the page. - btrfs_page_set_ordered() - btrfs_page_clear_ordered() - btrfs_page_test_ordered() Those helpers can handle both regular sector size and subpage without problem. Those functions are here to coordinate btrfs_invalidatepage() with btrfs_writepage_endio_finish_ordered(), to make sure only one of those functions can finish the ordered extent. Signed-off-by: Qu Wenruo --- fs/btrfs/subpage.c | 29 +++++++++++++++++++++++++++++ fs/btrfs/subpage.h | 4 ++++ 2 files changed, 33 insertions(+) diff --git a/fs/btrfs/subpage.c b/fs/btrfs/subpage.c index f728e5009487..516e0b3f2ed9 100644 --- a/fs/btrfs/subpage.c +++ b/fs/btrfs/subpage.c @@ -429,6 +429,32 @@ void btrfs_subpage_clear_writeback(const struct btrfs_fs_info *fs_info, spin_unlock_irqrestore(&subpage->lock, flags); } +void btrfs_subpage_set_ordered(const struct btrfs_fs_info *fs_info, + struct page *page, u64 start, u32 len) +{ + struct btrfs_subpage *subpage = (struct btrfs_subpage *)page->private; + const u16 tmp = btrfs_subpage_calc_bitmap(fs_info, page, start, len); + unsigned long flags; + + spin_lock_irqsave(&subpage->lock, flags); + subpage->ordered_bitmap |= tmp; + SetPageOrdered(page); + spin_unlock_irqrestore(&subpage->lock, flags); +} + +void btrfs_subpage_clear_ordered(const struct btrfs_fs_info *fs_info, + struct page *page, u64 start, u32 len) +{ + struct btrfs_subpage *subpage = (struct btrfs_subpage *)page->private; + const u16 tmp = btrfs_subpage_calc_bitmap(fs_info, page, start, len); + unsigned long flags; + + spin_lock_irqsave(&subpage->lock, flags); + subpage->ordered_bitmap &= ~tmp; + if (subpage->ordered_bitmap == 0) + ClearPageOrdered(page); + spin_unlock_irqrestore(&subpage->lock, flags); +} /* * Unlike set/clear which is dependent on each page status, for test all bits * are tested in the same way. @@ -451,6 +477,7 @@ IMPLEMENT_BTRFS_SUBPAGE_TEST_OP(uptodate); IMPLEMENT_BTRFS_SUBPAGE_TEST_OP(error); IMPLEMENT_BTRFS_SUBPAGE_TEST_OP(dirty); IMPLEMENT_BTRFS_SUBPAGE_TEST_OP(writeback); +IMPLEMENT_BTRFS_SUBPAGE_TEST_OP(ordered); /* * Note that, in selftests (extent-io-tests), we can have empty fs_info passed @@ -519,3 +546,5 @@ IMPLEMENT_BTRFS_PAGE_OPS(dirty, set_page_dirty, clear_page_dirty_for_io, PageDirty); IMPLEMENT_BTRFS_PAGE_OPS(writeback, set_page_writeback, end_page_writeback, PageWriteback); +IMPLEMENT_BTRFS_PAGE_OPS(ordered, SetPageOrdered, ClearPageOrdered, + PageOrdered); diff --git a/fs/btrfs/subpage.h b/fs/btrfs/subpage.h index 9d087ab3244e..3419b152c00f 100644 --- a/fs/btrfs/subpage.h +++ b/fs/btrfs/subpage.h @@ -34,6 +34,9 @@ struct btrfs_subpage { struct { atomic_t readers; atomic_t writers; + + /* If a sector has pending ordered extent */ + u16 ordered_bitmap; }; }; }; @@ -111,6 +114,7 @@ DECLARE_BTRFS_SUBPAGE_OPS(uptodate); DECLARE_BTRFS_SUBPAGE_OPS(error); DECLARE_BTRFS_SUBPAGE_OPS(dirty); DECLARE_BTRFS_SUBPAGE_OPS(writeback); +DECLARE_BTRFS_SUBPAGE_OPS(ordered); bool btrfs_subpage_clear_and_test_dirty(const struct btrfs_fs_info *fs_info, struct page *page, u64 start, u32 len); From patchwork Mon May 31 08:50:46 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Qu Wenruo X-Patchwork-Id: 12288937 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-18.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER, INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED, USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 880ABC47094 for ; Mon, 31 May 2021 08:51:39 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 69CE460FF2 for ; Mon, 31 May 2021 08:51:39 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230428AbhEaIxR (ORCPT ); Mon, 31 May 2021 04:53:17 -0400 Received: from mx2.suse.de ([195.135.220.15]:40878 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230509AbhEaIxJ (ORCPT ); Mon, 31 May 2021 04:53:09 -0400 X-Virus-Scanned: by amavisd-new at test-mx.suse.de DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.com; s=susede1; t=1622451089; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=TDrvP7Rf8C19HqpseUuyPpbDwoHzaafvNkFpl0YGBUc=; b=odCZjmJ073ddnR3S5eooOiIIkpzRbBcyTb87jCBzRI43eDTcjUQILHbGEm6FlJ/tBvacOR I52eLs9P+mIuyoo0CG5tPUYqV4So+QxSsUwts/PsrAKEpAeHN9BkYgbXrC2WNNCFkswe5g AzsZo/Vne/h7PEx7RGtxcE7ehQyUkWE= Received: from relay2.suse.de (unknown [195.135.221.27]) by mx2.suse.de (Postfix) with ESMTP id 546BFB2E9 for ; Mon, 31 May 2021 08:51:29 +0000 (UTC) From: Qu Wenruo To: linux-btrfs@vger.kernel.org Subject: [PATCH v4 10/30] btrfs: make page Ordered bit to be subpage compatible Date: Mon, 31 May 2021 16:50:46 +0800 Message-Id: <20210531085106.259490-11-wqu@suse.com> X-Mailer: git-send-email 2.31.1 In-Reply-To: <20210531085106.259490-1-wqu@suse.com> References: <20210531085106.259490-1-wqu@suse.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org This involves the following modication: - Ordered extent creation This is done in process_one_page(), now PAGE_SET_ORDERED will call subpage helper to do the work. - endio functions This is done in btrfs_mark_ordered_io_finished(). - btrfs_invalidatepage() - btrfs_cleanup_ordered_extents() Use the subpage page helper, and add an extra branch to exit if the locked page have covered the full range. Now the usage of page Ordered flag for ordered extent accounting is fully subpage compatible. Signed-off-by: Qu Wenruo --- fs/btrfs/extent_io.c | 2 +- fs/btrfs/inode.c | 19 ++++++++++++++----- fs/btrfs/ordered-data.c | 5 +++-- 3 files changed, 18 insertions(+), 8 deletions(-) diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c index 531edee6e988..85effc6e4e91 100644 --- a/fs/btrfs/extent_io.c +++ b/fs/btrfs/extent_io.c @@ -1827,7 +1827,7 @@ static int process_one_page(struct btrfs_fs_info *fs_info, len = end + 1 - start; if (page_ops & PAGE_SET_ORDERED) - SetPageOrdered(page); + btrfs_page_clamp_set_ordered(fs_info, page, start, len); if (page == locked_page) return 1; diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index ee1e923cb870..f14afce03759 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -51,6 +51,7 @@ #include "block-group.h" #include "space-info.h" #include "zoned.h" +#include "subpage.h" struct btrfs_iget_args { u64 ino; @@ -191,18 +192,22 @@ static inline void btrfs_cleanup_ordered_extents(struct btrfs_inode *inode, * range, then __endio_write_update_ordered() will handle * the ordered extent accounting for the range. */ - ClearPageOrdered(page); + btrfs_page_clamp_clear_ordered(inode->root->fs_info, page, + offset, bytes); put_page(page); } + /* The locked page covers the full range, nothing needs to be done */ + if (bytes + offset <= page_offset(locked_page) + PAGE_SIZE) + return; /* * In case this page belongs to the delalloc range being instantiated * then skip it, since the first page of a range is going to be * properly cleaned up by the caller of run_delalloc_range */ if (page_start >= offset && page_end <= (offset + bytes - 1)) { - offset += PAGE_SIZE; - bytes -= PAGE_SIZE; + bytes = offset + bytes - page_offset(locked_page) - PAGE_SIZE; + offset = page_offset(locked_page) + PAGE_SIZE; } return __endio_write_update_ordered(inode, offset, bytes, false); @@ -8320,6 +8325,7 @@ static void btrfs_invalidatepage(struct page *page, unsigned int offset, unsigned int length) { struct btrfs_inode *inode = BTRFS_I(page->mapping->host); + struct btrfs_fs_info *fs_info = inode->root->fs_info; struct extent_io_tree *tree = &inode->io_tree; struct extent_state *cached_state = NULL; u64 page_start = page_offset(page); @@ -8355,6 +8361,7 @@ static void btrfs_invalidatepage(struct page *page, unsigned int offset, struct btrfs_ordered_extent *ordered; bool delete_states; u64 range_end; + u32 range_len; ordered = btrfs_lookup_first_ordered_range(inode, cur, page_end + 1 - cur); @@ -8381,7 +8388,9 @@ static void btrfs_invalidatepage(struct page *page, unsigned int offset, range_end = min(ordered->file_offset + ordered->num_bytes - 1, page_end); - if (!PageOrdered(page)) { + ASSERT(range_end + 1 - cur < U32_MAX); + range_len = range_end + 1 - cur; + if (!btrfs_page_test_ordered(fs_info, page, cur, range_len)) { /* * If Ordered (Private2) is cleared, it means endio has * already been executed for the range. @@ -8391,7 +8400,7 @@ static void btrfs_invalidatepage(struct page *page, unsigned int offset, delete_states = false; goto next; } - ClearPageOrdered(page); + btrfs_page_clear_ordered(fs_info, page, cur, range_len); /* * IO on this page will never be started, so we need to account diff --git a/fs/btrfs/ordered-data.c b/fs/btrfs/ordered-data.c index b1b377ad99a0..6eb41b7c0c84 100644 --- a/fs/btrfs/ordered-data.c +++ b/fs/btrfs/ordered-data.c @@ -16,6 +16,7 @@ #include "compression.h" #include "delalloc-space.h" #include "qgroup.h" +#include "subpage.h" static struct kmem_cache *btrfs_ordered_extent_cache; @@ -395,11 +396,11 @@ void btrfs_mark_ordered_io_finished(struct btrfs_inode *inode, * * If there's no such bit, we need to skip to next range. */ - if (!PageOrdered(page)) { + if (!btrfs_page_test_ordered(fs_info, page, cur, len)) { cur += len; continue; } - ClearPageOrdered(page); + btrfs_page_clear_ordered(fs_info, page, cur, len); } /* Now we're fine to update the accounting */ From patchwork Mon May 31 08:50:47 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Qu Wenruo X-Patchwork-Id: 12288939 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-18.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER, INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED, USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 98B10C4708F for ; Mon, 31 May 2021 08:51:40 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 681ED60FEB for ; Mon, 31 May 2021 08:51:40 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230479AbhEaIxS (ORCPT ); Mon, 31 May 2021 04:53:18 -0400 Received: from mx2.suse.de ([195.135.220.15]:40910 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230518AbhEaIxM (ORCPT ); Mon, 31 May 2021 04:53:12 -0400 X-Virus-Scanned: by amavisd-new at test-mx.suse.de DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.com; s=susede1; t=1622451091; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=E3MX/q3MqgwWGTHihb6PseJZ5Hn5F7kWQROKno8fZ7M=; b=TdH6FVM5DL4/LBc/1dMt9dPxi46BR6soMhMb7UjnY2NS45+YrjO4LMnsd7mdH8fv3VwhPg 8NJQ02+YBBWPPOYE9kjFxqwtRJfErUxLA3GokLRQt8RTqh1EtJiYCBsdwL07TXmXrQVH6C NPLJN7+w/Byxm9J+gTDxmlsNCunG3Ns= Received: from relay2.suse.de (unknown [195.135.221.27]) by mx2.suse.de (Postfix) with ESMTP id 54768B2E9; Mon, 31 May 2021 08:51:31 +0000 (UTC) From: Qu Wenruo To: linux-btrfs@vger.kernel.org Cc: David Sterba Subject: [PATCH v4 11/30] btrfs: update locked page dirty/writeback/error bits in __process_pages_contig Date: Mon, 31 May 2021 16:50:47 +0800 Message-Id: <20210531085106.259490-12-wqu@suse.com> X-Mailer: git-send-email 2.31.1 In-Reply-To: <20210531085106.259490-1-wqu@suse.com> References: <20210531085106.259490-1-wqu@suse.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org When __process_pages_contig() gets called for extent_clear_unlock_delalloc(), if we hit the locked page, only Private2 bit is updated, but dirty/writeback/error bits are all skipped. There are several call sites that call extent_clear_unlock_delalloc() with locked_page and PAGE_CLEAR_DIRTY/PAGE_SET_WRITEBACK/PAGE_END_WRITEBACK - cow_file_range() - run_delalloc_nocow() - cow_file_range_async() All for their error handling branches. For those call sites, since we skip the locked page for dirty/error/writeback bit update, the locked page will still have its subpage dirty bit remaining. Normally it's the call sites which locked the page to handle the locked page, but it won't hurt if we also do the update. Especially there are already other call sites doing the same thing by manually passing NULL as locked_page. Signed-off-by: Qu Wenruo Signed-off-by: David Sterba --- fs/btrfs/extent_io.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c index 85effc6e4e91..00f62f0f866e 100644 --- a/fs/btrfs/extent_io.c +++ b/fs/btrfs/extent_io.c @@ -1828,10 +1828,6 @@ static int process_one_page(struct btrfs_fs_info *fs_info, if (page_ops & PAGE_SET_ORDERED) btrfs_page_clamp_set_ordered(fs_info, page, start, len); - - if (page == locked_page) - return 1; - if (page_ops & PAGE_SET_ERROR) btrfs_page_clamp_set_error(fs_info, page, start, len); if (page_ops & PAGE_START_WRITEBACK) { @@ -1840,6 +1836,10 @@ static int process_one_page(struct btrfs_fs_info *fs_info, } if (page_ops & PAGE_END_WRITEBACK) btrfs_page_clamp_clear_writeback(fs_info, page, start, len); + + if (page == locked_page) + return 1; + if (page_ops & PAGE_LOCK) { int ret; From patchwork Mon May 31 08:50:48 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Qu Wenruo X-Patchwork-Id: 12288941 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-18.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER, INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED, USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 97255C4708F for ; Mon, 31 May 2021 08:51:42 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 65E8960FEB for ; Mon, 31 May 2021 08:51:42 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230520AbhEaIxT (ORCPT ); Mon, 31 May 2021 04:53:19 -0400 Received: from mx2.suse.de ([195.135.220.15]:40936 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230461AbhEaIxN (ORCPT ); Mon, 31 May 2021 04:53:13 -0400 X-Virus-Scanned: by amavisd-new at test-mx.suse.de DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.com; s=susede1; t=1622451093; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=ZGys6EFh+61H7u2A2fZNxcnoqLkdLsuF0BYGfK6Un+o=; b=fkV0tiy6Pt83b9GLmYxeoKCRNTmhQ90yEm++jRspqfruscleYnDJmhXkuRqoAed1q4g399 EArNS/+uC6KhPGwrpeYf5+nAXsxs2+zF1vqmA8A96djjYej+dejUxaJEh5IdCtiuvOZXWn Czg2cujwfgWnZLRLtsogdYRM3KDeinQ= Received: from relay2.suse.de (unknown [195.135.221.27]) by mx2.suse.de (Postfix) with ESMTP id 0B65FB459 for ; Mon, 31 May 2021 08:51:33 +0000 (UTC) From: Qu Wenruo To: linux-btrfs@vger.kernel.org Subject: [PATCH v4 12/30] btrfs: prevent extent_clear_unlock_delalloc() to unlock page not locked by __process_pages_contig() Date: Mon, 31 May 2021 16:50:48 +0800 Message-Id: <20210531085106.259490-13-wqu@suse.com> X-Mailer: git-send-email 2.31.1 In-Reply-To: <20210531085106.259490-1-wqu@suse.com> References: <20210531085106.259490-1-wqu@suse.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org In cow_file_range(), after we have succeeded creating an inline extent, we unlock the page with extent_clear_unlock_delalloc() by passing locked_page == NULL. For sectorsize == PAGE_SIZE case, this is just making the page lock and unlock harder to grab. But for incoming subpage case, it can be a big problem. For incoming subpage case, page locking have two entrace: - __process_pages_contig() In that case, we know exactly the range we want to lock (which only requires sector alignment). To handle the subpage requirement, we introduce btrfs_subpage::writers to page::private, and will update it in __process_pages_contig(). - Other directly lock/unlock_page() call sites Those won't touch btrfs_subpage::writers at all. This means, page locked by __process_pages_contig() can only be unlocked by __process_pages_contig(). Thankfully we already have the existing infrastructure in the form of @locked_page in various call sites. Unfortunately, extent_clear_unlock_delalloc() in cow_file_range() after creating an inline extent is the exception. It intentionally call extent_clear_unlock_delalloc() with locked_page == NULL, to also unlock current page (and clear its dirty/writeback bits). To co-operate with incoming subpage modifications, and make the page lock/unlock pair easier to understand, this patch will still call extent_clear_unlock_delalloc() with locked_page, and only unlock the page in __extent_writepage(). Signed-off-by: Qu Wenruo --- fs/btrfs/inode.c | 16 +++++++++++++++- 1 file changed, 15 insertions(+), 1 deletion(-) diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index f14afce03759..6be83d30ad43 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -1091,7 +1091,8 @@ static noinline int cow_file_range(struct btrfs_inode *inode, * our outstanding extent for clearing delalloc for this * range. */ - extent_clear_unlock_delalloc(inode, start, end, NULL, + extent_clear_unlock_delalloc(inode, start, end, + locked_page, EXTENT_LOCKED | EXTENT_DELALLOC | EXTENT_DELALLOC_NEW | EXTENT_DEFRAG | EXTENT_DO_ACCOUNTING, PAGE_UNLOCK | @@ -1099,6 +1100,19 @@ static noinline int cow_file_range(struct btrfs_inode *inode, *nr_written = *nr_written + (end - start + PAGE_SIZE) / PAGE_SIZE; *page_started = 1; + /* + * locked_page is locked by the caller of + * writepage_delalloc(), not locked by + * __process_pages_contig(). + * + * We can't let __process_pages_contig() to unlock it, + * as it doesn't have any subpage::writers recorded. + * + * Here we manually unlock the page, since the caller + * can't use page_started to determine if it's an + * inline extent or a compressed extent. + */ + unlock_page(locked_page); goto out; } else if (ret < 0) { goto out_unlock; From patchwork Mon May 31 08:50:49 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Qu Wenruo X-Patchwork-Id: 12288943 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-18.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER, INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED, USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 6209FC47094 for ; Mon, 31 May 2021 08:51:44 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 292A860FEE for ; Mon, 31 May 2021 08:51:44 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231126AbhEaIxV (ORCPT ); Mon, 31 May 2021 04:53:21 -0400 Received: from mx2.suse.de ([195.135.220.15]:40958 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230512AbhEaIxR (ORCPT ); Mon, 31 May 2021 04:53:17 -0400 X-Virus-Scanned: by amavisd-new at test-mx.suse.de DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.com; s=susede1; t=1622451094; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=R0P9eP9VVzvDT5zBbQnUO2984IzvLtOxsrJz+dZKvT0=; b=rw0aoreHrPeaFpThHB8L4WEhiQ99o7LiX3yVjkrRY9Bza7UZiiN8J6vhh7VOHMCr4r4Hhb BDTsyerDKpaUBBuZdIlAm64jPEw3jt+aDzAuF1NIKFWbN9hRaBTu0s/rnJAOF7ycUDUh0J rtnWMLgBNBm4E1GHNEUEQHYOyKs35Xc= Received: from relay2.suse.de (unknown [195.135.221.27]) by mx2.suse.de (Postfix) with ESMTP id BCD0AB3E8 for ; Mon, 31 May 2021 08:51:34 +0000 (UTC) From: Qu Wenruo To: linux-btrfs@vger.kernel.org Subject: [PATCH v4 13/30] btrfs: make btrfs_set_range_writeback() subpage compatible Date: Mon, 31 May 2021 16:50:49 +0800 Message-Id: <20210531085106.259490-14-wqu@suse.com> X-Mailer: git-send-email 2.31.1 In-Reply-To: <20210531085106.259490-1-wqu@suse.com> References: <20210531085106.259490-1-wqu@suse.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org Function btrfs_set_range_writeback() currently just set the page writeback unconditionally. Change it to call the subpage helper so that we can handle both cases well. Since the subpage helpers needs btrfs_info, also change the parameter to accept btrfs_inode. Signed-off-by: Qu Wenruo --- fs/btrfs/ctree.h | 2 +- fs/btrfs/extent_io.c | 3 +-- fs/btrfs/inode.c | 12 ++++++++---- 3 files changed, 10 insertions(+), 7 deletions(-) diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h index 677d612dcdcb..45798e68331a 100644 --- a/fs/btrfs/ctree.h +++ b/fs/btrfs/ctree.h @@ -3146,7 +3146,7 @@ void btrfs_split_delalloc_extent(struct inode *inode, struct extent_state *orig, u64 split); int btrfs_bio_fits_in_stripe(struct page *page, size_t size, struct bio *bio, unsigned long bio_flags); -void btrfs_set_range_writeback(struct extent_io_tree *tree, u64 start, u64 end); +void btrfs_set_range_writeback(struct btrfs_inode *inode, u64 start, u64 end); vm_fault_t btrfs_page_mkwrite(struct vm_fault *vmf); int btrfs_readpage(struct file *file, struct page *page); void btrfs_evict_inode(struct inode *inode); diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c index 00f62f0f866e..439aace2f5e0 100644 --- a/fs/btrfs/extent_io.c +++ b/fs/btrfs/extent_io.c @@ -3800,7 +3800,6 @@ static noinline_for_stack int __extent_writepage_io(struct btrfs_inode *inode, int *nr_ret) { struct btrfs_fs_info *fs_info = inode->root->fs_info; - struct extent_io_tree *tree = &inode->io_tree; u64 start = page_offset(page); u64 end = start + PAGE_SIZE - 1; u64 cur = start; @@ -3879,7 +3878,7 @@ static noinline_for_stack int __extent_writepage_io(struct btrfs_inode *inode, continue; } - btrfs_set_range_writeback(tree, cur, cur + iosize - 1); + btrfs_set_range_writeback(inode, cur, cur + iosize - 1); if (!PageWriteback(page)) { btrfs_err(inode->root->fs_info, "page %lu not writeback, cur %llu end %llu", diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index 6be83d30ad43..7bf6ed4b4530 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -10205,17 +10205,21 @@ static int btrfs_tmpfile(struct user_namespace *mnt_userns, struct inode *dir, return ret; } -void btrfs_set_range_writeback(struct extent_io_tree *tree, u64 start, u64 end) +void btrfs_set_range_writeback(struct btrfs_inode *inode, u64 start, u64 end) { - struct inode *inode = tree->private_data; + struct btrfs_fs_info *fs_info = inode->root->fs_info; unsigned long index = start >> PAGE_SHIFT; unsigned long end_index = end >> PAGE_SHIFT; struct page *page; + u32 len; + ASSERT(end + 1 - start <= U32_MAX); + len = end + 1 - start; while (index <= end_index) { - page = find_get_page(inode->i_mapping, index); + page = find_get_page(inode->vfs_inode.i_mapping, index); ASSERT(page); /* Pages should be in the extent_io_tree */ - set_page_writeback(page); + + btrfs_page_set_writeback(fs_info, page, start, len); put_page(page); index++; } From patchwork Mon May 31 08:50:50 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Qu Wenruo X-Patchwork-Id: 12288945 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-18.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER, INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED, USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 6ABB2C47082 for ; Mon, 31 May 2021 08:51:45 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 5052C61028 for ; Mon, 31 May 2021 08:51:45 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231132AbhEaIxX (ORCPT ); Mon, 31 May 2021 04:53:23 -0400 Received: from mx2.suse.de ([195.135.220.15]:41018 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230525AbhEaIxR (ORCPT ); Mon, 31 May 2021 04:53:17 -0400 X-Virus-Scanned: by amavisd-new at test-mx.suse.de DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.com; s=susede1; t=1622451096; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=43PEo+gzBeP5yD9Jt0ObqVMgVbR92jMBAn/svCQBSp8=; b=qnRqzKbVnEzEMZVcGL+1DHgzTWNWRM3+IY33ZBuN3ZJOk2byDdPbgdCfoh8OXYeLjzOI+3 zc8S6T4Kr+Y0278pUcIIxuBIhicDbAw5YT/tzA+H5miKrVpXINW+Frj9REeoU95fee4j2U yrDPF67yXUBBtqX3+KPykOb/PqGwi9o= Received: from relay2.suse.de (unknown [195.135.221.27]) by mx2.suse.de (Postfix) with ESMTP id 6B02DB49B for ; Mon, 31 May 2021 08:51:36 +0000 (UTC) From: Qu Wenruo To: linux-btrfs@vger.kernel.org Subject: [PATCH v4 14/30] btrfs: make __extent_writepage_io() only submit dirty range for subpage Date: Mon, 31 May 2021 16:50:50 +0800 Message-Id: <20210531085106.259490-15-wqu@suse.com> X-Mailer: git-send-email 2.31.1 In-Reply-To: <20210531085106.259490-1-wqu@suse.com> References: <20210531085106.259490-1-wqu@suse.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org __extent_writepage_io() function originally just iterate through all the extent maps of a page, and submit any regular extents. This is fine for sectorsize == PAGE_SIZE case, as if a page is dirty, we need to submit the only sector contained in the page. But for subpage case, one dirty page can contain several clean sectors with at least one dirty sector. If __extent_writepage_io() still submit all regular extent maps, it can submit data which is already written to disk. And since such already written data won't have corresponding ordered extents, it will trigger a BUG_ON() in btrfs_csum_one_bio(). Change the behavior of __extent_writepage_io() by finding the first dirty byte in the page, and only submit the dirty range other than the full extent. Since we're also here, also modify the following calls to be subpage compatible: - SetPageError() - end_page_writeback() Signed-off-by: Qu Wenruo --- fs/btrfs/extent_io.c | 100 ++++++++++++++++++++++++++++++++++++++++--- 1 file changed, 95 insertions(+), 5 deletions(-) diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c index 439aace2f5e0..1909979d41de 100644 --- a/fs/btrfs/extent_io.c +++ b/fs/btrfs/extent_io.c @@ -3783,6 +3783,74 @@ static noinline_for_stack int writepage_delalloc(struct btrfs_inode *inode, return 0; } +/* + * To find the first byte we need to write. + * + * For subpage, one page can contain several sectors, and + * __extent_writepage_io() will just grab all extent maps in the page + * range and try to submit all non-inline/non-compressed extents. + * + * This is a big problem for subpage, we shouldn't re-submit already written + * data at all. + * This function will lookup subpage dirty bit to find which range we really + * need to submit. + * + * Return the next dirty range in [@start, @end). + * If no dirty range is found, @start will be page_offset(page) + PAGE_SIZE. + */ +static void find_next_dirty_byte(struct btrfs_fs_info *fs_info, + struct page *page, u64 *start, u64 *end) +{ + struct btrfs_subpage *subpage = (struct btrfs_subpage *)page->private; + u64 orig_start = *start; + u16 dirty_bitmap; + unsigned long flags; + int nbits = (orig_start - page_offset(page)) >> fs_info->sectorsize; + int first_bit_set; + int first_bit_zero; + + /* + * For regular sector size == page size case, since one page only + * contains one sector, we return the page offset directly. + */ + if (fs_info->sectorsize == PAGE_SIZE) { + *start = page_offset(page); + *end = page_offset(page) + PAGE_SIZE; + return; + } + + /* We should have the page locked, but just in case */ + spin_lock_irqsave(&subpage->lock, flags); + dirty_bitmap = subpage->dirty_bitmap; + spin_unlock_irqrestore(&subpage->lock, flags); + + /* Set bits lower than @nbits with 0 */ + dirty_bitmap &= ~((1 << nbits) - 1); + + first_bit_set = ffs(dirty_bitmap); + /* No dirty range found */ + if (first_bit_set == 0) { + *start = page_offset(page) + PAGE_SIZE; + return; + } + + ASSERT(first_bit_set > 0 && first_bit_set <= BTRFS_SUBPAGE_BITMAP_SIZE); + *start = page_offset(page) + (first_bit_set - 1) * fs_info->sectorsize; + + /* Set all bits lower than @nbits to 1 for ffz() */ + dirty_bitmap |= ((1 << nbits) - 1); + + first_bit_zero = ffz(dirty_bitmap); + if (first_bit_zero == 0 || first_bit_zero > BTRFS_SUBPAGE_BITMAP_SIZE) { + *end = page_offset(page) + PAGE_SIZE; + return; + } + ASSERT(first_bit_zero > 0 && + first_bit_zero <= BTRFS_SUBPAGE_BITMAP_SIZE); + *end = page_offset(page) + first_bit_zero * fs_info->sectorsize; + ASSERT(*end > *start); +} + /* * helper for __extent_writepage. This calls the writepage start hooks, * and does the loop to map the page into extents and bios. @@ -3830,6 +3898,8 @@ static noinline_for_stack int __extent_writepage_io(struct btrfs_inode *inode, while (cur <= end) { u64 disk_bytenr; u64 em_end; + u64 dirty_range_start = cur; + u64 dirty_range_end; u32 iosize; if (cur >= i_size) { @@ -3837,9 +3907,17 @@ static noinline_for_stack int __extent_writepage_io(struct btrfs_inode *inode, end, 1); break; } + + find_next_dirty_byte(fs_info, page, &dirty_range_start, + &dirty_range_end); + if (cur < dirty_range_start) { + cur = dirty_range_start; + continue; + } + em = btrfs_get_extent(inode, NULL, 0, cur, end - cur + 1); if (IS_ERR_OR_NULL(em)) { - SetPageError(page); + btrfs_page_set_error(fs_info, page, cur, end - cur + 1); ret = PTR_ERR_OR_ZERO(em); break; } @@ -3854,8 +3932,11 @@ static noinline_for_stack int __extent_writepage_io(struct btrfs_inode *inode, compressed = test_bit(EXTENT_FLAG_COMPRESSED, &em->flags); disk_bytenr = em->block_start + extent_offset; - /* Note that em_end from extent_map_end() is exclusive */ - iosize = min(em_end, end + 1) - cur; + /* + * Note that em_end from extent_map_end() and dirty_range_end from + * find_next_dirty_byte() are all exclusive + */ + iosize = min(min(em_end, end + 1), dirty_range_end) - cur; if (btrfs_use_zone_append(inode, em->block_start)) opf = REQ_OP_ZONE_APPEND; @@ -3885,6 +3966,14 @@ static noinline_for_stack int __extent_writepage_io(struct btrfs_inode *inode, page->index, cur, end); } + /* + * Although the PageDirty bit is cleared before entering this + * function, subpage dirty bit is not cleared. + * So clear subpage dirty bit here so next time we won't + * submit page for range already written to disk. + */ + btrfs_page_clear_dirty(fs_info, page, cur, iosize); + ret = submit_extent_page(opf | write_flags, wbc, &epd->bio_ctrl, page, disk_bytenr, iosize, @@ -3892,9 +3981,10 @@ static noinline_for_stack int __extent_writepage_io(struct btrfs_inode *inode, end_bio_extent_writepage, 0, 0, false); if (ret) { - SetPageError(page); + btrfs_page_set_error(fs_info, page, cur, iosize); if (PageWriteback(page)) - end_page_writeback(page); + btrfs_page_clear_writeback(fs_info, page, cur, + iosize); } cur += iosize; From patchwork Mon May 31 08:50:51 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Qu Wenruo X-Patchwork-Id: 12288947 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-18.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER, INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED, USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 17630C4708F for ; Mon, 31 May 2021 08:51:49 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id ED4A060FEB for ; Mon, 31 May 2021 08:51:48 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230483AbhEaIxY (ORCPT ); Mon, 31 May 2021 04:53:24 -0400 Received: from mx2.suse.de ([195.135.220.15]:41066 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230514AbhEaIxS (ORCPT ); Mon, 31 May 2021 04:53:18 -0400 X-Virus-Scanned: by amavisd-new at test-mx.suse.de DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.com; s=susede1; t=1622451098; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=fgWAUgevymMQjo0fJYC778dFvvYwPIBTjm3Jm8ozieE=; b=vNouQWtQDquzvuU7Vib65px+/0CaDHYeYtfU1PrFKIP5ut/QBKTKK+Tqp215dI91b9WirL 8gkspddV2F52F2xMmKqcOiu70fl5NN7toOlB61pare+mHe7o6UvqHkHOIiZJ/ZhRMlfDBn k4++dUflj0W3o0unH8wnGHjjVJrNODo= Received: from relay2.suse.de (unknown [195.135.221.27]) by mx2.suse.de (Postfix) with ESMTP id 2C0F3B2E9 for ; Mon, 31 May 2021 08:51:38 +0000 (UTC) From: Qu Wenruo To: linux-btrfs@vger.kernel.org Subject: [PATCH v4 15/30] btrfs: make btrfs_truncate_block() to be subpage compatible Date: Mon, 31 May 2021 16:50:51 +0800 Message-Id: <20210531085106.259490-16-wqu@suse.com> X-Mailer: git-send-email 2.31.1 In-Reply-To: <20210531085106.259490-1-wqu@suse.com> References: <20210531085106.259490-1-wqu@suse.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org btrfs_truncate_block() itself is already mostly subpage compatible, the only missing part is the page dirtying code. Currently if we have a sector that needs to be truncated, we set the sector aligned range delalloc, then set the full page dirty. The problem is, current subpage code requires subpage dirty bit to be set, or __extent_writepage_io() won't submit bio, thus leads to ordered extent never to finish. So this patch will make btrfs_truncate_block() to call btrfs_page_set_dirty() helper to replace set_page_dirty() to fix the problem. Signed-off-by: Qu Wenruo --- fs/btrfs/inode.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index 7bf6ed4b4530..909aadc098ce 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -4946,7 +4946,8 @@ int btrfs_truncate_block(struct btrfs_inode *inode, loff_t from, loff_t len, flush_dcache_page(page); } ClearPageChecked(page); - set_page_dirty(page); + btrfs_page_set_dirty(fs_info, page, block_start, + block_end + 1 - block_start); unlock_extent_cached(io_tree, block_start, block_end, &cached_state); if (only_release_metadata) From patchwork Mon May 31 08:50:52 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Qu Wenruo X-Patchwork-Id: 12288949 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-18.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER, INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED, USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 5DBD3C47094 for ; Mon, 31 May 2021 08:51:55 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 4223C61028 for ; Mon, 31 May 2021 08:51:55 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231138AbhEaIx2 (ORCPT ); Mon, 31 May 2021 04:53:28 -0400 Received: from mx2.suse.de ([195.135.220.15]:41096 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230521AbhEaIxT (ORCPT ); Mon, 31 May 2021 04:53:19 -0400 X-Virus-Scanned: by amavisd-new at test-mx.suse.de DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.com; s=susede1; t=1622451099; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=T+pcEKE8ce+8M721LedDLtEmTln9xHPkzPZZp0ZcXc4=; b=BJAHI07VONLjWmldHnHDsVRKNeFfxd0wdg/kErczE/w8tzlym7jaSv9HwLMhwa+5JdJxPU +bPFIxmBAq4AC3XyfQHxSQGlL0exJGyXWEAenIXBPDAL8G57IpaMn2g3UDqhAhIJl5ZnWV 9/P8BT4VbdFa4lZFklcRUsEHo1010lU= Received: from relay2.suse.de (unknown [195.135.221.27]) by mx2.suse.de (Postfix) with ESMTP id D4CACB3E8 for ; Mon, 31 May 2021 08:51:39 +0000 (UTC) From: Qu Wenruo To: linux-btrfs@vger.kernel.org Subject: [PATCH v4 16/30] btrfs: make btrfs_page_mkwrite() to be subpage compatible Date: Mon, 31 May 2021 16:50:52 +0800 Message-Id: <20210531085106.259490-17-wqu@suse.com> X-Mailer: git-send-email 2.31.1 In-Reply-To: <20210531085106.259490-1-wqu@suse.com> References: <20210531085106.259490-1-wqu@suse.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org Only set_page_dirty() and SetPageUptodate() is not subpage compatible. Convert them to subpage helpers, so that __extent_writepage_io() can submit page content correctly. Signed-off-by: Qu Wenruo --- fs/btrfs/inode.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index 909aadc098ce..e756423a7a70 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -8633,8 +8633,9 @@ vm_fault_t btrfs_page_mkwrite(struct vm_fault *vmf) flush_dcache_page(page); } ClearPageChecked(page); - set_page_dirty(page); - SetPageUptodate(page); + btrfs_page_set_dirty(fs_info, page, page_start, end + 1 - page_start); + btrfs_page_set_uptodate(fs_info, page, page_start, + end + 1 - page_start); btrfs_set_inode_last_sub_trans(BTRFS_I(inode)); From patchwork Mon May 31 08:50:53 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Qu Wenruo X-Patchwork-Id: 12288951 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-18.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER, INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED, USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id CA36EC47082 for ; Mon, 31 May 2021 08:51:57 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id A661F6108D for ; Mon, 31 May 2021 08:51:57 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230288AbhEaIxe (ORCPT ); Mon, 31 May 2021 04:53:34 -0400 Received: from mx2.suse.de ([195.135.220.15]:41106 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231124AbhEaIxV (ORCPT ); Mon, 31 May 2021 04:53:21 -0400 X-Virus-Scanned: by amavisd-new at test-mx.suse.de DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.com; s=susede1; t=1622451101; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=ClZNYoBOMXtCrWCRQM9mHsvGsls0DNr+VpZ8ptGHrgY=; b=GSnlBqvA0pcOSrXUJVcM1AcIg5p1bRXQIbbacsp9fmXJcAn9UzYn7gYYuKB23UHnkLIz3N 4U6xIN5kFLvvcAt0BVcP3CDjw4yeazC4GpptN9wgkOBn0HZVOgJ807tWi51i1ajR8Lti5C N3biFAnHzrVstLaVxvoDd59MKIRI/sE= Received: from relay2.suse.de (unknown [195.135.221.27]) by mx2.suse.de (Postfix) with ESMTP id 8B6A4B2E9 for ; Mon, 31 May 2021 08:51:41 +0000 (UTC) From: Qu Wenruo To: linux-btrfs@vger.kernel.org Subject: [PATCH v4 17/30] btrfs: reflink: make copy_inline_to_page() to be subpage compatible Date: Mon, 31 May 2021 16:50:53 +0800 Message-Id: <20210531085106.259490-18-wqu@suse.com> X-Mailer: git-send-email 2.31.1 In-Reply-To: <20210531085106.259490-1-wqu@suse.com> References: <20210531085106.259490-1-wqu@suse.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org The modifications are: - Page copy destination For subpage case, one page can contain multiple sectors, thus we can no longer expect the memcpy_to_page()/btrfs_decompress() to copy data into page offset 0. The correct offset is offset_in_page(file_offset) now, which should handle both regular sectorsize and subpage cases well. - Page status update Now we need to use subpage helper to handle the page status update. Signed-off-by: Qu Wenruo --- fs/btrfs/reflink.c | 14 +++++++++----- 1 file changed, 9 insertions(+), 5 deletions(-) diff --git a/fs/btrfs/reflink.c b/fs/btrfs/reflink.c index d434dc78dadf..a6e8b8e67c80 100644 --- a/fs/btrfs/reflink.c +++ b/fs/btrfs/reflink.c @@ -7,6 +7,7 @@ #include "delalloc-space.h" #include "reflink.h" #include "transaction.h" +#include "subpage.h" #define BTRFS_MAX_DEDUPE_LEN SZ_16M @@ -52,7 +53,8 @@ static int copy_inline_to_page(struct btrfs_inode *inode, const u64 datal, const u8 comp_type) { - const u64 block_size = btrfs_inode_sectorsize(inode); + struct btrfs_fs_info *fs_info = inode->root->fs_info; + const u32 block_size = fs_info->sectorsize; const u64 range_end = file_offset + block_size - 1; const size_t inline_size = size - btrfs_file_extent_calc_inline_size(0); char *data_start = inline_data + btrfs_file_extent_calc_inline_size(0); @@ -106,10 +108,12 @@ static int copy_inline_to_page(struct btrfs_inode *inode, set_bit(BTRFS_INODE_NO_DELALLOC_FLUSH, &inode->runtime_flags); if (comp_type == BTRFS_COMPRESS_NONE) { - memcpy_to_page(page, 0, data_start, datal); + memcpy_to_page(page, offset_in_page(file_offset), data_start, + datal); flush_dcache_page(page); } else { - ret = btrfs_decompress(comp_type, data_start, page, 0, + ret = btrfs_decompress(comp_type, data_start, page, + offset_in_page(file_offset), inline_size, datal); if (ret) goto out_unlock; @@ -133,9 +137,9 @@ static int copy_inline_to_page(struct btrfs_inode *inode, flush_dcache_page(page); } - SetPageUptodate(page); + btrfs_page_set_uptodate(fs_info, page, file_offset, block_size); ClearPageChecked(page); - set_page_dirty(page); + btrfs_page_set_dirty(fs_info, page, file_offset, block_size); out_unlock: if (page) { unlock_page(page); From patchwork Mon May 31 08:50:54 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Qu Wenruo X-Patchwork-Id: 12288953 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-18.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER, INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED, USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id A955DC4708F for ; Mon, 31 May 2021 08:51:58 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 93C2D61028 for ; Mon, 31 May 2021 08:51:58 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230521AbhEaIxg (ORCPT ); Mon, 31 May 2021 04:53:36 -0400 Received: from mx2.suse.de ([195.135.220.15]:41120 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231130AbhEaIxX (ORCPT ); Mon, 31 May 2021 04:53:23 -0400 X-Virus-Scanned: by amavisd-new at test-mx.suse.de DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.com; s=susede1; t=1622451103; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=wgfYSTk2/YHgwwGTP2QulYQgNhi2O01PKfNWXU3VuTI=; b=ktNIWl0aS+5vMaDGqiWmjQJ50lromlE/zzeVNcVdUZ9tl0HL21+KCCgkFSE/WBu/7sscOa Jj74MEd8scbluvmHdwroDNNHrj2u4GPu/y8iLGKPczKVqTBDo8hQQu0/Zi70I6J/TJxV/F qzkGtrPjrBGFvVXuA+kc6qJewqsZiUQ= Received: from relay2.suse.de (unknown [195.135.221.27]) by mx2.suse.de (Postfix) with ESMTP id 2A7B7B3E8 for ; Mon, 31 May 2021 08:51:43 +0000 (UTC) From: Qu Wenruo To: linux-btrfs@vger.kernel.org Subject: [PATCH v4 18/30] btrfs: fix the filemap_range_has_page() call in btrfs_punch_hole_lock_range() Date: Mon, 31 May 2021 16:50:54 +0800 Message-Id: <20210531085106.259490-19-wqu@suse.com> X-Mailer: git-send-email 2.31.1 In-Reply-To: <20210531085106.259490-1-wqu@suse.com> References: <20210531085106.259490-1-wqu@suse.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org [BUG] With current subpage RW support, the following script can hang the fs on with 64K page size. # mkfs.btrfs -f -s 4k $dev # mount $dev -o nospace_cache $mnt # fsstress -w -n 50 -p 1 -s 1607749395 -d $mnt The kernel will do an infinite loop in btrfs_punch_hole_lock_range(). [CAUSE] In btrfs_punch_hole_lock_range() we: - Truncate page cache range - Lock extent io tree - Wait any ordered extents in the range. We exit the loop until we meet all the following conditions: - No ordered extent in the lock range - No page is in the lock range The latter condition has a pitfall, it only works for sector size == PAGE_SIZE case. While can't handle the following subpage case: 0 32K 64K 96K 128K | |///////||//////| || lockstart=32K lockend=96K - 1 In this case, although the range cross 2 pages, truncate_pagecache_range() will invalidate no page at all, but only zero the [32K, 96K) range of the two pages. Thus filemap_range_has_page(32K, 96K-1) will always return true, thus we will never meet the loop exit condition. [FIX] Fix the problem by doing page alignment for the lock range. Function filemap_range_has_page() has already handled lend < lstart case, we only need to round up @lockstart, and round_down @lockend for truncate_pagecache_range(). This modification should not change any thing for sector size == PAGE_SIZE case, as in that case our range is already page aligned. Signed-off-by: Qu Wenruo --- fs/btrfs/file.c | 12 +++++++++++- 1 file changed, 11 insertions(+), 1 deletion(-) diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c index 4a89697ae3a7..6ef44afa939c 100644 --- a/fs/btrfs/file.c +++ b/fs/btrfs/file.c @@ -2486,6 +2486,16 @@ static int btrfs_punch_hole_lock_range(struct inode *inode, const u64 lockend, struct extent_state **cached_state) { + /* + * For subpage case, if the range is not at page boundary, we could + * have pages at the leading/tailing part of the range. + * This could lead to dead loop since filemap_range_has_page() + * will always return true. + * So here we need to do extra page alignment for + * filemap_range_has_page(). + */ + u64 page_lockstart = round_up(lockstart, PAGE_SIZE); + u64 page_lockend = round_down(lockend + 1, PAGE_SIZE) - 1; while (1) { struct btrfs_ordered_extent *ordered; int ret; @@ -2506,7 +2516,7 @@ static int btrfs_punch_hole_lock_range(struct inode *inode, (ordered->file_offset + ordered->num_bytes <= lockstart || ordered->file_offset > lockend)) && !filemap_range_has_page(inode->i_mapping, - lockstart, lockend)) { + page_lockstart, page_lockend)) { if (ordered) btrfs_put_ordered_extent(ordered); break; From patchwork Mon May 31 08:50:55 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Qu Wenruo X-Patchwork-Id: 12288955 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-18.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER, INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED, USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id F018BC47094 for ; Mon, 31 May 2021 08:51:59 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id D65A66108D for ; Mon, 31 May 2021 08:51:59 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231161AbhEaIxh (ORCPT ); Mon, 31 May 2021 04:53:37 -0400 Received: from mx2.suse.de ([195.135.220.15]:41160 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231129AbhEaIxY (ORCPT ); Mon, 31 May 2021 04:53:24 -0400 X-Virus-Scanned: by amavisd-new at test-mx.suse.de DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.com; s=susede1; t=1622451104; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=wI1ILNHcrHD75g0UQfM3LXMP8WIpfno/tD2x8n8V6IA=; b=KdkDE5dPsz+bwl6uj9P+Ft9jQka6xyAL+fJMC3GXyIh2u4Uxxcuze46EetnQLdHUD/JyTc /0ICbgI46IC3HbrqqW7j1j5FHwY3lgdypJSzLMwysqR3etfZ+1AeqCk2n0MdxNQZMu0794 Zy+W8GYcxsPMIgtXfHIGcMzI9henKyc= Received: from relay2.suse.de (unknown [195.135.221.27]) by mx2.suse.de (Postfix) with ESMTP id B7711B2E9 for ; Mon, 31 May 2021 08:51:44 +0000 (UTC) From: Qu Wenruo To: linux-btrfs@vger.kernel.org Subject: [PATCH v4 19/30] btrfs: don't clear page extent mapped if we're not invalidating the full page Date: Mon, 31 May 2021 16:50:55 +0800 Message-Id: <20210531085106.259490-20-wqu@suse.com> X-Mailer: git-send-email 2.31.1 In-Reply-To: <20210531085106.259490-1-wqu@suse.com> References: <20210531085106.259490-1-wqu@suse.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org [BUG] With current btrfs subpage rw support, the following script can lead to fs hang: mkfs.btrfs -f -s 4k $dev mount $dev -o nospace_cache $mnt fsstress -w -n 100 -p 1 -s 1608140256 -v -d $mnt The fs will hang at btrfs_start_ordered_extent(). [CAUSE] In above test case, btrfs_invalidate() will be called with the following parameters: offset = 0 length = 53248 page dirty = 1 subpage dirty bitmap = 0x2000 Since @offset is 0, btrfs_invalidate() will try to invalidate the full page, and finally call clear_page_extent_mapped() which will detach btrfs subpage structure from the page. And since the page no longer has btrfs subpage structure, the subpage dirty bitmap will be cleared, preventing the dirty range from written back, thus no way to wake up the ordered extent. [FIX] Just follow other fses, only to invalidate the page if the range covers the full page. There are cases like truncate_setsize() which can call btrfs_invalidatepage() with offset == 0 and length != 0 for the last page of an inode. Although the old code will still try to invalidate the full page, we are still safe to just wait for ordered extent to finish. So it shouldn't cause extra problems. Signed-off-by: Qu Wenruo --- fs/btrfs/inode.c | 14 +++++++++++++- 1 file changed, 13 insertions(+), 1 deletion(-) diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index e756423a7a70..c3d5abf3df5f 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -8363,7 +8363,19 @@ static void btrfs_invalidatepage(struct page *page, unsigned int offset, */ wait_on_page_writeback(page); - if (offset) { + /* + * For subpage case, we have call sites like + * btrfs_punch_hole_lock_range() which passes range not aligned to + * sectorsize. + * If the range doesn't cover the full page, we don't need to and + * shouldn't clear page extent mapped, as page->private can still + * record subpage dirty bits for other part of the range. + * + * For cases where can invalidate the full even the range doesn't + * cover the full page, like invalidating the last page, we're + * still safe to wait for ordered extent to finish. + */ + if (!(offset == 0 && length == PAGE_SIZE)) { btrfs_releasepage(page, GFP_NOFS); return; } From patchwork Mon May 31 08:50:56 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Qu Wenruo X-Patchwork-Id: 12288959 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-18.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER, INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED, USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 8BB56C47082 for ; Mon, 31 May 2021 08:52:03 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 6E96161028 for ; Mon, 31 May 2021 08:52:03 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231164AbhEaIxk (ORCPT ); Mon, 31 May 2021 04:53:40 -0400 Received: from mx2.suse.de ([195.135.220.15]:41120 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231135AbhEaIx0 (ORCPT ); Mon, 31 May 2021 04:53:26 -0400 X-Virus-Scanned: by amavisd-new at test-mx.suse.de DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.com; s=susede1; t=1622451106; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=hYwd8yKPZq1mQY1tZcf68ClNgr2CaN+mR1mny4d3Gzc=; b=qqtDZ/BbUhfhuo6PkKaXPGEGBT3l2dAkbrfuhrIQVUQp8YaKfVT3ZqzZvBTlMFrNRc65cU wO+6vz6SNPKYjjCz8dEv1alruYz2FNTKLxPkGJSpaBJQUL0IB7lNTAy9L21XimK2Wq0vOx EZeO4rrdmh9BqieMnQGSiSCxrlOAq4g= Received: from relay2.suse.de (unknown [195.135.221.27]) by mx2.suse.de (Postfix) with ESMTP id 59D80B49A for ; Mon, 31 May 2021 08:51:46 +0000 (UTC) From: Qu Wenruo To: linux-btrfs@vger.kernel.org Subject: [PATCH v4 20/30] btrfs: extract relocation page read and dirty part into its own function Date: Mon, 31 May 2021 16:50:56 +0800 Message-Id: <20210531085106.259490-21-wqu@suse.com> X-Mailer: git-send-email 2.31.1 In-Reply-To: <20210531085106.259490-1-wqu@suse.com> References: <20210531085106.259490-1-wqu@suse.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org In function relocate_file_extent_cluster(), we have a big loop for marking all involved page delalloc. That part is long enough to be contained in one function, so this patch will move that code chunk into a new function, relocate_one_page(). This also provides enough space for later subpage work. Signed-off-by: Qu Wenruo --- fs/btrfs/relocation.c | 199 ++++++++++++++++++++---------------------- 1 file changed, 94 insertions(+), 105 deletions(-) diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c index b70be2ac2e9e..862fe5247c76 100644 --- a/fs/btrfs/relocation.c +++ b/fs/btrfs/relocation.c @@ -2885,19 +2885,102 @@ noinline int btrfs_should_cancel_balance(struct btrfs_fs_info *fs_info) } ALLOW_ERROR_INJECTION(btrfs_should_cancel_balance, TRUE); -static int relocate_file_extent_cluster(struct inode *inode, - struct file_extent_cluster *cluster) +static int relocate_one_page(struct inode *inode, struct file_ra_state *ra, + struct file_extent_cluster *cluster, + int *cluster_nr, unsigned long page_index) { struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb); + u64 offset = BTRFS_I(inode)->index_cnt; + const unsigned long last_index = (cluster->end - offset) >> PAGE_SHIFT; + gfp_t mask = btrfs_alloc_write_mask(inode->i_mapping); + struct page *page; u64 page_start; u64 page_end; + int ret; + + ASSERT(page_index <= last_index); + ret = btrfs_delalloc_reserve_metadata(BTRFS_I(inode), PAGE_SIZE); + if (ret) + return ret; + + page = find_lock_page(inode->i_mapping, page_index); + if (!page) { + page_cache_sync_readahead(inode->i_mapping, ra, NULL, + page_index, last_index + 1 - page_index); + page = find_or_create_page(inode->i_mapping, page_index, mask); + if (!page) { + ret = -ENOMEM; + goto release_delalloc; + } + } + ret = set_page_extent_mapped(page); + if (ret < 0) + goto release_page; + + if (PageReadahead(page)) + page_cache_async_readahead(inode->i_mapping, ra, NULL, page, + page_index, last_index + 1 - page_index); + + if (!PageUptodate(page)) { + btrfs_readpage(NULL, page); + lock_page(page); + if (!PageUptodate(page)) { + ret = -EIO; + goto release_page; + } + } + + page_start = page_offset(page); + page_end = page_start + PAGE_SIZE - 1; + + lock_extent(&BTRFS_I(inode)->io_tree, page_start, page_end); + + if (*cluster_nr < cluster->nr && + page_start + offset == cluster->boundary[*cluster_nr]) { + set_extent_bits(&BTRFS_I(inode)->io_tree, page_start, page_end, + EXTENT_BOUNDARY); + (*cluster_nr)++; + } + + ret = btrfs_set_extent_delalloc(BTRFS_I(inode), page_start, page_end, + 0, NULL); + if (ret) { + clear_extent_bits(&BTRFS_I(inode)->io_tree, page_start, + page_end, EXTENT_LOCKED | EXTENT_BOUNDARY); + goto release_page; + + } + set_page_dirty(page); + + unlock_extent(&BTRFS_I(inode)->io_tree, page_start, page_end); + unlock_page(page); + put_page(page); + + btrfs_delalloc_release_extents(BTRFS_I(inode), PAGE_SIZE); + balance_dirty_pages_ratelimited(inode->i_mapping); + btrfs_throttle(fs_info); + if (btrfs_should_cancel_balance(fs_info)) + ret = -ECANCELED; + return ret; + +release_page: + unlock_page(page); + put_page(page); +release_delalloc: + btrfs_delalloc_release_metadata(BTRFS_I(inode), PAGE_SIZE, true); + btrfs_delalloc_release_extents(BTRFS_I(inode), PAGE_SIZE); + return ret; +} + +static int relocate_file_extent_cluster(struct inode *inode, + struct file_extent_cluster *cluster) +{ + struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb); u64 offset = BTRFS_I(inode)->index_cnt; unsigned long index; unsigned long last_index; - struct page *page; struct file_ra_state *ra; - gfp_t mask = btrfs_alloc_write_mask(inode->i_mapping); - int nr = 0; + int cluster_nr = 0; int ret = 0; if (!cluster->nr) @@ -2918,109 +3001,15 @@ static int relocate_file_extent_cluster(struct inode *inode, if (ret) goto out; - index = (cluster->start - offset) >> PAGE_SHIFT; last_index = (cluster->end - offset) >> PAGE_SHIFT; - while (index <= last_index) { - ret = btrfs_delalloc_reserve_metadata(BTRFS_I(inode), - PAGE_SIZE); - if (ret) - goto out; - - page = find_lock_page(inode->i_mapping, index); - if (!page) { - page_cache_sync_readahead(inode->i_mapping, - ra, NULL, index, - last_index + 1 - index); - page = find_or_create_page(inode->i_mapping, index, - mask); - if (!page) { - btrfs_delalloc_release_metadata(BTRFS_I(inode), - PAGE_SIZE, true); - btrfs_delalloc_release_extents(BTRFS_I(inode), - PAGE_SIZE); - ret = -ENOMEM; - goto out; - } - } - ret = set_page_extent_mapped(page); - if (ret < 0) { - btrfs_delalloc_release_metadata(BTRFS_I(inode), - PAGE_SIZE, true); - btrfs_delalloc_release_extents(BTRFS_I(inode), PAGE_SIZE); - unlock_page(page); - put_page(page); - goto out; - } - - if (PageReadahead(page)) { - page_cache_async_readahead(inode->i_mapping, - ra, NULL, page, index, - last_index + 1 - index); - } - - if (!PageUptodate(page)) { - btrfs_readpage(NULL, page); - lock_page(page); - if (!PageUptodate(page)) { - unlock_page(page); - put_page(page); - btrfs_delalloc_release_metadata(BTRFS_I(inode), - PAGE_SIZE, true); - btrfs_delalloc_release_extents(BTRFS_I(inode), - PAGE_SIZE); - ret = -EIO; - goto out; - } - } - - page_start = page_offset(page); - page_end = page_start + PAGE_SIZE - 1; - - lock_extent(&BTRFS_I(inode)->io_tree, page_start, page_end); - - if (nr < cluster->nr && - page_start + offset == cluster->boundary[nr]) { - set_extent_bits(&BTRFS_I(inode)->io_tree, - page_start, page_end, - EXTENT_BOUNDARY); - nr++; - } - - ret = btrfs_set_extent_delalloc(BTRFS_I(inode), page_start, - page_end, 0, NULL); - if (ret) { - unlock_page(page); - put_page(page); - btrfs_delalloc_release_metadata(BTRFS_I(inode), - PAGE_SIZE, true); - btrfs_delalloc_release_extents(BTRFS_I(inode), - PAGE_SIZE); - - clear_extent_bits(&BTRFS_I(inode)->io_tree, - page_start, page_end, - EXTENT_LOCKED | EXTENT_BOUNDARY); - goto out; - - } - set_page_dirty(page); - - unlock_extent(&BTRFS_I(inode)->io_tree, - page_start, page_end); - unlock_page(page); - put_page(page); - - index++; - btrfs_delalloc_release_extents(BTRFS_I(inode), PAGE_SIZE); - balance_dirty_pages_ratelimited(inode->i_mapping); - btrfs_throttle(fs_info); - if (btrfs_should_cancel_balance(fs_info)) { - ret = -ECANCELED; - goto out; - } - } - WARN_ON(nr != cluster->nr); + for (index = (cluster->start - offset) >> PAGE_SHIFT; + index <= last_index && !ret; index++) + ret = relocate_one_page(inode, ra, cluster, &cluster_nr, + index); if (btrfs_is_zoned(fs_info) && !ret) ret = btrfs_wait_ordered_range(inode, 0, (u64)-1); + if (ret == 0) + WARN_ON(cluster_nr != cluster->nr); out: kfree(ra); return ret; From patchwork Mon May 31 08:50:57 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Qu Wenruo X-Patchwork-Id: 12288957 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-18.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER, INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED, USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id A8C83C4708F for ; Mon, 31 May 2021 08:52:01 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 605D761028 for ; Mon, 31 May 2021 08:52:01 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231165AbhEaIxi (ORCPT ); Mon, 31 May 2021 04:53:38 -0400 Received: from mx2.suse.de ([195.135.220.15]:41242 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231137AbhEaIx2 (ORCPT ); Mon, 31 May 2021 04:53:28 -0400 X-Virus-Scanned: by amavisd-new at test-mx.suse.de DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.com; s=susede1; t=1622451108; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=zBaOJgPHO4sBkLqccgKDuFVp5DqG3NBbJGQS+eVyN0E=; b=hYXMlVSFt9IRK1FBr47HCjBYEoSh7IMFQFOZaq8zyIN/0US6mQ0RhtLfeaebc2mKMng6Dt mM4spOIKJuQarXBFEjWIato8mX/78WZ5av03e5elDSk2mut18JmrFiOiy6bjLgboXpDt2p pVsI/Uc6O2fvINi1MMOqg1aiJxf1eko= Received: from relay2.suse.de (unknown [195.135.221.27]) by mx2.suse.de (Postfix) with ESMTP id 11338B3E8 for ; Mon, 31 May 2021 08:51:48 +0000 (UTC) From: Qu Wenruo To: linux-btrfs@vger.kernel.org Subject: [PATCH v4 21/30] btrfs: make relocate_one_page() to handle subpage case Date: Mon, 31 May 2021 16:50:57 +0800 Message-Id: <20210531085106.259490-22-wqu@suse.com> X-Mailer: git-send-email 2.31.1 In-Reply-To: <20210531085106.259490-1-wqu@suse.com> References: <20210531085106.259490-1-wqu@suse.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org For subpage case, one page of data reloc inode can contain several file extents, like this: |<--- File extent A --->| FE B | FE C |<--- File extent D -->| |<--------- Page --------->| We can no longer use PAGE_SIZE directly for various operations. This patch will relocate_one_page() to handle subpage case by: - Iterating through all extents of a cluster when marking pages When marking pages dirty and delalloc, we need to check the cluster extent boundary. Now we introduce a loop to go extent by extent of a page, until we either finished the last extent, or reach the page end. By this, regular sectorsize == PAGE_SIZE can still work as usual, since we will do that loop only once. - Iteration start from max(page_start, extent_start) Since we can have the following case: | FE B | FE C |<--- File extent D -->| |<--------- Page --------->| Thus we can't always start from page_start, but do a max(page_start, extent_start) - Iteration end when the cluster is exhausted Similar to previous case, the last file extent can end before the page end: |<--- File extent A --->| FE B | FE C | |<--------- Page --------->| In this case, we need to manually exit the loop after we have finished the last extent of the cluster. - Reserve metadata space for each extent range Since now we can hit multiple ranges in one page, we should reserve metadata for each range, not simply PAGE_SIZE. Signed-off-by: Qu Wenruo --- fs/btrfs/relocation.c | 108 ++++++++++++++++++++++++++++++------------ 1 file changed, 79 insertions(+), 29 deletions(-) diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c index 862fe5247c76..cd50559c6d17 100644 --- a/fs/btrfs/relocation.c +++ b/fs/btrfs/relocation.c @@ -24,6 +24,7 @@ #include "block-group.h" #include "backref.h" #include "misc.h" +#include "subpage.h" /* * Relocation overview @@ -2885,6 +2886,17 @@ noinline int btrfs_should_cancel_balance(struct btrfs_fs_info *fs_info) } ALLOW_ERROR_INJECTION(btrfs_should_cancel_balance, TRUE); +static u64 get_cluster_boundary_end(struct file_extent_cluster *cluster, + int cluster_nr) +{ + /* Last extent, use cluster end directly */ + if (cluster_nr >= cluster->nr - 1) + return cluster->end; + + /* Use next boundary start*/ + return cluster->boundary[cluster_nr + 1] - 1; +} + static int relocate_one_page(struct inode *inode, struct file_ra_state *ra, struct file_extent_cluster *cluster, int *cluster_nr, unsigned long page_index) @@ -2896,22 +2908,17 @@ static int relocate_one_page(struct inode *inode, struct file_ra_state *ra, struct page *page; u64 page_start; u64 page_end; + u64 cur; int ret; ASSERT(page_index <= last_index); - ret = btrfs_delalloc_reserve_metadata(BTRFS_I(inode), PAGE_SIZE); - if (ret) - return ret; - page = find_lock_page(inode->i_mapping, page_index); if (!page) { page_cache_sync_readahead(inode->i_mapping, ra, NULL, page_index, last_index + 1 - page_index); page = find_or_create_page(inode->i_mapping, page_index, mask); - if (!page) { - ret = -ENOMEM; - goto release_delalloc; - } + if (!page) + return -ENOMEM; } ret = set_page_extent_mapped(page); if (ret < 0) @@ -2933,30 +2940,76 @@ static int relocate_one_page(struct inode *inode, struct file_ra_state *ra, page_start = page_offset(page); page_end = page_start + PAGE_SIZE - 1; - lock_extent(&BTRFS_I(inode)->io_tree, page_start, page_end); - - if (*cluster_nr < cluster->nr && - page_start + offset == cluster->boundary[*cluster_nr]) { - set_extent_bits(&BTRFS_I(inode)->io_tree, page_start, page_end, - EXTENT_BOUNDARY); - (*cluster_nr)++; - } + /* + * Start from the cluster, as for subpage case, the cluster can start + * inside the page. + */ + cur = max(page_start, cluster->boundary[*cluster_nr] - offset); + while (cur <= page_end) { + u64 extent_start = cluster->boundary[*cluster_nr] - offset; + u64 extent_end = get_cluster_boundary_end(cluster, + *cluster_nr) - offset; + u64 clamped_start = max(page_start, extent_start); + u64 clamped_end = min(page_end, extent_end); + u32 clamped_len = clamped_end + 1 - clamped_start; + + /* Reserve metadata for this range */ + ret = btrfs_delalloc_reserve_metadata(BTRFS_I(inode), + clamped_len); + if (ret) + goto release_page; - ret = btrfs_set_extent_delalloc(BTRFS_I(inode), page_start, page_end, - 0, NULL); - if (ret) { - clear_extent_bits(&BTRFS_I(inode)->io_tree, page_start, - page_end, EXTENT_LOCKED | EXTENT_BOUNDARY); - goto release_page; + /* Mark the range delalloc and dirty for later writeback */ + lock_extent(&BTRFS_I(inode)->io_tree, clamped_start, + clamped_end); + ret = btrfs_set_extent_delalloc(BTRFS_I(inode), clamped_start, + clamped_end, 0, NULL); + if (ret) { + clear_extent_bits(&BTRFS_I(inode)->io_tree, + clamped_start, clamped_end, + EXTENT_LOCKED | EXTENT_BOUNDARY); + btrfs_delalloc_release_metadata(BTRFS_I(inode), + clamped_len, true); + btrfs_delalloc_release_extents(BTRFS_I(inode), + clamped_len); + goto release_page; + } + btrfs_page_set_dirty(fs_info, page, clamped_start, clamped_len); + /* + * Set the boundary if it's inside the page. + * Data relocation requires the destination extents have the + * same size as the source. + * EXTENT_BOUNDARY bit prevent current extent from being merged + * with previous extent. + */ + if (in_range(cluster->boundary[*cluster_nr] - offset, + page_start, PAGE_SIZE)) { + u64 boundary_start = cluster->boundary[*cluster_nr] - + offset; + u64 boundary_end = boundary_start + + fs_info->sectorsize - 1; + + set_extent_bits(&BTRFS_I(inode)->io_tree, + boundary_start, boundary_end, + EXTENT_BOUNDARY); + } + unlock_extent(&BTRFS_I(inode)->io_tree, clamped_start, + clamped_end); + btrfs_delalloc_release_extents(BTRFS_I(inode), clamped_len); + cur += clamped_len; + + /* Crossed extent end, go to next extent */ + if (cur >= extent_end) { + (*cluster_nr)++; + /* Just finished the last extent of the cluster, exit. */ + if (*cluster_nr >= cluster->nr) + break; + } } - set_page_dirty(page); - - unlock_extent(&BTRFS_I(inode)->io_tree, page_start, page_end); unlock_page(page); put_page(page); - btrfs_delalloc_release_extents(BTRFS_I(inode), PAGE_SIZE); balance_dirty_pages_ratelimited(inode->i_mapping); btrfs_throttle(fs_info); if (btrfs_should_cancel_balance(fs_info)) @@ -2966,9 +3019,6 @@ static int relocate_one_page(struct inode *inode, struct file_ra_state *ra, release_page: unlock_page(page); put_page(page); -release_delalloc: - btrfs_delalloc_release_metadata(BTRFS_I(inode), PAGE_SIZE, true); - btrfs_delalloc_release_extents(BTRFS_I(inode), PAGE_SIZE); return ret; } From patchwork Mon May 31 08:50:58 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Qu Wenruo X-Patchwork-Id: 12288961 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-18.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER, INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED, USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 3D901C47094 for ; Mon, 31 May 2021 08:52:04 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 231BF6108D for ; Mon, 31 May 2021 08:52:04 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231173AbhEaIxl (ORCPT ); Mon, 31 May 2021 04:53:41 -0400 Received: from mx2.suse.de ([195.135.220.15]:41160 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231143AbhEaIxa (ORCPT ); Mon, 31 May 2021 04:53:30 -0400 X-Virus-Scanned: by amavisd-new at test-mx.suse.de DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.com; s=susede1; t=1622451109; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=+fkhdknk6wWaL3p2xFCvn4FuETkHYn4FGCEGTKZfUHM=; b=UajHIUhL9cOAaHKBOF2SUPHbXwAvpXsl7xzsCGy4UDIZxPqYVDnFd1UJC7hyKA7PsBi+KI cscwohjymktRwQOJAHJ63wv/CKIIJgkMyHDMdE5IWU8N5HJYx4ebRL+mkGkrTAUwDOMThn zk5kfw6DdqMAXH1uJfiGAVRX3IvBMsc= Received: from relay2.suse.de (unknown [195.135.221.27]) by mx2.suse.de (Postfix) with ESMTP id 98A04B2E9 for ; Mon, 31 May 2021 08:51:49 +0000 (UTC) From: Qu Wenruo To: linux-btrfs@vger.kernel.org Subject: [PATCH v4 22/30] btrfs: fix wild subpage writeback which does not have ordered extent. Date: Mon, 31 May 2021 16:50:58 +0800 Message-Id: <20210531085106.259490-23-wqu@suse.com> X-Mailer: git-send-email 2.31.1 In-Reply-To: <20210531085106.259490-1-wqu@suse.com> References: <20210531085106.259490-1-wqu@suse.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org [BUG] When running fsstress with subpage RW support, there are random BUG_ON()s triggered with the following trace: kernel BUG at fs/btrfs/file-item.c:667! Internal error: Oops - BUG: 0 [#1] SMP CPU: 1 PID: 3486 Comm: kworker/u13:2 Tainted: G WC O 5.11.0-rc4-custom+ #43 Hardware name: Radxa ROCK Pi 4B (DT) Workqueue: btrfs-worker-high btrfs_work_helper [btrfs] pstate: 60000005 (nZCv daif -PAN -UAO -TCO BTYPE=--) pc : btrfs_csum_one_bio+0x420/0x4e0 [btrfs] lr : btrfs_csum_one_bio+0x400/0x4e0 [btrfs] Call trace: btrfs_csum_one_bio+0x420/0x4e0 [btrfs] btrfs_submit_bio_start+0x20/0x30 [btrfs] run_one_async_start+0x28/0x44 [btrfs] btrfs_work_helper+0x128/0x1b4 [btrfs] process_one_work+0x22c/0x430 worker_thread+0x70/0x3a0 kthread+0x13c/0x140 ret_from_fork+0x10/0x30 [CAUSE] Above BUG_ON() means there are some bio range which doesn't have ordered extent, which indeed is worthy a BUG_ON(). Unlike regular sectorsize == PAGE_SIZE case, in subpage we have extra subpage dirty bitmap to record which range is dirty and should be written back. This means, if we submit bio for a subpage range, we do not only need to clear page dirty, but also need to clear subpage dirty bits. In __extent_writepage_io(), we will call btrfs_page_clear_dirty() for any range we submit a bio. But there is loophole, if we hit a range which is beyond isize, we just call btrfs_writepage_endio_finish_ordered() to finish the ordered io, then break out, without clearing the subpage dirty. This means, if we hit above branch, the subpage dirty bits are still there, if other range of the page get dirtied and we need to writeback that page again, we will submit bio for the old range, leaving a wild bio range which doesn't have ordered extent. [FIX] Fix it by always calling btrfs_page_clear_dirty() in __extent_writepage_io(). Also to avoid such problem from happening again, add a new assert, btrfs_page_assert_not_dirty(), to make sure both page dirty and subpage dirty bits are cleared before exiting __extent_writepage_io(). Signed-off-by: Qu Wenruo --- fs/btrfs/extent_io.c | 17 +++++++++++++++++ fs/btrfs/subpage.c | 16 ++++++++++++++++ fs/btrfs/subpage.h | 7 +++++++ 3 files changed, 40 insertions(+) diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c index 1909979d41de..631c4f3e9cea 100644 --- a/fs/btrfs/extent_io.c +++ b/fs/btrfs/extent_io.c @@ -3905,6 +3905,16 @@ static noinline_for_stack int __extent_writepage_io(struct btrfs_inode *inode, if (cur >= i_size) { btrfs_writepage_endio_finish_ordered(inode, page, cur, end, 1); + /* + * This range is beyond isize, thus we don't need to + * bother writing back. + * But we still need to clear the dirty subpage bit, or + * the next time the page get dirtied, we will try to + * writeback the sectors with subpage diryt bits, + * causing writeback without ordered extent. + */ + btrfs_page_clear_dirty(fs_info, page, cur, + end + 1 - cur); break; } @@ -3955,6 +3965,7 @@ static noinline_for_stack int __extent_writepage_io(struct btrfs_inode *inode, else btrfs_writepage_endio_finish_ordered(inode, page, cur, cur + iosize - 1, 1); + btrfs_page_clear_dirty(fs_info, page, cur, iosize); cur += iosize; continue; } @@ -3990,6 +4001,12 @@ static noinline_for_stack int __extent_writepage_io(struct btrfs_inode *inode, cur += iosize; nr++; } + /* + * If we finishes without problem, we should not only clear page dirty, + * but also emptied subpage dirty bits + */ + if (!ret) + btrfs_page_assert_not_dirty(fs_info, page); *nr_ret = nr; return ret; } diff --git a/fs/btrfs/subpage.c b/fs/btrfs/subpage.c index 516e0b3f2ed9..696485ab68a2 100644 --- a/fs/btrfs/subpage.c +++ b/fs/btrfs/subpage.c @@ -548,3 +548,19 @@ IMPLEMENT_BTRFS_PAGE_OPS(writeback, set_page_writeback, end_page_writeback, PageWriteback); IMPLEMENT_BTRFS_PAGE_OPS(ordered, SetPageOrdered, ClearPageOrdered, PageOrdered); + +void btrfs_page_assert_not_dirty(const struct btrfs_fs_info *fs_info, + struct page *page) +{ + struct btrfs_subpage *subpage = (struct btrfs_subpage *)page->private; + + if (!IS_ENABLED(CONFIG_BTRFS_ASSERT)) + return; + + ASSERT(!PageDirty(page)); + if (fs_info->sectorsize == PAGE_SIZE) + return; + + ASSERT(PagePrivate(page) && page->private); + ASSERT(subpage->dirty_bitmap == 0); +} diff --git a/fs/btrfs/subpage.h b/fs/btrfs/subpage.h index 3419b152c00f..7188e9d2fbea 100644 --- a/fs/btrfs/subpage.h +++ b/fs/btrfs/subpage.h @@ -119,4 +119,11 @@ DECLARE_BTRFS_SUBPAGE_OPS(ordered); bool btrfs_subpage_clear_and_test_dirty(const struct btrfs_fs_info *fs_info, struct page *page, u64 start, u32 len); +/* + * Extra assert to make sure not only the page dirty bit is cleared, but also + * subpage dirty bit is cleared. + */ +void btrfs_page_assert_not_dirty(const struct btrfs_fs_info *fs_info, + struct page *page); + #endif From patchwork Mon May 31 08:50:59 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Qu Wenruo X-Patchwork-Id: 12288963 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-18.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER, INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED, USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 48F78C4708F for ; Mon, 31 May 2021 08:52:05 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 2EEBC6108D for ; Mon, 31 May 2021 08:52:05 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231143AbhEaIxn (ORCPT ); Mon, 31 May 2021 04:53:43 -0400 Received: from mx2.suse.de ([195.135.220.15]:41120 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231144AbhEaIxd (ORCPT ); Mon, 31 May 2021 04:53:33 -0400 X-Virus-Scanned: by amavisd-new at test-mx.suse.de DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.com; s=susede1; t=1622451111; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=cVaNLou8UVivLlRM7Y3ye5XLUhUEsSKx1EITu9AaAh0=; b=fl9INgsARKiVC5/xCj7I5+MtreRfphU+BskjtFThF7DfTHaG0geSSwfT0ZwSqQMytNwyyo +TecZeLD21stOgLn7uX++dNJnsj7+tcD9pMLFJ5zxjko58rRNV00pJwzGffuGGccGOFteL uimj++sroIRMATCdbPxPQT5KVzCAobg= Received: from relay2.suse.de (unknown [195.135.221.27]) by mx2.suse.de (Postfix) with ESMTP id 30555B49A for ; Mon, 31 May 2021 08:51:51 +0000 (UTC) From: Qu Wenruo To: linux-btrfs@vger.kernel.org Subject: [PATCH v4 23/30] btrfs: disable inline extent creation for subpage Date: Mon, 31 May 2021 16:50:59 +0800 Message-Id: <20210531085106.259490-24-wqu@suse.com> X-Mailer: git-send-email 2.31.1 In-Reply-To: <20210531085106.259490-1-wqu@suse.com> References: <20210531085106.259490-1-wqu@suse.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org [BUG] When running the following fsx command (extracted from generic/127) on subpage btrfs, it can create inline extent with regular extents: fsx -q -l 262144 -o 65536 -S 191110531 -N 9057 -R -W $mnt/file > /tmp/fsx The offending extent would look like: item 9 key (257 INODE_REF 256) itemoff 15703 itemsize 14 index 2 namelen 4 name: file item 10 key (257 EXTENT_DATA 0) itemoff 14975 itemsize 728 generation 7 type 0 (inline) inline extent data size 707 ram_bytes 707 compression 0 (none) item 11 key (257 EXTENT_DATA 4096) itemoff 14922 itemsize 53 generation 7 type 2 (prealloc) prealloc data disk byte 102346752 nr 4096 prealloc data offset 0 nr 4096 [CAUSE] For subpage btrfs, the writeback is triggered in page unit, which means, even if we just want to writeback range [16K, 20K) for 64K page system, we will still try to writeback any dirty sector of range [0, 64K). This is never a problem if sectorsize == PAGE_SIZE, but for subpage, this can cause unexpected problems. For above test case, the last several operations from fsx are: 9055 trunc from 0x40000 to 0x2c3 9057 falloc from 0x164c to 0x19d2 (0x386 bytes) In operation 9055, we dirtied sector [0, 4096), then in falloc, we call btrfs_wait_ordered_range(inode, start=4096, len=4096), only expecting to writeback any dirty data in [4096, 8192), but nothing else. Unfortunately, in subpage case, above btrfs_wait_ordered_range() will trigger writeback of the range [0, 64K), which includes the data at [0, 4096). And since at the call site, we haven't yet increased i_size, which is still 707, this means cow_file_range() can insert an inline extent. Resulting above inline + regular extent. [WORKAROUND] I don't really have any good short-term solution yet, as this means all operations that would trigger writeback need to be reviewed for any isize change. So here I choose to disable inline extent creation for subpage case as a workaround. We have done tons of work just to avoid such extent, so I don't to create an exception just for subpage. This only affects inline extent creation, btrfs subpage support has no problem reading existing inline extents at all. Signed-off-by: Qu Wenruo --- fs/btrfs/inode.c | 18 ++++++++++++++++-- 1 file changed, 16 insertions(+), 2 deletions(-) diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index c3d5abf3df5f..4918a7ef273b 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -682,7 +682,11 @@ static noinline int compress_file_range(struct async_chunk *async_chunk) } } cont: - if (start == 0) { + /* + * Check cow_file_range() for why we don't even try to create + * inline extent for subpage case. + */ + if (start == 0 && fs_info->sectorsize == PAGE_SIZE) { /* lets try to make an inline extent */ if (ret || total_in < actual_end) { /* we didn't compress the entire range, try @@ -1080,7 +1084,17 @@ static noinline int cow_file_range(struct btrfs_inode *inode, inode_should_defrag(inode, start, end, num_bytes, SZ_64K); - if (start == 0) { + /* + * Due to the page size limit, for subpage we can only trigger the + * writeback for the dirty sectors of page, that means data writeback + * is doing more writeback than what we want. + * + * This is especially unexpected for some call sites like fallocate, + * where we only increase isize after everything is done. + * This means we can trigger inline extent even we didn't want. + * So here we skip inline extent creation completely. + */ + if (start == 0 && fs_info->sectorsize == PAGE_SIZE) { /* lets try to make an inline extent */ ret = cow_file_range_inline(inode, start, end, 0, BTRFS_COMPRESS_NONE, NULL); From patchwork Mon May 31 08:51:00 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Qu Wenruo X-Patchwork-Id: 12288965 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-18.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER, INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED, USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 34119C4708F for ; Mon, 31 May 2021 08:52:08 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 157FC61028 for ; Mon, 31 May 2021 08:52:07 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231180AbhEaIxo (ORCPT ); Mon, 31 May 2021 04:53:44 -0400 Received: from mx2.suse.de ([195.135.220.15]:41322 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231148AbhEaIxd (ORCPT ); Mon, 31 May 2021 04:53:33 -0400 X-Virus-Scanned: by amavisd-new at test-mx.suse.de DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.com; s=susede1; t=1622451113; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=DDAdqkhr2f0FOmAS/WMkGNVat/INdWWcBT0NfwYlfL0=; b=gHL0dkVJyxl4eoL4iNqTVRwl47ilwJHMwCLVxB2HQOmz/JcCbVweeJulYMaXAmhsoZ9vpj cL4jSFZAyYyfIZEYIH6wqUlTJyOjntjhPBkb4CcZ9wV2DehlOO/ybhdHF2VXXcqJF7ujT3 jCU0w88OzsgaoCkIVJtzZjGWO6q5Mz0= Received: from relay2.suse.de (unknown [195.135.221.27]) by mx2.suse.de (Postfix) with ESMTP id E96BAB49B for ; Mon, 31 May 2021 08:51:52 +0000 (UTC) From: Qu Wenruo To: linux-btrfs@vger.kernel.org Subject: [PATCH v4 24/30] btrfs: allow submit_extent_page() to do bio split for subpage Date: Mon, 31 May 2021 16:51:00 +0800 Message-Id: <20210531085106.259490-25-wqu@suse.com> X-Mailer: git-send-email 2.31.1 In-Reply-To: <20210531085106.259490-1-wqu@suse.com> References: <20210531085106.259490-1-wqu@suse.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org Current submit_extent_page() just checks if the current page range can be fitted into current bio, and if not, submit then re-add. But this behavior can't handle subpage case at all. For subpage case, the problem is in the page size, 64K, which is also the same size as stripe size. This means, if we can't fit a full 64K into a bio, due to stripe limit, then it won't fit into next bio without crossing stripe either. The proper way to handle it is: - Check how many bytes we can put into current bio - Put as many bytes as possible into current bio first - Submit current bio - Create a new bio - Add the remaining bytes into the new bio Refactor submit_extent_page() so that it does the above iteration. The main loop inside submit_extent_page() will look like this: cur = pg_offset; while (cur < pg_offset + size) { u32 offset = cur - pg_offset; int added; if (!bio_ctrl->bio) { /* Allocate new bio if needed */ } /* Add as many bytes into the bio */ added = btrfs_bio_add_page(); if (added < size - offset) { /* The current bio is full, submit it */ } cur += added; } Also, since we're doing new bio allocation deep inside the main loop, extract that code into a new helper, alloc_new_bio(). Signed-off-by: Qu Wenruo --- fs/btrfs/extent_io.c | 194 ++++++++++++++++++++++++++++++------------- 1 file changed, 134 insertions(+), 60 deletions(-) diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c index 631c4f3e9cea..d6a12f7e36d9 100644 --- a/fs/btrfs/extent_io.c +++ b/fs/btrfs/extent_io.c @@ -172,6 +172,8 @@ int __must_check submit_one_bio(struct bio *bio, int mirror_num, bio->bi_private = NULL; + /* Caller should ensure the bio has at least some range added */ + ASSERT(bio->bi_iter.bi_size); if (is_data_inode(tree->private_data)) ret = btrfs_submit_data_bio(tree->private_data, bio, mirror_num, bio_flags); @@ -3200,20 +3202,22 @@ struct bio *btrfs_bio_clone_partial(struct bio *orig, int offset, int size) * @size: portion of page that we want to write * @prev_bio_flags: flags of previous bio to see if we can merge the current one * @bio_flags: flags of the current bio to see if we can merge them - * @return: true if page was added, false otherwise * * Attempt to add a page to bio considering stripe alignment etc. * - * Return true if successfully page added. Otherwise, return false. + * Return >= 0 for the number of bytes added to the bio. + * Can return 0 if the current bio is already at stripe/zone boundary. + * Return <0 for error. */ -static bool btrfs_bio_add_page(struct btrfs_bio_ctrl *bio_ctrl, - struct page *page, - u64 disk_bytenr, unsigned int size, - unsigned int pg_offset, - unsigned long bio_flags) +static int btrfs_bio_add_page(struct btrfs_bio_ctrl *bio_ctrl, + struct page *page, + u64 disk_bytenr, unsigned int size, + unsigned int pg_offset, + unsigned long bio_flags) { struct bio *bio = bio_ctrl->bio; u32 bio_size = bio->bi_iter.bi_size; + u32 real_size; const sector_t sector = disk_bytenr >> SECTOR_SHIFT; bool contig; int ret; @@ -3222,25 +3226,32 @@ static bool btrfs_bio_add_page(struct btrfs_bio_ctrl *bio_ctrl, /* The limit should be calculated when bio_ctrl->bio is allocated */ ASSERT(bio_ctrl->len_to_oe_boundary && bio_ctrl->len_to_stripe_boundary); if (bio_ctrl->bio_flags != bio_flags) - return false; + return 0; if (bio_ctrl->bio_flags & EXTENT_BIO_COMPRESSED) contig = bio->bi_iter.bi_sector == sector; else contig = bio_end_sector(bio) == sector; if (!contig) - return false; + return 0; - if (bio_size + size > bio_ctrl->len_to_oe_boundary || - bio_size + size > bio_ctrl->len_to_stripe_boundary) - return false; + real_size = min(bio_ctrl->len_to_oe_boundary, + bio_ctrl->len_to_stripe_boundary) - bio_size; + real_size = min(real_size, size); + + /* + * If real_size is 0, never call bio_add_*_page(), as even size is 0, + * bio will still execute its endio function on the page! + */ + if (real_size == 0) + return 0; if (bio_op(bio) == REQ_OP_ZONE_APPEND) - ret = bio_add_zone_append_page(bio, page, size, pg_offset); + ret = bio_add_zone_append_page(bio, page, real_size, pg_offset); else - ret = bio_add_page(bio, page, size, pg_offset); + ret = bio_add_page(bio, page, real_size, pg_offset); - return ret == size; + return ret; } static int calc_bio_boundaries(struct btrfs_bio_ctrl *bio_ctrl, @@ -3299,6 +3310,63 @@ static int calc_bio_boundaries(struct btrfs_bio_ctrl *bio_ctrl, return 0; } +static int alloc_new_bio(struct btrfs_inode *inode, + struct btrfs_bio_ctrl *bio_ctrl, + struct writeback_control *wbc, + unsigned int opf, + bio_end_io_t end_io_func, + u64 disk_bytenr, u32 offset, + unsigned long bio_flags) +{ + struct btrfs_fs_info *fs_info = inode->root->fs_info; + struct bio *bio; + int ret; + + /* + * For compressed page range, its disk_bytenr is always + * @disk_bytenr passed in, no matter if we have added + * any range into previous bio. + */ + if (bio_flags & EXTENT_BIO_COMPRESSED) + bio = btrfs_bio_alloc(disk_bytenr); + else + bio = btrfs_bio_alloc(disk_bytenr + offset); + bio_ctrl->bio = bio; + bio_ctrl->bio_flags = bio_flags; + ret = calc_bio_boundaries(bio_ctrl, inode); + if (ret < 0) + goto error; + bio->bi_end_io = end_io_func; + bio->bi_private = &inode->io_tree; + bio->bi_write_hint = inode->vfs_inode.i_write_hint; + bio->bi_opf = opf; + if (wbc) { + struct block_device *bdev; + + bdev = fs_info->fs_devices->latest_bdev; + bio_set_dev(bio, bdev); + wbc_init_bio(wbc, bio); + } + if (btrfs_is_zoned(fs_info) && bio_op(bio) == REQ_OP_ZONE_APPEND) { + struct btrfs_device *device; + + device = btrfs_zoned_get_device(fs_info, disk_bytenr, + fs_info->sectorsize); + if (IS_ERR(device)) { + ret = PTR_ERR(device); + goto error; + } + + btrfs_io_bio(bio)->device = device; + } + return 0; +error: + bio_ctrl->bio = NULL; + bio->bi_status = errno_to_blk_status(ret); + bio_endio(bio); + return ret; +} + /* * @opf: bio REQ_OP_* and REQ_* flags as one value * @wbc: optional writeback control for io accounting @@ -3324,61 +3392,67 @@ static int submit_extent_page(unsigned int opf, bool force_bio_submit) { int ret = 0; - struct bio *bio; - size_t io_size = min_t(size_t, size, PAGE_SIZE); struct btrfs_inode *inode = BTRFS_I(page->mapping->host); - struct extent_io_tree *tree = &inode->io_tree; - struct btrfs_fs_info *fs_info = inode->root->fs_info; + unsigned int cur = pg_offset; ASSERT(bio_ctrl); ASSERT(pg_offset < PAGE_SIZE && size <= PAGE_SIZE && pg_offset + size <= PAGE_SIZE); - if (bio_ctrl->bio) { - bio = bio_ctrl->bio; - if (force_bio_submit || - !btrfs_bio_add_page(bio_ctrl, page, disk_bytenr, io_size, - pg_offset, bio_flags)) { - ret = submit_one_bio(bio, mirror_num, bio_ctrl->bio_flags); - bio_ctrl->bio = NULL; - if (ret < 0) - return ret; - } else { - if (wbc) - wbc_account_cgroup_owner(wbc, page, io_size); - return 0; - } - } - - bio = btrfs_bio_alloc(disk_bytenr); - bio_add_page(bio, page, io_size, pg_offset); - bio->bi_end_io = end_io_func; - bio->bi_private = tree; - bio->bi_write_hint = page->mapping->host->i_write_hint; - bio->bi_opf = opf; - if (wbc) { - struct block_device *bdev; - - bdev = fs_info->fs_devices->latest_bdev; - bio_set_dev(bio, bdev); - wbc_init_bio(wbc, bio); - wbc_account_cgroup_owner(wbc, page, io_size); + if (force_bio_submit && bio_ctrl->bio) { + ret = submit_one_bio(bio_ctrl->bio, mirror_num, + bio_ctrl->bio_flags); + bio_ctrl->bio = NULL; + if (ret < 0) + return ret; } - if (btrfs_is_zoned(fs_info) && bio_op(bio) == REQ_OP_ZONE_APPEND) { - struct btrfs_device *device; - device = btrfs_zoned_get_device(fs_info, disk_bytenr, io_size); - if (IS_ERR(device)) - return PTR_ERR(device); + while (cur < pg_offset + size) { + u32 offset = cur - pg_offset; + int added; - btrfs_io_bio(bio)->device = device; + /* Allocate new bio if needed */ + if (!bio_ctrl->bio) { + ret = alloc_new_bio(inode, bio_ctrl, wbc, opf, + end_io_func, disk_bytenr, offset, + bio_flags); + if (ret < 0) + return ret; + } + /* + * We must go through btrfs_bio_add_page() to ensure each + * page range won't cross various boundaries. + */ + if (bio_flags & EXTENT_BIO_COMPRESSED) + added = btrfs_bio_add_page(bio_ctrl, page, disk_bytenr, + size - offset, pg_offset + offset, + bio_flags); + else + added = btrfs_bio_add_page(bio_ctrl, page, + disk_bytenr + offset, size - offset, + pg_offset + offset, bio_flags); + + /* Metadata page range should never be split */ + if (!is_data_inode(&inode->vfs_inode)) + ASSERT(added == 0 || added == size - offset); + + /* At least we added some page, update the account */ + if (wbc && added) + wbc_account_cgroup_owner(wbc, page, added); + + /* We have reached boundary, submit right now */ + if (added < size - offset) { + /* The bio should contain some page(s) */ + ASSERT(bio_ctrl->bio->bi_iter.bi_size); + ret = submit_one_bio(bio_ctrl->bio, mirror_num, + bio_ctrl->bio_flags); + bio_ctrl->bio = NULL; + if (ret < 0) + return ret; + } + cur += added; } - - bio_ctrl->bio = bio; - bio_ctrl->bio_flags = bio_flags; - ret = calc_bio_boundaries(bio_ctrl, inode); - - return ret; + return 0; } static int attach_extent_buffer_page(struct extent_buffer *eb, From patchwork Mon May 31 08:51:01 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Qu Wenruo X-Patchwork-Id: 12288967 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-18.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER, INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED, USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id F1143C47094 for ; Mon, 31 May 2021 08:52:08 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id D57386108D for ; Mon, 31 May 2021 08:52:08 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230508AbhEaIxq (ORCPT ); Mon, 31 May 2021 04:53:46 -0400 Received: from mx2.suse.de ([195.135.220.15]:41242 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231160AbhEaIxe (ORCPT ); Mon, 31 May 2021 04:53:34 -0400 X-Virus-Scanned: by amavisd-new at test-mx.suse.de DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.com; s=susede1; t=1622451114; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=yZk8Yg5JB+yxE2ONdvdyKzERDUeHue9odN2RGnbIGYI=; b=Tp8waowxaS9XcffWEMJQaYSSrNd2FOopQ2RNJZ5tFc2HheN8Svjq/sbmt6diwb8XD8zGwS 6qTlfYxigQjrHoPBOEOIdl7jn/sUDheGcFDwNTAn3BVlz1iacLEJnPUSbLhFt44IyiTHOR Koyqs5nO5ly5kXXrEXtLQDrXneONvUI= Received: from relay2.suse.de (unknown [195.135.221.27]) by mx2.suse.de (Postfix) with ESMTP id B01F5B49C for ; Mon, 31 May 2021 08:51:54 +0000 (UTC) From: Qu Wenruo To: linux-btrfs@vger.kernel.org Subject: [PATCH v4 25/30] btrfs: reject raid5/6 fs for subpage Date: Mon, 31 May 2021 16:51:01 +0800 Message-Id: <20210531085106.259490-26-wqu@suse.com> X-Mailer: git-send-email 2.31.1 In-Reply-To: <20210531085106.259490-1-wqu@suse.com> References: <20210531085106.259490-1-wqu@suse.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org Raid5/6 is not only unsafe due to its write-hole problem, but also has tons of hardcoded PAGE_SIZE. So disable it for subpage support for now. Signed-off-by: Qu Wenruo --- fs/btrfs/disk-io.c | 10 ++++++++++ fs/btrfs/volumes.c | 8 ++++++++ 2 files changed, 18 insertions(+) diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index 8c3db9076988..2dd48f4bec8f 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -3406,6 +3406,16 @@ int __cold open_ctree(struct super_block *sb, struct btrfs_fs_devices *fs_device goto fail_alloc; } } + if (sectorsize != PAGE_SIZE) { + if (btrfs_super_incompat_flags(fs_info->super_copy) & + BTRFS_FEATURE_INCOMPAT_RAID56) { + btrfs_err(fs_info, + "raid5/6 is not yet supported for sector size %u with page size %lu", + sectorsize, PAGE_SIZE); + err = -EINVAL; + goto fail_alloc; + } + } ret = btrfs_init_workqueues(fs_info, fs_devices); if (ret) { diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index 582695cee9d1..1e0829f342f8 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -3937,11 +3937,19 @@ static inline int validate_convert_profile(struct btrfs_fs_info *fs_info, if (!(bargs->flags & BTRFS_BALANCE_ARGS_CONVERT)) return true; + if (fs_info->sectorsize < PAGE_SIZE && + bargs->target & BTRFS_BLOCK_GROUP_RAID56_MASK) { + btrfs_err(fs_info, + "RAID5/6 is not supported yet for sectorsize %u with page size %lu", + fs_info->sectorsize, PAGE_SIZE); + goto invalid; + } /* Profile is valid and does not have bits outside of the allowed set */ if (alloc_profile_is_valid(bargs->target, 1) && (bargs->target & ~allowed) == 0) return true; +invalid: btrfs_err(fs_info, "balance: invalid convert %s profile %s", type, btrfs_bg_type_to_raid_name(bargs->target)); return false; From patchwork Mon May 31 08:51:02 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Qu Wenruo X-Patchwork-Id: 12288969 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-18.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER, INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED, USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 3BD6DC47095 for ; Mon, 31 May 2021 08:52:10 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 21A516108D for ; Mon, 31 May 2021 08:52:10 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231160AbhEaIxr (ORCPT ); Mon, 31 May 2021 04:53:47 -0400 Received: from mx2.suse.de ([195.135.220.15]:41394 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231157AbhEaIxh (ORCPT ); Mon, 31 May 2021 04:53:37 -0400 X-Virus-Scanned: by amavisd-new at test-mx.suse.de DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.com; s=susede1; t=1622451117; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=Wgd19IFbKv34gGhzw1O26Kq9wPGkQCnqtHhRzW13qeo=; b=PJ9s5zx3VzTtRZyT9YZ9Hb3JY+tUxxpOtd/XkH003paGG8Te9wcF0hlQbzSkTy8FC7ik39 WKDmnton42t3vuK17roweSJsgeiuEC8dHf7y3DXVm7Xcn30dxZHAXSwj+m2M+1r8HRXa9T 2yEyXIH4ISWRnQE1kdg0x4uflUFR04s= Received: from relay2.suse.de (unknown [195.135.221.27]) by mx2.suse.de (Postfix) with ESMTP id 46734B2E9; Mon, 31 May 2021 08:51:57 +0000 (UTC) From: Qu Wenruo To: linux-btrfs@vger.kernel.org Cc: Ritesh Harjani , Filipe Manana Subject: [PATCH v4 26/30] btrfs: fix a crash caused by race between prepare_pages() and btrfs_releasepage() Date: Mon, 31 May 2021 16:51:02 +0800 Message-Id: <20210531085106.259490-27-wqu@suse.com> X-Mailer: git-send-email 2.31.1 In-Reply-To: <20210531085106.259490-1-wqu@suse.com> References: <20210531085106.259490-1-wqu@suse.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org [BUG] When running generic/095, there is a high chance to crash with subpage data RW support: assertion failed: PagePrivate(page) && page->private, in fs/btrfs/subpage.c:171 ------------[ cut here ]------------ kernel BUG at fs/btrfs/ctree.h:3403! Internal error: Oops - BUG: 0 [#1] SMP CPU: 1 PID: 3567 Comm: fio Tainted: G C O 5.12.0-rc7-custom+ #17 Hardware name: Khadas VIM3 (DT) Call trace: assertfail.constprop.0+0x28/0x2c [btrfs] btrfs_subpage_assert+0x80/0xa0 [btrfs] btrfs_subpage_set_uptodate+0x34/0xec [btrfs] btrfs_page_clamp_set_uptodate+0x74/0xa4 [btrfs] btrfs_dirty_pages+0x160/0x270 [btrfs] btrfs_buffered_write+0x444/0x630 [btrfs] btrfs_direct_write+0x1cc/0x2d0 [btrfs] btrfs_file_write_iter+0xc0/0x160 [btrfs] new_sync_write+0xe8/0x180 vfs_write+0x1b4/0x210 ksys_pwrite64+0x7c/0xc0 __arm64_sys_pwrite64+0x24/0x30 el0_svc_common.constprop.0+0x70/0x140 do_el0_svc+0x28/0x90 el0_svc+0x2c/0x54 el0_sync_handler+0x1a8/0x1ac el0_sync+0x170/0x180 Code: f0000160 913be042 913c4000 955444bc (d4210000) ---[ end trace 3fdd39f4cccedd68 ]--- [CAUSE] Although prepare_pages() calls find_or_create_page(), which returns with the page locked, but in later prepare_uptodate_page() calls, we may call btrfs_readpage() which will unlock the page before it returns. This leaves a window where btrfs_releasepage() can sneak in and release the page, clearing page->private and causing above ASSERT(). [FIX] In prepare_uptodate_page(), we should not only check page->mapping, but also PagePrivate() to ensure we are still hold a correct page which has proper fs context setup. Reported-by: Ritesh Harjani Tested-by: Ritesh Harjani Signed-off-by: Qu Wenruo Reviewed-by: Filipe Manana --- fs/btrfs/file.c | 13 ++++++++++++- 1 file changed, 12 insertions(+), 1 deletion(-) diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c index 6ef44afa939c..e307fbe398f0 100644 --- a/fs/btrfs/file.c +++ b/fs/btrfs/file.c @@ -1341,7 +1341,18 @@ static int prepare_uptodate_page(struct inode *inode, unlock_page(page); return -EIO; } - if (page->mapping != inode->i_mapping) { + + /* + * Since btrfs_readpage() will unlock the page before it + * returns, there is a window where btrfs_releasepage() can + * be called to release the page. + * Here we check both inode mapping and PagePrivate() to + * make sure the page was not released. + * + * The private flag check is essential for subpage as we need + * to store extra bitmap using page->private. + */ + if (page->mapping != inode->i_mapping || !PagePrivate(page)) { unlock_page(page); return -EAGAIN; } From patchwork Mon May 31 08:51:03 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Qu Wenruo X-Patchwork-Id: 12288971 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-18.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER, INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED, USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 33B9AC4708F for ; Mon, 31 May 2021 08:52:12 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 184D0611AC for ; Mon, 31 May 2021 08:52:12 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231186AbhEaIxt (ORCPT ); Mon, 31 May 2021 04:53:49 -0400 Received: from mx2.suse.de ([195.135.220.15]:41414 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231167AbhEaIxj (ORCPT ); Mon, 31 May 2021 04:53:39 -0400 X-Virus-Scanned: by amavisd-new at test-mx.suse.de DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.com; s=susede1; t=1622451119; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=AeiAvREcQeDs6BK4tWGZo5cC3c31w9Ba/tkTpG8NtP0=; b=r5ukLMyShJ2tPQd9IC7KVTIGxjQoMd6vHbw6IOyMwUJ3/umYftt3bsyXm+gm/H4ohBLPqN cNC1wGF6CW9CQwgi1BiIceoatpAhhbWWVIIbwi3ulZe0T9OQfxTMCjcmuAhzStTWJrdLmk EIYPe6rY3E4RqOfaWcCROYlngAOZMwM= Received: from relay2.suse.de (unknown [195.135.221.27]) by mx2.suse.de (Postfix) with ESMTP id 3E990B3E8; Mon, 31 May 2021 08:51:59 +0000 (UTC) From: Qu Wenruo To: linux-btrfs@vger.kernel.org Cc: Ritesh Harjani Subject: [PATCH v4 27/30] btrfs: fix a use-after-free bug in writeback subpage helper Date: Mon, 31 May 2021 16:51:03 +0800 Message-Id: <20210531085106.259490-28-wqu@suse.com> X-Mailer: git-send-email 2.31.1 In-Reply-To: <20210531085106.259490-1-wqu@suse.com> References: <20210531085106.259490-1-wqu@suse.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org [BUG] There is a possible use-after-free bug when running generic/095. BUG: Unable to handle kernel data access on write at 0x6b6b6b6b6b6b725b Faulting instruction address: 0xc000000000283654 c000000000283078 do_raw_spin_unlock+0x88/0x230 c0000000012b1e14 _raw_spin_unlock_irqrestore+0x44/0x90 c000000000a918dc btrfs_subpage_clear_writeback+0xac/0xe0 c0000000009e0458 end_bio_extent_writepage+0x158/0x270 c000000000b6fd14 bio_endio+0x254/0x270 c0000000009fc0f0 btrfs_end_bio+0x1a0/0x200 c000000000b6fd14 bio_endio+0x254/0x270 c000000000b781fc blk_update_request+0x46c/0x670 c000000000b8b394 blk_mq_end_request+0x34/0x1d0 c000000000d82d1c lo_complete_rq+0x11c/0x140 c000000000b880a4 blk_complete_reqs+0x84/0xb0 c0000000012b2ca4 __do_softirq+0x334/0x680 c0000000001dd878 irq_exit+0x148/0x1d0 c000000000016f4c do_IRQ+0x20c/0x240 c000000000009240 hardware_interrupt_common_virt+0x1b0/0x1c0 [CAUSE] There is very small race window like the following in generic/095. Thread 1 | Thread 2 --------------------------------+------------------------------------ end_bio_extent_writepage() | btrfs_releasepage() |- spin_lock_irqsave() | | |- end_page_writeback() | | | | |- if (PageWriteback() ||...) | | |- clear_page_extent_mapped() | | |- kfree(subpage); |- spin_unlock_irqrestore(). The race can also happen between writeback and btrfs_invalidatepage(), although that would be much harder as btrfs_invalidatepage() has much more work to do before the clear_page_extent_mapped() call. [FIX] Here we "wait" for the subapge spinlock to be released before we detach subpage structure. So this patch will introduce a new function, wait_subpage_spinlock(), to do the "wait" by acquiring the spinlock and release it. Since the caller has ensured the page is not dirty nor writeback, and page is already locked, the only way to hold the subpage spinlock is from endio function. Thus we only need to acquire the spinlock to wait for any existing holder. Reported-by: Ritesh Harjani Tested-by: Ritesh Harjani Signed-off-by: Qu Wenruo --- fs/btrfs/inode.c | 40 +++++++++++++++++++++++++++++++++++++++- fs/btrfs/subpage.c | 4 +++- 2 files changed, 42 insertions(+), 2 deletions(-) diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index 4918a7ef273b..d3c67411f683 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -8308,11 +8308,48 @@ static void btrfs_readahead(struct readahead_control *rac) extent_readahead(rac); } +/* + * For releasepage() and invalidatepage() we have a race window where + * end_page_writeback() is called but the subpage spinlock is not yet + * released. + * If we continue to release/invalidate the page, we could cause + * use-after-free for subpage spinlock. + * So this function is to spin wait for subpage spinlock. + */ +static void wait_subpage_spinlock(struct page *page) +{ + struct btrfs_fs_info *fs_info = btrfs_sb(page->mapping->host->i_sb); + struct btrfs_subpage *subpage; + + if (fs_info->sectorsize == PAGE_SIZE) + return; + + ASSERT(PagePrivate(page) && page->private); + subpage = (struct btrfs_subpage *)page->private; + + /* + * This may look insane as we just acquire the spinlock and release it, + * without doing anything. + * But we just want to make sure no one is still holding the subpage + * spinlock. + * And since the page is not dirty nor writeback, and we have page + * locked, the only possible way to hold a spinlock is from the endio + * function to clear page writeback. + * + * Here we just acquire the spinlock so that all existing callers + * should exit and we're safe to release/invalidate the page. + */ + spin_lock_irq(&subpage->lock); + spin_unlock_irq(&subpage->lock); +} + static int __btrfs_releasepage(struct page *page, gfp_t gfp_flags) { int ret = try_release_extent_mapping(page, gfp_flags); - if (ret == 1) + if (ret == 1) { + wait_subpage_spinlock(page); clear_page_extent_mapped(page); + } return ret; } @@ -8376,6 +8413,7 @@ static void btrfs_invalidatepage(struct page *page, unsigned int offset, * do double ordered extent accounting on the same page. */ wait_on_page_writeback(page); + wait_subpage_spinlock(page); /* * For subpage case, we have call sites like diff --git a/fs/btrfs/subpage.c b/fs/btrfs/subpage.c index 696485ab68a2..552410fba0bd 100644 --- a/fs/btrfs/subpage.c +++ b/fs/btrfs/subpage.c @@ -424,8 +424,10 @@ void btrfs_subpage_clear_writeback(const struct btrfs_fs_info *fs_info, spin_lock_irqsave(&subpage->lock, flags); subpage->writeback_bitmap &= ~tmp; - if (subpage->writeback_bitmap == 0) + if (subpage->writeback_bitmap == 0) { + ASSERT(PageWriteback(page)); end_page_writeback(page); + } spin_unlock_irqrestore(&subpage->lock, flags); } From patchwork Mon May 31 08:51:04 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Qu Wenruo X-Patchwork-Id: 12288979 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-18.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER, INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED, USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 402B5C4708F for ; Mon, 31 May 2021 08:52:15 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 22FCB61278 for ; Mon, 31 May 2021 08:52:15 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231167AbhEaIxv (ORCPT ); Mon, 31 May 2021 04:53:51 -0400 Received: from mx2.suse.de ([195.135.220.15]:41508 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231172AbhEaIxl (ORCPT ); Mon, 31 May 2021 04:53:41 -0400 X-Virus-Scanned: by amavisd-new at test-mx.suse.de DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.com; s=susede1; t=1622451121; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=8MLhAvYuwDU6vws7oow0Ndh/f7C4oFYU0y0qdMnjAq8=; b=Hd9RRWnZyINa71V81jlLloqktoclc5WJIFX25WQUeYHZGoCV5o4r+r/DARo7osn2gQyB/W +wKQfLZXXylq7ri8jM1K1G+Gswk/+RZLpASSVahOwly2SToMPg0SeWtevpSBC0hR/QZtkg 03jKskq7qug6HTT3hdIJTRMS2MJthrk= Received: from relay2.suse.de (unknown [195.135.221.27]) by mx2.suse.de (Postfix) with ESMTP id EC936B2E9 for ; Mon, 31 May 2021 08:52:00 +0000 (UTC) From: Qu Wenruo To: linux-btrfs@vger.kernel.org Subject: [PATCH v4 28/30] btrfs: fix a subpage false alert for relocating partial preallocated data extents Date: Mon, 31 May 2021 16:51:04 +0800 Message-Id: <20210531085106.259490-29-wqu@suse.com> X-Mailer: git-send-email 2.31.1 In-Reply-To: <20210531085106.259490-1-wqu@suse.com> References: <20210531085106.259490-1-wqu@suse.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org [BUG] When relocating partial preallocated data extents (part of the preallocated extent is written) for subpage, it can cause the following false alert and make the relocation to fail: BTRFS info (device dm-3): balance: start -d BTRFS info (device dm-3): relocating block group 13631488 flags data BTRFS warning (device dm-3): csum failed root -9 ino 257 off 4096 csum 0x98757625 expected csum 0x00000000 mirror 1 BTRFS error (device dm-3): bdev /dev/mapper/arm_nvme-test errs: wr 0, rd 0, flush 0, corrupt 1, gen 0 BTRFS warning (device dm-3): csum failed root -9 ino 257 off 4096 csum 0x98757625 expected csum 0x00000000 mirror 1 BTRFS error (device dm-3): bdev /dev/mapper/arm_nvme-test errs: wr 0, rd 0, flush 0, corrupt 2, gen 0 BTRFS info (device dm-3): balance: ended with status: -5 The minimal script to reproduce looks like this: mkfs.btrfs -f -s 4k $dev mount $dev -o nospace_cache $mnt xfs_io -f -c "falloc 0 8k" $mnt/file xfs_io -f -c "pwrite 0 4k" $mnt/file btrfs balance start -d $mnt [CAUSE] Function btrfs_verify_data_csum() checks if the full range has EXTENT_NODATASUM bit for data reloc inode, if *all* bytes of the range has EXTENT_NODATASUM bit, then it skip the range. This works pretty well for regular sectorsize, as in that case btrfs_verify_data_csum() is called for each sector, thus no problem at all. But for subpage case, btrfs_verify_data_csum() is called on each bvec, which can contain several sectors, and since it checks *all* bytes for EXTENT_NODATASUM bit, if we have some range with csum, then we will continue checking all the sectors. For the preallocated sectors, it doesn't have any csum, thus obviously the csum won't match and cause the false alert. [FIX] Move the EXTENT_NODATASUM check into the main loop, so that we can check each sector for EXTENT_NODATASUM bit for subpage case. Signed-off-by: Qu Wenruo --- fs/btrfs/inode.c | 17 +++++++++++------ 1 file changed, 11 insertions(+), 6 deletions(-) diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index d3c67411f683..85f465328ab2 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -3188,19 +3188,24 @@ unsigned int btrfs_verify_data_csum(struct btrfs_io_bio *io_bio, u32 bio_offset, if (!root->fs_info->csum_root) return 0; - if (root->root_key.objectid == BTRFS_DATA_RELOC_TREE_OBJECTID && - test_range_bit(io_tree, start, end, EXTENT_NODATASUM, 1, NULL)) { - clear_extent_bits(io_tree, start, end, EXTENT_NODATASUM); - return 0; - } - ASSERT(page_offset(page) <= start && end <= page_offset(page) + PAGE_SIZE - 1); for (pg_off = offset_in_page(start); pg_off < offset_in_page(end); pg_off += sectorsize, bio_offset += sectorsize) { + u64 file_offset = pg_off + page_offset(page); int ret; + if (root->root_key.objectid == BTRFS_DATA_RELOC_TREE_OBJECTID && + test_range_bit(io_tree, file_offset, + file_offset + sectorsize - 1, + EXTENT_NODATASUM, 1, NULL)) { + /* Skip the range without csum for data reloc inode */ + clear_extent_bits(io_tree, file_offset, + file_offset + sectorsize - 1, + EXTENT_NODATASUM); + continue; + } ret = check_data_csum(inode, io_bio, bio_offset, page, pg_off, page_offset(page) + pg_off); if (ret < 0) { From patchwork Mon May 31 08:51:05 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Qu Wenruo X-Patchwork-Id: 12288977 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-18.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER, INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED, USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 537E0C47096 for ; Mon, 31 May 2021 08:52:16 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 38A456108D for ; Mon, 31 May 2021 08:52:16 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231202AbhEaIxy (ORCPT ); Mon, 31 May 2021 04:53:54 -0400 Received: from mx2.suse.de ([195.135.220.15]:41558 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231175AbhEaIxn (ORCPT ); Mon, 31 May 2021 04:53:43 -0400 X-Virus-Scanned: by amavisd-new at test-mx.suse.de DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.com; s=susede1; t=1622451123; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=dWgfRwbtwCkoqW9q0cP9ufVBzZzsz90Zwg/X0XQ+wbE=; b=QdYW5FA8KQNkymqEiuIgY0WfUlNzLSbdjNDheMgMBpT+ryWqgdp2Pf4T7VbvdZs7DyYP8R IznfTS2G8wBYX/4YFcBXE5Rvoo2WGn2YT0JbTLxxMBuVEYuGxxSdjICjv2giY00riDebVV teVdXFwVAm/WeupZY/SJe3NYghhYBkI= Received: from relay2.suse.de (unknown [195.135.221.27]) by mx2.suse.de (Postfix) with ESMTP id EA524B3E8; Mon, 31 May 2021 08:52:02 +0000 (UTC) From: Qu Wenruo To: linux-btrfs@vger.kernel.org Cc: Ritesh Harjani Subject: [PATCH v4 29/30] btrfs: fix a subpage relocation data corruption Date: Mon, 31 May 2021 16:51:05 +0800 Message-Id: <20210531085106.259490-30-wqu@suse.com> X-Mailer: git-send-email 2.31.1 In-Reply-To: <20210531085106.259490-1-wqu@suse.com> References: <20210531085106.259490-1-wqu@suse.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org [BUG] When using the following script, btrfs will report data corruption after one data balance with subpage support: mkfs.btrfs -f -s 4k $dev mount $dev -o nospace_cache $mnt $fsstress -w -n 8 -s 1620948986 -d $mnt/ -v > /tmp/fsstress sync btrfs balance start -d $mnt btrfs scrub start -B $mnt Similar problem can be easily observed in btrfs/028 test case, there will be tons of balance failure with -EIO. [CAUSE] Above fsstress will result the following data extents layout in extent tree: item 10 key (13631488 EXTENT_ITEM 98304) itemoff 15889 itemsize 82 refs 2 gen 7 flags DATA extent data backref root FS_TREE objectid 259 offset 1339392 count 1 extent data backref root FS_TREE objectid 259 offset 647168 count 1 item 11 key (13631488 BLOCK_GROUP_ITEM 8388608) itemoff 15865 itemsize 24 block group used 102400 chunk_objectid 256 flags DATA item 12 key (13733888 EXTENT_ITEM 4096) itemoff 15812 itemsize 53 refs 1 gen 7 flags DATA extent data backref root FS_TREE objectid 259 offset 729088 count 1 Then when creating the data reloc inode, the data reloc inode will look like this: 0 32K 64K 96K 100K 104K |<------ Extent A ----->| |<- Ext B ->| Then when we first try to relocate extent A, we setup the data reloc inode with iszie 96K, then read both page [0, 64K) and page [64K, 128K). For page 64K, since the isize is just 96K, we fill range [96K, 128K) with 0 and set it uptodate. Then when we come to extent B, we update isize to 104K, then try to read page [64K, 128K). Then we find the page is already uptodate, so we skip the read. But range [96K, 128K) is filled with 0, not the real data. Then we writeback the data reloc inode to disk, with 0 filling range [96K, 128K), corrupting the content of extent B. The behavior is caused by the fact that we still do full page read for subpage case. The bug won't really happen for regular sectorsize, as one page only contains one sector. [FIX] This patch will fix the problem by invalidating range [isize, PAGE_END] in prealloc_file_extent_cluster(). So that if above example happens, when we preallocate the file extent for extent B, we will clear the uptodate bits for range [96K, 128K), allowing later relocate_one_page() to re-read the needed range. Reported-by: Ritesh Harjani Signed-off-by: Qu Wenruo --- fs/btrfs/relocation.c | 38 ++++++++++++++++++++++++++++++++++++++ 1 file changed, 38 insertions(+) diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c index cd50559c6d17..b50ee800993d 100644 --- a/fs/btrfs/relocation.c +++ b/fs/btrfs/relocation.c @@ -2782,10 +2782,48 @@ static noinline_for_stack int prealloc_file_extent_cluster( u64 num_bytes; int nr; int ret = 0; + u64 isize = i_size_read(&inode->vfs_inode); u64 prealloc_start = cluster->start - offset; u64 prealloc_end = cluster->end - offset; u64 cur_offset = prealloc_start; + /* + * For subpage case, previous isize may not be aligned to PAGE_SIZE. + * This means the range [isize, PAGE_END + 1) is filled with 0 by + * btrfs_do_readpage() call of previously relocated file cluster. + * + * If the current cluster starts in above range, btrfs_do_readpage() + * will skip the read, and relocate_one_page() will later writeback + * the padding 0 as new data, causing data corruption. + * + * Here we have to manually invalidate the range (isize, PAGE_END + 1). + */ + if (!IS_ALIGNED(isize, PAGE_SIZE)) { + struct btrfs_fs_info *fs_info = inode->root->fs_info; + const u32 sectorsize = fs_info->sectorsize; + struct page *page; + + ASSERT(sectorsize < PAGE_SIZE); + ASSERT(IS_ALIGNED(isize, sectorsize)); + + page = find_lock_page(inode->vfs_inode.i_mapping, + isize >> PAGE_SHIFT); + /* + * If page is freed we don't need to do anything then, as + * we will re-read the whole page anyway. + */ + if (page) { + u64 page_end = page_offset(page) + PAGE_SIZE - 1; + + clear_extent_bits(&inode->io_tree, isize, page_end, + EXTENT_UPTODATE); + btrfs_subpage_clear_uptodate(fs_info, page, isize, + page_end + 1 - isize); + unlock_page(page); + put_page(page); + } + } + BUG_ON(cluster->start != cluster->boundary[0]); ret = btrfs_alloc_data_chunk_ondemand(inode, prealloc_end + 1 - prealloc_start); From patchwork Mon May 31 08:51:06 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Qu Wenruo X-Patchwork-Id: 12288981 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-18.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER, INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED, USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 28D2CC47094 for ; Mon, 31 May 2021 08:52:19 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 0A93A6124B for ; Mon, 31 May 2021 08:52:19 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231209AbhEaIxz (ORCPT ); Mon, 31 May 2021 04:53:55 -0400 Received: from mx2.suse.de ([195.135.220.15]:41612 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231177AbhEaIxp (ORCPT ); Mon, 31 May 2021 04:53:45 -0400 X-Virus-Scanned: by amavisd-new at test-mx.suse.de DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.com; s=susede1; t=1622451124; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=m2/pNN8wxcjrcZnHpRLevhx4DTmJhZDlIlwzJHjKRp4=; b=RZUo2Eq365PfsyV9leebSM216xn9NsxZsjkpBza5GWVdWSuVpt5uI/Ydy1qQwpkXabbFrt Y1f39YfZxgUAjjgdwTcklchIwvCNpVZR/jYoaChZyuPRm6xiYCM4VR/B42H1stC9o5MiXN t6AQnuGHgBQKqt/2/bleksyxSAA5cbo= Received: from relay2.suse.de (unknown [195.135.221.27]) by mx2.suse.de (Postfix) with ESMTP id A88C3B2E9 for ; Mon, 31 May 2021 08:52:04 +0000 (UTC) From: Qu Wenruo To: linux-btrfs@vger.kernel.org Subject: [PATCH v4 30/30] btrfs: allow read-write for 4K sectorsize on 64K page size systems Date: Mon, 31 May 2021 16:51:06 +0800 Message-Id: <20210531085106.259490-31-wqu@suse.com> X-Mailer: git-send-email 2.31.1 In-Reply-To: <20210531085106.259490-1-wqu@suse.com> References: <20210531085106.259490-1-wqu@suse.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org Since now we support data and metadata read-write for subpage, remove the RO requirement for subpage mount. There are some extra limits though: - For now, subpage RW mount is still considered experimental Thus that mount warning will still be there. - No compression support There are still quite some PAGE_SIZE hard coded and quite some call sites use extent_clear_unlock_delalloc() to unlock locked_page. This will screw up subpage helpers Now for subpage RW mount, no matter whatever mount option or inode attr is set, all write will not be compressed. Although reading compressed data has no problem. - No defrag for subpage case The defrag support for subpage case will come in later patches, which will also rework the defrag workflow. - No inline extent will be created This is mostly due to the fact that filemap_fdatawrite_range() will trigger more write than the range specified. In fallocate calls, this behavior can make us to writeback which can be inlined, before we enlarge the isize. This is a very special corner case, and even current btrfs check won't report error on such inline extent + regular extent. But considering how much effort has been put to prevent such inline + regular, I'd prefer to cut off inline extent completely until we have a good solution. Signed-off-by: Qu Wenruo --- fs/btrfs/disk-io.c | 13 ++++--------- fs/btrfs/inode.c | 3 +++ fs/btrfs/ioctl.c | 6 ++++++ fs/btrfs/super.c | 7 ------- fs/btrfs/sysfs.c | 5 +++++ 5 files changed, 18 insertions(+), 16 deletions(-) diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index 2dd48f4bec8f..7c17cb7cf4fe 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -3396,15 +3396,10 @@ int __cold open_ctree(struct super_block *sb, struct btrfs_fs_devices *fs_device goto fail_alloc; } - /* For 4K sector size support, it's only read-only */ - if (PAGE_SIZE == SZ_64K && sectorsize == SZ_4K) { - if (!sb_rdonly(sb) || btrfs_super_log_root(disk_super)) { - btrfs_err(fs_info, - "subpage sectorsize %u only supported read-only for page size %lu", - sectorsize, PAGE_SIZE); - err = -EINVAL; - goto fail_alloc; - } + if (sectorsize != PAGE_SIZE) { + btrfs_warn(fs_info, + "read-write for sector size %u with page size %lu is experimental", + sectorsize, PAGE_SIZE); } if (sectorsize != PAGE_SIZE) { if (btrfs_super_incompat_flags(fs_info->super_copy) & diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index 85f465328ab2..27d56a77aa5f 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -490,6 +490,9 @@ static noinline int add_async_extent(struct async_chunk *cow, */ static inline bool inode_can_compress(struct btrfs_inode *inode) { + /* Subpage doesn't support compress yet */ + if (inode->root->fs_info->sectorsize < PAGE_SIZE) + return false; if (inode->flags & BTRFS_INODE_NODATACOW || inode->flags & BTRFS_INODE_NODATASUM) return false; diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c index 49cfa9772c1b..11adf4670c55 100644 --- a/fs/btrfs/ioctl.c +++ b/fs/btrfs/ioctl.c @@ -3028,6 +3028,12 @@ static int btrfs_ioctl_defrag(struct file *file, void __user *argp) goto out; } + /* Subpage defrag will be supported in later commits */ + if (root->fs_info->sectorsize < PAGE_SIZE) { + ret = -ENOTTY; + goto out; + } + switch (inode->i_mode & S_IFMT) { case S_IFDIR: if (!capable(CAP_SYS_ADMIN)) { diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c index 4a396c1147f1..b18d268abfbb 100644 --- a/fs/btrfs/super.c +++ b/fs/btrfs/super.c @@ -2053,13 +2053,6 @@ static int btrfs_remount(struct super_block *sb, int *flags, char *data) ret = -EINVAL; goto restore; } - if (fs_info->sectorsize < PAGE_SIZE) { - btrfs_warn(fs_info, - "read-write mount is not yet allowed for sectorsize %u page size %lu", - fs_info->sectorsize, PAGE_SIZE); - ret = -EINVAL; - goto restore; - } /* * NOTE: when remounting with a change that does writes, don't diff --git a/fs/btrfs/sysfs.c b/fs/btrfs/sysfs.c index 436ac7b4b334..752461a79364 100644 --- a/fs/btrfs/sysfs.c +++ b/fs/btrfs/sysfs.c @@ -366,6 +366,11 @@ static ssize_t supported_sectorsizes_show(struct kobject *kobj, { ssize_t ret = 0; + /* 4K sector size is also support with 64K page size */ + if (PAGE_SIZE == SZ_64K) + ret += scnprintf(buf + ret, PAGE_SIZE - ret, "%u ", + SZ_4K); + /* Only sectorsize == PAGE_SIZE is now supported */ ret += scnprintf(buf + ret, PAGE_SIZE - ret, "%lu\n", PAGE_SIZE);