From patchwork Tue Nov 3 13:30:37 2020
X-Patchwork-Submitter: Qu Wenruo
X-Patchwork-Id: 11877477
From: Qu Wenruo
To: linux-btrfs@vger.kernel.org
Cc: David Sterba
Subject: [PATCH 01/32] btrfs: extent_io: remove the extent_start/extent_len for end_bio_extent_readpage()
Date: Tue, 3 Nov 2020 21:30:37 +0800
Message-Id: <20201103133108.148112-2-wqu@suse.com>
In-Reply-To: <20201103133108.148112-1-wqu@suse.com>
References: <20201103133108.148112-1-wqu@suse.com>

In end_bio_extent_readpage() we have a strange dance around
extent_start/extent_len. Hidden behind the strange dance is a simple
fact: it's just calling endio_readpage_release_extent() on each
contiguous bvec range.

Here is an example to explain the original workflow:

  Bio is for inode 257, containing 2 pages, for range [1M, 1M+8K)

  end_bio_extent_readpage() entered
  |- extent_start = 0;
  |- extent_len = 0;
  |- bio_for_each_segment_all() {
  |  |- /* Got the 1st bvec */
  |  |- start = SZ_1M;
  |  |- end = SZ_1M + SZ_4K - 1;
  |  |- update = 1;
  |  |- if (extent_len == 0) {
  |  |  |- extent_start = start;		/* SZ_1M */
  |  |  |- extent_len = end + 1 - start;	/* SZ_4K */
  |  |  }
  |  |
  |  |- /* Got the 2nd bvec */
  |  |- start = SZ_1M + 4K;
  |  |- end = SZ_1M + 8K - 1;
  |  |- update = 1;
  |  |- if (extent_start + extent_len == start) {
  |  |  |- extent_len += end + 1 - start;	/* SZ_8K */
  |  |  }
  |  } /* All bio vecs iterated */
  |
  |- if (extent_len) {
  |- endio_readpage_release_extent(tree, extent_start, extent_len, update);
     /* extent_start == SZ_1M, extent_len == SZ_8K, uptodate = 1 */

As the above flow shows, the existing code in end_bio_extent_readpage()
just accumulates extent_start/extent_len, and when the contiguous range
breaks, calls endio_readpage_release_extent() for the range.

The contiguous range breaks at two locations:

- The final else {} branch
  This means we hit a page in the bio which is not contiguous with the
  accumulated range. Currently this branch will never be triggered, as
  all our bios are submitted with contiguous pages.

- After the bio_for_each_segment_all() loop ends
  This is the normal call site: we have iterated all bvecs of a bio,
  and since all pages should be contiguous, we can call
  endio_readpage_release_extent() on the full range.

The original code also considered cases like (!uptodate), so it would
mark only the uptodate range with EXTENT_UPTODATE.

So this patch removes the extent_start/extent_len dancing and replaces
it with a regular endio_readpage_release_extent() call on each bvec.

This brings one behavior change:

- Temporary memory usage increase
  Unlike the old code, which only modified the extent tree once, we now
  update the extent tree for each bvec. Although the end result is the
  same, we may need more extent state splits/allocations, thus more
  temporary memory during the bvec iteration. But considering how
  streamlined the new code is, the temporary memory usage increase
  should be acceptable.
Signed-off-by: Qu Wenruo
Signed-off-by: David Sterba
---
 fs/btrfs/extent_io.c | 33 +++------------------------------
 1 file changed, 3 insertions(+), 30 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index f3515d3c1321..58dc55e1429d 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -2779,12 +2779,10 @@ static void end_bio_extent_writepage(struct bio *bio)
 	bio_put(bio);
 }
 
-static void
-endio_readpage_release_extent(struct extent_io_tree *tree, u64 start, u64 len,
-			      int uptodate)
+static void endio_readpage_release_extent(struct extent_io_tree *tree, u64 start,
+					   u64 end, int uptodate)
 {
 	struct extent_state *cached = NULL;
-	u64 end = start + len - 1;
 
 	if (uptodate && tree->track_uptodate)
 		set_extent_uptodate(tree, start, end, &cached, GFP_ATOMIC);
@@ -2812,8 +2810,6 @@ static void end_bio_extent_readpage(struct bio *bio)
 	u64 start;
 	u64 end;
 	u64 len;
-	u64 extent_start = 0;
-	u64 extent_len = 0;
 	int mirror;
 	int ret;
 	struct bvec_iter_all iter_all;
@@ -2922,32 +2918,9 @@ static void end_bio_extent_readpage(struct bio *bio)
 		unlock_page(page);
 		offset += len;
 
-		if (unlikely(!uptodate)) {
-			if (extent_len) {
-				endio_readpage_release_extent(tree,
-							      extent_start,
-							      extent_len, 1);
-				extent_start = 0;
-				extent_len = 0;
-			}
-			endio_readpage_release_extent(tree, start,
-						      end - start + 1, 0);
-		} else if (!extent_len) {
-			extent_start = start;
-			extent_len = end + 1 - start;
-		} else if (extent_start + extent_len == start) {
-			extent_len += end + 1 - start;
-		} else {
-			endio_readpage_release_extent(tree, extent_start,
-						      extent_len, uptodate);
-			extent_start = start;
-			extent_len = end + 1 - start;
-		}
+		endio_readpage_release_extent(tree, start, end, uptodate);
 	}
 
-	if (extent_len)
-		endio_readpage_release_extent(tree, extent_start, extent_len,
-					      uptodate);
 	btrfs_io_bio_free_csum(io_bio);
 	bio_put(bio);
 }
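To make the removed pattern concrete, here is a minimal userspace C
sketch of the contiguous-range accumulation that end_bio_extent_readpage()
used to perform; release() is an invented stand-in for
endio_readpage_release_extent(), and the offsets mirror the [1M, 1M+8K)
example above, not any btrfs API:

#include <stdint.h>
#include <stdio.h>

/* Invented stand-in for endio_readpage_release_extent(). */
static void release(uint64_t start, uint64_t len, int uptodate)
{
	printf("release [%llu, %llu) uptodate=%d\n",
	       (unsigned long long)start,
	       (unsigned long long)(start + len), uptodate);
}

int main(void)
{
	/* Two contiguous 4K bvecs starting at 1M, as in the example. */
	struct { uint64_t start, end; } bvec[] = {
		{ 1048576, 1048576 + 4096 - 1 },
		{ 1048576 + 4096, 1048576 + 8192 - 1 },
	};
	uint64_t extent_start = 0, extent_len = 0;
	int i;

	for (i = 0; i < 2; i++) {
		uint64_t start = bvec[i].start, end = bvec[i].end;

		if (!extent_len) {
			/* First range, start accumulating */
			extent_start = start;
			extent_len = end + 1 - start;
		} else if (extent_start + extent_len == start) {
			/* Contiguous, merge into the accumulated range */
			extent_len += end + 1 - start;
		} else {
			/* Contiguity broke, flush what we accumulated */
			release(extent_start, extent_len, 1);
			extent_start = start;
			extent_len = end + 1 - start;
		}
	}
	if (extent_len)
		release(extent_start, extent_len, 1);
	return 0;
}

The patch deletes all of this bookkeeping and simply calls the release
helper once per bvec, at the cost of potentially more extent state
splits.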
From patchwork Tue Nov 3 13:30:38 2020
X-Patchwork-Submitter: Qu Wenruo
X-Patchwork-Id: 11877483
From: Qu Wenruo
To: linux-btrfs@vger.kernel.org
Cc: David Sterba
Subject: [PATCH 02/32] btrfs: extent_io: integrate page status update into endio_readpage_release_extent()
Date: Tue, 3 Nov 2020 21:30:38 +0800
Message-Id: <20201103133108.148112-3-wqu@suse.com>
In-Reply-To: <20201103133108.148112-1-wqu@suse.com>
References: <20201103133108.148112-1-wqu@suse.com>

In end_bio_extent_readpage(), we set the page uptodate or error
according to the bio status. However that assumes all submitted reads
are page sized.

To support cases like subpage reads, we should only set the whole page
uptodate once all data in the page has been read from disk.

This patch integrates the page status update into
endio_readpage_release_extent() for end_bio_extent_readpage().

Now in endio_readpage_release_extent() we will set the page uptodate
if:

- the start/end range covers the full page
  This is the existing behavior.

- the whole page range is already uptodate
  This adds the support for subpage reads.

And for the error path, we always clear the page uptodate bit and set
the page error bit.

Signed-off-by: Qu Wenruo
Signed-off-by: David Sterba
---
 fs/btrfs/extent_io.c | 38 ++++++++++++++++++++++++++++----------
 1 file changed, 28 insertions(+), 10 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 58dc55e1429d..228bf0c5f7a0 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -2779,13 +2779,35 @@ static void end_bio_extent_writepage(struct bio *bio)
 	bio_put(bio);
 }
 
-static void endio_readpage_release_extent(struct extent_io_tree *tree, u64 start,
-					   u64 end, int uptodate)
+static void endio_readpage_release_extent(struct extent_io_tree *tree,
+		struct page *page, u64 start, u64 end, int uptodate)
 {
 	struct extent_state *cached = NULL;
 
-	if (uptodate && tree->track_uptodate)
-		set_extent_uptodate(tree, start, end, &cached, GFP_ATOMIC);
+	if (uptodate) {
+		u64 page_start = page_offset(page);
+		u64 page_end = page_offset(page) + PAGE_SIZE - 1;
+
+		if (tree->track_uptodate) {
+			/*
+			 * The tree has EXTENT_UPTODATE bit tracking, update
+			 * extent io tree, and use it to update the page if
+			 * needed.
+			 */
+			set_extent_uptodate(tree, start, end, &cached, GFP_NOFS);
+			check_page_uptodate(tree, page);
+		} else if (start <= page_start && end >= page_end) {
+			/* We have covered the full page, set it uptodate */
+			SetPageUptodate(page);
+		}
+	} else if (!uptodate) {
+		if (tree->track_uptodate)
+			clear_extent_uptodate(tree, start, end, &cached);
+
+		/* Any error in the page range would invalidate the uptodate bit */
+		ClearPageUptodate(page);
+		SetPageError(page);
+	}
 	unlock_extent_cached_atomic(tree, start, end, &cached);
 }
 
@@ -2910,15 +2932,11 @@ static void end_bio_extent_readpage(struct bio *bio)
 			off = offset_in_page(i_size);
 			if (page->index == end_index && off)
 				zero_user_segment(page, off, PAGE_SIZE);
-			SetPageUptodate(page);
-		} else {
-			ClearPageUptodate(page);
-			SetPageError(page);
 		}
-		unlock_page(page);
 		offset += len;
 
-		endio_readpage_release_extent(tree, start, end, uptodate);
+		endio_readpage_release_extent(tree, page, start, end, uptodate);
+		unlock_page(page);
 	}
 
 	btrfs_io_bio_free_csum(io_bio);
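The core of the subpage rule above is a simple range check: a finished
read may only flip the whole page to uptodate when it covers the full
page. A minimal userspace sketch, assuming a 64K page and an invented
helper name (not btrfs API):

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define PAGE_SIZE 65536ULL	/* assumed 64K page for the illustration */

/* Invented helper: may the read [start, end] mark the page uptodate? */
static bool read_covers_full_page(uint64_t page_start, uint64_t start,
				  uint64_t end)
{
	uint64_t page_end = page_start + PAGE_SIZE - 1;

	return start <= page_start && end >= page_end;
}

int main(void)
{
	/* A 16K read into a 64K page: must NOT set the page uptodate. */
	printf("%d\n", read_covers_full_page(0, 0, 16384 - 1));	/* 0 */
	/* A read covering the whole page: safe to set uptodate. */
	printf("%d\n", read_covers_full_page(0, 0, PAGE_SIZE - 1));	/* 1 */
	return 0;
}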
From patchwork Tue Nov 3 13:30:39 2020
X-Patchwork-Submitter: Qu Wenruo
X-Patchwork-Id: 11877479
From: Qu Wenruo
To: linux-btrfs@vger.kernel.org
Cc: Nikolay Borisov, David Sterba
Subject: [PATCH 03/32] btrfs: extent_io: add lockdep_assert_held() for attach_extent_buffer_page()
Date: Tue, 3 Nov 2020 21:30:39 +0800
Message-Id: <20201103133108.148112-4-wqu@suse.com>
In-Reply-To: <20201103133108.148112-1-wqu@suse.com>
References: <20201103133108.148112-1-wqu@suse.com>

When calling attach_extent_buffer_page(), we're either attaching
anonymous pages, when called from btrfs_clone_extent_buffer(), or
attaching btree_inode pages, when called from alloc_extent_buffer().

For the latter case, we should hold page->mapping->private_lock to
avoid racing with modifications of page->private.

Add a lockdep_assert_held() check for the case where we're called from
alloc_extent_buffer().

Signed-off-by: Qu Wenruo
Reviewed-by: Nikolay Borisov
Signed-off-by: David Sterba
---
 fs/btrfs/extent_io.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 228bf0c5f7a0..9cbce0b74db7 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -3093,6 +3093,15 @@ static int submit_extent_page(unsigned int opf,
 static void attach_extent_buffer_page(struct extent_buffer *eb,
 				      struct page *page)
 {
+	/*
+	 * If the page is mapped to the btree inode, we should hold the
+	 * private lock to prevent races.
+	 * For cloned or dummy extent buffers, their pages are not mapped and
+	 * will not race with any other ebs.
+	 */
+	if (page->mapping)
+		lockdep_assert_held(&page->mapping->private_lock);
+
 	if (!PagePrivate(page))
 		attach_page_private(page, eb);
 	else
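The assertion pattern itself can be modelled in userspace: only pages
that belong to a mapping need the mapping's lock, so the check is
skipped for anonymous (cloned/dummy) pages. A toy sketch with invented
types, using assert() in place of lockdep_assert_held():

#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy models of the kernel structures; illustrative only. */
struct mapping { bool private_lock_held; };
struct page { struct mapping *mapping; void *private; };

static void attach_private(struct page *page, void *priv)
{
	/*
	 * Cloned/dummy buffers have page->mapping == NULL and skip the
	 * check, mirroring the conditional lockdep_assert_held() above.
	 */
	if (page->mapping)
		assert(page->mapping->private_lock_held);
	page->private = priv;
}

int main(void)
{
	struct page anon = { .mapping = NULL };
	struct mapping m = { .private_lock_held = true };
	struct page mapped = { .mapping = &m };

	attach_private(&anon, (void *)0x1);	/* no lock required */
	attach_private(&mapped, (void *)0x2);	/* lock must be held */
	return 0;
}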
From patchwork Tue Nov 3 13:30:40 2020
X-Patchwork-Submitter: Qu Wenruo
X-Patchwork-Id: 11877481
From: Qu Wenruo
To: linux-btrfs@vger.kernel.org
Cc: David Sterba
Subject: [PATCH 04/32] btrfs: extent_io: extract the btree page submission code into its own helper function
Date: Tue, 3 Nov 2020 21:30:40 +0800
Message-Id: <20201103133108.148112-5-wqu@suse.com>
In-Reply-To: <20201103133108.148112-1-wqu@suse.com>
References: <20201103133108.148112-1-wqu@suse.com>

In btree_write_cache_pages() we have the btree page submission routine
buried deep inside a nested loop.

This patch extracts that part of the code into a helper function,
submit_btree_page(), which does the same work.

Also, since submit_btree_page() can now return >0 for successful extent
buffer submission, remove the "ASSERT(ret <= 0);" line.

Signed-off-by: Qu Wenruo
Signed-off-by: David Sterba
---
 fs/btrfs/extent_io.c | 116 +++++++++++++++++++++++++------------------
 1 file changed, 69 insertions(+), 47 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 9cbce0b74db7..ac396d8937b9 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -3935,10 +3935,75 @@ static noinline_for_stack int write_one_eb(struct extent_buffer *eb,
 	return ret;
 }
 
+/*
+ * A helper to submit a btree page.
+ *
+ * This function is not always submitting the page, as we only submit the full
+ * extent buffer in a batch.
+ *
+ * @page:	The btree page
+ * @prev_eb:	Previous extent buffer, to determine if we need to submit
+ *		this page.
+ *
+ * Return >0 if we have submitted the extent buffer successfully.
+ * Return 0 if we don't need to do anything for the page.
+ * Return <0 for fatal error.
+ */
+static int submit_btree_page(struct page *page, struct writeback_control *wbc,
+			     struct extent_page_data *epd,
+			     struct extent_buffer **prev_eb)
+{
+	struct address_space *mapping = page->mapping;
+	struct extent_buffer *eb;
+	int ret;
+
+	if (!PagePrivate(page))
+		return 0;
+
+	spin_lock(&mapping->private_lock);
+	if (!PagePrivate(page)) {
+		spin_unlock(&mapping->private_lock);
+		return 0;
+	}
+
+	eb = (struct extent_buffer *)page->private;
+
+	/*
+	 * Shouldn't happen and normally this would be a BUG_ON but no sense
+	 * in crashing the users box for something we can survive anyway.
+	 */
+	if (WARN_ON(!eb)) {
+		spin_unlock(&mapping->private_lock);
+		return 0;
+	}
+
+	if (eb == *prev_eb) {
+		spin_unlock(&mapping->private_lock);
+		return 0;
+	}
+	ret = atomic_inc_not_zero(&eb->refs);
+	spin_unlock(&mapping->private_lock);
+	if (!ret)
+		return 0;
+
+	*prev_eb = eb;
+
+	ret = lock_extent_buffer_for_io(eb, epd);
+	if (ret <= 0) {
+		free_extent_buffer(eb);
+		return ret;
+	}
+	ret = write_one_eb(eb, wbc, epd);
+	free_extent_buffer(eb);
+	if (ret < 0)
+		return ret;
+	return 1;
+}
+
 int btree_write_cache_pages(struct address_space *mapping,
 			    struct writeback_control *wbc)
 {
-	struct extent_buffer *eb, *prev_eb = NULL;
+	struct extent_buffer *prev_eb = NULL;
 	struct extent_page_data epd = {
 		.bio = NULL,
 		.extent_locked = 0,
@@ -3984,55 +4049,13 @@ int btree_write_cache_pages(struct address_space *mapping,
 		for (i = 0; i < nr_pages; i++) {
 			struct page *page = pvec.pages[i];
 
-			if (!PagePrivate(page))
-				continue;
-
-			spin_lock(&mapping->private_lock);
-			if (!PagePrivate(page)) {
-				spin_unlock(&mapping->private_lock);
-				continue;
-			}
-
-			eb = (struct extent_buffer *)page->private;
-
-			/*
-			 * Shouldn't happen and normally this would be a BUG_ON
-			 * but no sense in crashing the users box for something
-			 * we can survive anyway.
-			 */
-			if (WARN_ON(!eb)) {
-				spin_unlock(&mapping->private_lock);
-				continue;
-			}
-
-			if (eb == prev_eb) {
-				spin_unlock(&mapping->private_lock);
-				continue;
-			}
-
-			ret = atomic_inc_not_zero(&eb->refs);
-			spin_unlock(&mapping->private_lock);
-			if (!ret)
-				continue;
-
-			prev_eb = eb;
-			ret = lock_extent_buffer_for_io(eb, &epd);
-			if (!ret) {
-				free_extent_buffer(eb);
+			ret = submit_btree_page(page, wbc, &epd, &prev_eb);
+			if (ret == 0)
 				continue;
-			} else if (ret < 0) {
+			if (ret < 0) {
 				done = 1;
-				free_extent_buffer(eb);
 				break;
 			}
-
-			ret = write_one_eb(eb, wbc, &epd);
-			if (ret) {
-				done = 1;
-				free_extent_buffer(eb);
-				break;
-			}
-			free_extent_buffer(eb);
 
 			/*
 			 * the filesystem may choose to bump up nr_to_write.
@@ -4053,7 +4076,6 @@ int btree_write_cache_pages(struct address_space *mapping,
 		index = 0;
 		goto retry;
 	}
-	ASSERT(ret <= 0);
 	if (ret < 0) {
 		end_write_bio(&epd, ret);
 		return ret;
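The helper's tri-state return value (>0 submitted, 0 nothing to do,
<0 fatal) is what lets the caller's loop collapse to a few lines. A
small userspace sketch of consuming such a contract, with an invented
submit_one() standing in for submit_btree_page():

#include <stdio.h>

/* Invented tri-state helper mirroring submit_btree_page()'s contract:
 * >0 submitted, 0 nothing to do for this page, <0 fatal error. */
static int submit_one(int i)
{
	if (i % 3 == 0)
		return 0;	/* page not interesting, skip it */
	if (i == 7)
		return -5;	/* simulated I/O failure */
	return 1;		/* one extent buffer submitted */
}

int main(void)
{
	int done = 0;
	int i;

	for (i = 0; i < 10; i++) {
		int ret = submit_one(i);

		if (ret == 0)
			continue;	/* nothing submitted */
		if (ret < 0) {
			done = 1;	/* record the error and stop */
			break;
		}
		/* ret > 0: submitted, keep scanning */
	}
	printf("done=%d\n", done);
	return 0;
}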
From patchwork Tue Nov 3 13:30:41 2020
X-Patchwork-Submitter: Qu Wenruo
X-Patchwork-Id: 11877487
From: Qu Wenruo
To: linux-btrfs@vger.kernel.org
Subject: [PATCH 05/32] btrfs: extent-io-tests: remove invalid tests
Date: Tue, 3 Nov 2020 21:30:41 +0800
Message-Id: <20201103133108.148112-6-wqu@suse.com>
In-Reply-To: <20201103133108.148112-1-wqu@suse.com>
References: <20201103133108.148112-1-wqu@suse.com>

In the extent-io tests, there are two invalid tests:

- Invalid nodesize for test_eb_bitmaps()
  Instead of the sectorsize and nodesize combination passed in, we're
  always using a hand-crafted nodesize, e.g.:

	len = (sectorsize < BTRFS_MAX_METADATA_BLOCKSIZE)
		? sectorsize * 4 : sectorsize;

  In the above case, if we have a 32K page size, then we will get a
  length of 128K, which is beyond the max node size, and obviously
  invalid. Thankfully most machines use either 4K or 64K page size,
  thus we haven't yet hit such a case.

- Invalid extent buffer bytenr
  For 64K page size, the only combination we're going to test is
  sectorsize = nodesize = 64K. However in that case, we will try to
  test an eb whose bytenr is not sectorsize aligned:

	/* Do it over again with an extent buffer which isn't page-aligned. */
	eb = __alloc_dummy_extent_buffer(fs_info, nodesize / 2, len);

  Sector alignment is a hard requirement for any sector size. The only
  exception is the superblock; anything else should follow sector size
  alignment. This is definitely an invalid test case.

This patch fixes both problems by:

- Honoring the sectorsize/nodesize combination
  Now we won't bother to hand-craft a strange length and use it as
  nodesize.

- Using sectorsize as the start of the second extent buffer
  This tests the case where the extent buffer is aligned to sectorsize
  but not necessarily aligned to nodesize.

Please note that later subpage related cleanups will reduce
extent_buffer::pages[] to exactly what we need, making sector-unaligned
extent buffer operations cause problems. Since only the extent_io self
tests utilize this invalid feature, this patch is required for all
later cleanups/refactors.

Signed-off-by: Qu Wenruo
---
 fs/btrfs/tests/extent-io-tests.c | 26 +++++++++++---------------
 1 file changed, 11 insertions(+), 15 deletions(-)

diff --git a/fs/btrfs/tests/extent-io-tests.c b/fs/btrfs/tests/extent-io-tests.c
index df7ce874a74b..73e96d505f4f 100644
--- a/fs/btrfs/tests/extent-io-tests.c
+++ b/fs/btrfs/tests/extent-io-tests.c
@@ -379,54 +379,50 @@ static int __test_eb_bitmaps(unsigned long *bitmap, struct extent_buffer *eb,
 static int test_eb_bitmaps(u32 sectorsize, u32 nodesize)
 {
 	struct btrfs_fs_info *fs_info;
-	unsigned long len;
 	unsigned long *bitmap = NULL;
 	struct extent_buffer *eb = NULL;
 	int ret;
 
 	test_msg("running extent buffer bitmap tests");
 
-	/*
-	 * In ppc64, sectorsize can be 64K, thus 4 * 64K will be larger than
-	 * BTRFS_MAX_METADATA_BLOCKSIZE.
-	 */
-	len = (sectorsize < BTRFS_MAX_METADATA_BLOCKSIZE)
-		? sectorsize * 4 : sectorsize;
-
-	fs_info = btrfs_alloc_dummy_fs_info(len, len);
+	fs_info = btrfs_alloc_dummy_fs_info(nodesize, sectorsize);
 	if (!fs_info) {
 		test_std_err(TEST_ALLOC_FS_INFO);
 		return -ENOMEM;
 	}
 
-	bitmap = kmalloc(len, GFP_KERNEL);
+	bitmap = kmalloc(nodesize, GFP_KERNEL);
 	if (!bitmap) {
 		test_err("couldn't allocate test bitmap");
 		ret = -ENOMEM;
 		goto out;
 	}
 
-	eb = __alloc_dummy_extent_buffer(fs_info, 0, len);
+	eb = __alloc_dummy_extent_buffer(fs_info, 0, nodesize);
 	if (!eb) {
 		test_std_err(TEST_ALLOC_ROOT);
 		ret = -ENOMEM;
 		goto out;
 	}
 
-	ret = __test_eb_bitmaps(bitmap, eb, len);
+	ret = __test_eb_bitmaps(bitmap, eb, nodesize);
 	if (ret)
 		goto out;
 
-	/* Do it over again with an extent buffer which isn't page-aligned. */
 	free_extent_buffer(eb);
-	eb = __alloc_dummy_extent_buffer(fs_info, nodesize / 2, len);
+
+	/*
+	 * Test again for case where the tree block is sectorsize aligned but
+	 * not nodesize aligned.
+	 */
+	eb = __alloc_dummy_extent_buffer(fs_info, sectorsize, nodesize);
 	if (!eb) {
 		test_std_err(TEST_ALLOC_ROOT);
 		ret = -ENOMEM;
 		goto out;
 	}
 
-	ret = __test_eb_bitmaps(bitmap, eb, len);
+	ret = __test_eb_bitmaps(bitmap, eb, nodesize);
 out:
 	free_extent_buffer(eb);
 	kfree(bitmap);
From patchwork Tue Nov 3 13:30:42 2020
X-Patchwork-Submitter: Qu Wenruo
X-Patchwork-Id: 11877497
From: Qu Wenruo
To: linux-btrfs@vger.kernel.org
Cc: David Sterba
Subject: [PATCH 06/32] btrfs: extent_io: calculate inline extent buffer page size based on page size
Date: Tue, 3 Nov 2020 21:30:42 +0800
Message-Id: <20201103133108.148112-7-wqu@suse.com>
In-Reply-To: <20201103133108.148112-1-wqu@suse.com>
References: <20201103133108.148112-1-wqu@suse.com>

Btrfs only supports 64K as the max node size, thus for a 4K page
system, we would have at most 16 pages for one extent buffer. For a
system using 64K page size, we would really have just one single page.

While we always use 16 pages for extent_buffer::pages[], this means
that for systems using 64K pages we are wasting memory for the 15
pages which will never be utilized.

So this patch changes how the extent_buffer::pages[] array size is
calculated: it is now derived from BTRFS_MAX_METADATA_BLOCKSIZE and
PAGE_SIZE. For systems using 4K page size, it will stay 16 pages. For
systems using 64K page size, it will be just 1 page.

Signed-off-by: Qu Wenruo
Signed-off-by: David Sterba
Reviewed-by: Nikolay Borisov
---
 fs/btrfs/extent_io.c | 6 +++---
 fs/btrfs/extent_io.h | 8 +++++---
 2 files changed, 8 insertions(+), 6 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index ac396d8937b9..092d9f69abb2 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -4990,9 +4990,9 @@ __alloc_extent_buffer(struct btrfs_fs_info *fs_info, u64 start,
 	/*
 	 * Sanity checks, currently the maximum is 64k covered by 16x 4k pages
 	 */
-	BUILD_BUG_ON(BTRFS_MAX_METADATA_BLOCKSIZE
-		> MAX_INLINE_EXTENT_BUFFER_SIZE);
-	BUG_ON(len > MAX_INLINE_EXTENT_BUFFER_SIZE);
+	BUILD_BUG_ON(BTRFS_MAX_METADATA_BLOCKSIZE >
+		     INLINE_EXTENT_BUFFER_PAGES * PAGE_SIZE);
+	BUG_ON(len > BTRFS_MAX_METADATA_BLOCKSIZE);
 
 	return eb;
 }
diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h
index 5403354de0e1..123c3947be49 100644
--- a/fs/btrfs/extent_io.h
+++ b/fs/btrfs/extent_io.h
@@ -73,9 +73,11 @@ typedef blk_status_t (submit_bio_hook_t)(struct inode *inode, struct bio *bio,
 typedef blk_status_t (extent_submit_bio_start_t)(struct inode *inode,
 		struct bio *bio, u64 bio_offset);
 
-
-#define INLINE_EXTENT_BUFFER_PAGES 16
-#define MAX_INLINE_EXTENT_BUFFER_SIZE (INLINE_EXTENT_BUFFER_PAGES * PAGE_SIZE)
+/*
+ * The SZ_64K is BTRFS_MAX_METADATA_BLOCKSIZE, used here just to avoid a
+ * circular include of "ctree.h".
+ */
+#define INLINE_EXTENT_BUFFER_PAGES (SZ_64K / PAGE_SIZE)
 struct extent_buffer {
 	u64 start;
 	unsigned long len;
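The arithmetic is worth spelling out. With the metadata block size
fixed at 64K, the inline page array only needs 64K / PAGE_SIZE entries.
A small sketch evaluating the same formula for the two common page
sizes (values illustrative):

#include <stdio.h>

#define BTRFS_MAX_METADATA_BLOCKSIZE 65536UL	/* 64K, fixed by btrfs */

int main(void)
{
	unsigned long page_sizes[] = { 4096, 65536 };
	int i;

	for (i = 0; i < 2; i++) {
		/* Same formula as INLINE_EXTENT_BUFFER_PAGES */
		unsigned long pages =
			BTRFS_MAX_METADATA_BLOCKSIZE / page_sizes[i];

		printf("PAGE_SIZE=%lu -> %lu inline pages\n",
		       page_sizes[i], pages);	/* 16, then 1 */
	}
	return 0;
}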
From patchwork Tue Nov 3 13:30:43 2020
X-Patchwork-Submitter: Qu Wenruo
X-Patchwork-Id: 11877491
From: Qu Wenruo
To: linux-btrfs@vger.kernel.org
Cc: Nikolay Borisov, David Sterba
Subject: [PATCH 07/32] btrfs: extent_io: make btrfs_fs_info::buffer_radix take sector size divided values
Date: Tue, 3 Nov 2020 21:30:43 +0800
Message-Id: <20201103133108.148112-8-wqu@suse.com>
In-Reply-To: <20201103133108.148112-1-wqu@suse.com>
References: <20201103133108.148112-1-wqu@suse.com>

For subpage sector size support, one page can contain multiple tree
blocks, thus we can no longer use (eb->start >> PAGE_SHIFT) as the
radix tree index, or we can easily get an extent buffer that doesn't
belong to the bytenr.

This patch uses (extent_buffer::start >> sectorsize_bits) as the radix
tree index, so that we can get the correct extent buffer for subpage
size support, while keeping the behavior the same for regular sector
size.

Signed-off-by: Qu Wenruo
Reviewed-by: Nikolay Borisov
Signed-off-by: David Sterba
---
 fs/btrfs/extent_io.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 092d9f69abb2..a90cdcf01b7f 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -5121,7 +5121,7 @@ struct extent_buffer *find_extent_buffer(struct btrfs_fs_info *fs_info,
 
 	rcu_read_lock();
 	eb = radix_tree_lookup(&fs_info->buffer_radix,
-			       start >> PAGE_SHIFT);
+			       start >> fs_info->sectorsize_bits);
 	if (eb && atomic_inc_not_zero(&eb->refs)) {
 		rcu_read_unlock();
 		/*
@@ -5173,7 +5173,7 @@ struct extent_buffer *alloc_test_extent_buffer(struct btrfs_fs_info *fs_info,
 	}
 	spin_lock(&fs_info->buffer_lock);
 	ret = radix_tree_insert(&fs_info->buffer_radix,
-				start >> PAGE_SHIFT, eb);
+				start >> eb->fs_info->sectorsize_bits, eb);
 	spin_unlock(&fs_info->buffer_lock);
 	radix_tree_preload_end();
 	if (ret == -EEXIST) {
@@ -5281,7 +5281,7 @@ struct extent_buffer *alloc_extent_buffer(struct btrfs_fs_info *fs_info,
 
 	spin_lock(&fs_info->buffer_lock);
 	ret = radix_tree_insert(&fs_info->buffer_radix,
-				start >> PAGE_SHIFT, eb);
+				start >> fs_info->sectorsize_bits, eb);
 	spin_unlock(&fs_info->buffer_lock);
 	radix_tree_preload_end();
 	if (ret == -EEXIST) {
@@ -5337,7 +5337,7 @@ static int release_extent_buffer(struct extent_buffer *eb)
 
 		spin_lock(&fs_info->buffer_lock);
 		radix_tree_delete(&fs_info->buffer_radix,
-				  eb->start >> PAGE_SHIFT);
+				  eb->start >> fs_info->sectorsize_bits);
 		spin_unlock(&fs_info->buffer_lock);
 	} else {
 		spin_unlock(&eb->refs_lock);
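Why PAGE_SHIFT indexing breaks for subpage can be shown with two tree
blocks in one page: shifting by the page size collapses them onto the
same radix slot, while shifting by sectorsize_bits keeps them distinct.
A sketch with assumed 64K pages and 16K sectors:

#include <stdint.h>
#include <stdio.h>

int main(void)
{
	uint64_t eb1 = 65536;		/* first 16K tree block */
	uint64_t eb2 = 65536 + 16384;	/* the next one, same 64K page */
	unsigned int page_shift = 16;		/* assumed 64K pages */
	unsigned int sectorsize_bits = 14;	/* assumed 16K sectors */

	printf("by page:   %llu vs %llu\n",	/* 1 vs 1: collision */
	       (unsigned long long)(eb1 >> page_shift),
	       (unsigned long long)(eb2 >> page_shift));
	printf("by sector: %llu vs %llu\n",	/* 4 vs 5: distinct */
	       (unsigned long long)(eb1 >> sectorsize_bits),
	       (unsigned long long)(eb2 >> sectorsize_bits));
	return 0;
}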
header.b="DulBNW+y" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729294AbgKCNbo (ORCPT ); Tue, 3 Nov 2020 08:31:44 -0500 Received: from mx2.suse.de ([195.135.220.15]:44576 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1729352AbgKCNbn (ORCPT ); Tue, 3 Nov 2020 08:31:43 -0500 X-Virus-Scanned: by amavisd-new at test-mx.suse.de DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.com; s=susede1; t=1604410301; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=CBx9ZdCSJC1BXIHvn/bEr70EaR3xMTnMC4SlDW90lTI=; b=DulBNW+yfXfrCEVYC3mqrmmW7gh1Ve0W2jgF2wVIK+q/6gjRuR3WzUZCNdgeA+1Z4sA8De 5aTp5VwHVuNjv09Yf311dRHbHR/x7zS29RGcMEr2YFKcFl2Eg97hx9uu7uQUyNihFHNhZW 8o2Lph+u4xm5E4REjpnqeHPYY9eMj3Y= Received: from relay2.suse.de (unknown [195.135.221.27]) by mx2.suse.de (Postfix) with ESMTP id C177DAB0E for ; Tue, 3 Nov 2020 13:31:41 +0000 (UTC) From: Qu Wenruo To: linux-btrfs@vger.kernel.org Subject: [PATCH 08/32] btrfs: extent_io: sink less common parameters for __set_extent_bit() Date: Tue, 3 Nov 2020 21:30:44 +0800 Message-Id: <20201103133108.148112-9-wqu@suse.com> X-Mailer: git-send-email 2.29.2 In-Reply-To: <20201103133108.148112-1-wqu@suse.com> References: <20201103133108.148112-1-wqu@suse.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org For __set_extent_bit(), those parameter are less common for most callers: - exclusive_bits - failed_start Mostly for extent locking. - extent_changeset For qgroup usage. As a common design principle, less common parameters should have their default values and only callers really need them will set the parameters to non-default values. Sink those parameters into a new structure, extent_io_extra_options. So most callers won't bother those less used parameters, and make later expansion easier. Signed-off-by: Qu Wenruo --- fs/btrfs/extent-io-tree.h | 22 ++++++++++++++ fs/btrfs/extent_io.c | 61 ++++++++++++++++++++++++--------------- 2 files changed, 59 insertions(+), 24 deletions(-) diff --git a/fs/btrfs/extent-io-tree.h b/fs/btrfs/extent-io-tree.h index cab4273ff8d3..c93065794567 100644 --- a/fs/btrfs/extent-io-tree.h +++ b/fs/btrfs/extent-io-tree.h @@ -82,6 +82,28 @@ struct extent_state { #endif }; +/* + * Extra options for extent io tree operations. + * + * All of these options are initialized to 0/false/NULL by default, + * and most callers should utilize the wrappers other than the extra options. + */ +struct extent_io_extra_options { + /* + * For __set_extent_bit(), to return -EEXIST when hit an extent with + * @excl_bits set, and update @excl_failed_start. + * Utizlied by EXTENT_LOCKED wrappers. + */ + u32 excl_bits; + u64 excl_failed_start; + + /* + * For __set/__clear_extent_bit() to record how many bytes is modified. + * For qgroup related functions. 
+ */ + struct extent_changeset *changeset; +}; + int __init extent_state_cache_init(void); void __cold extent_state_cache_exit(void); diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c index a90cdcf01b7f..1fd92815553d 100644 --- a/fs/btrfs/extent_io.c +++ b/fs/btrfs/extent_io.c @@ -29,6 +29,7 @@ static struct kmem_cache *extent_state_cache; static struct kmem_cache *extent_buffer_cache; static struct bio_set btrfs_bioset; +static struct extent_io_extra_options default_opts = { 0 }; static inline bool extent_state_in_tree(const struct extent_state *state) { return !RB_EMPTY_NODE(&state->rb_node); @@ -952,10 +953,10 @@ static void cache_state(struct extent_state *state, } /* - * set some bits on a range in the tree. This may require allocations or + * Set some bits on a range in the tree. This may require allocations or * sleeping, so the gfp mask is used to indicate what is allowed. * - * If any of the exclusive bits are set, this will fail with -EEXIST if some + * If *any* of the exclusive bits are set, this will fail with -EEXIST if some * part of the range already has the desired bits set. The start of the * existing range is returned in failed_start in this case. * @@ -964,26 +965,30 @@ static void cache_state(struct extent_state *state, static int __must_check __set_extent_bit(struct extent_io_tree *tree, u64 start, u64 end, - unsigned bits, unsigned exclusive_bits, - u64 *failed_start, struct extent_state **cached_state, - gfp_t mask, struct extent_changeset *changeset) + unsigned bits, struct extent_state **cached_state, + gfp_t mask, struct extent_io_extra_options *extra_opts) { struct extent_state *state; struct extent_state *prealloc = NULL; struct rb_node *node; struct rb_node **p; struct rb_node *parent; + struct extent_changeset *changeset; int err = 0; + u32 exclusive_bits; + u64 *failed_start; u64 last_start; u64 last_end; btrfs_debug_check_extent_io_range(tree, start, end); trace_btrfs_set_extent_bit(tree, start, end - start + 1, bits); - if (exclusive_bits) - ASSERT(failed_start); - else - ASSERT(failed_start == NULL); + if (!extra_opts) + extra_opts = &default_opts; + exclusive_bits = extra_opts->excl_bits; + failed_start = &extra_opts->excl_failed_start; + changeset = extra_opts->changeset; + again: if (!prealloc && gfpflags_allow_blocking(mask)) { /* @@ -1186,7 +1191,7 @@ __set_extent_bit(struct extent_io_tree *tree, u64 start, u64 end, int set_extent_bit(struct extent_io_tree *tree, u64 start, u64 end, unsigned bits, struct extent_state **cached_state, gfp_t mask) { - return __set_extent_bit(tree, start, end, bits, 0, NULL, cached_state, + return __set_extent_bit(tree, start, end, bits, cached_state, mask, NULL); } @@ -1413,6 +1418,10 @@ int convert_extent_bit(struct extent_io_tree *tree, u64 start, u64 end, int set_record_extent_bits(struct extent_io_tree *tree, u64 start, u64 end, unsigned bits, struct extent_changeset *changeset) { + struct extent_io_extra_options extra_opts = { + .changeset = changeset, + }; + /* * We don't support EXTENT_LOCKED yet, as current changeset will * record any bits changed, so for EXTENT_LOCKED case, it will @@ -1421,15 +1430,14 @@ int set_record_extent_bits(struct extent_io_tree *tree, u64 start, u64 end, */ BUG_ON(bits & EXTENT_LOCKED); - return __set_extent_bit(tree, start, end, bits, 0, NULL, NULL, GFP_NOFS, - changeset); + return __set_extent_bit(tree, start, end, bits, NULL, GFP_NOFS, + &extra_opts); } int set_extent_bits_nowait(struct extent_io_tree *tree, u64 start, u64 end, unsigned bits) { - return __set_extent_bit(tree, 
start, end, bits, 0, NULL, NULL, - GFP_NOWAIT, NULL); + return __set_extent_bit(tree, start, end, bits, NULL, GFP_NOWAIT, NULL); } int clear_extent_bit(struct extent_io_tree *tree, u64 start, u64 end, @@ -1460,16 +1468,18 @@ int clear_record_extent_bits(struct extent_io_tree *tree, u64 start, u64 end, int lock_extent_bits(struct extent_io_tree *tree, u64 start, u64 end, struct extent_state **cached_state) { + struct extent_io_extra_options extra_opts = { + .excl_bits = EXTENT_LOCKED, + }; int err; - u64 failed_start; while (1) { err = __set_extent_bit(tree, start, end, EXTENT_LOCKED, - EXTENT_LOCKED, &failed_start, - cached_state, GFP_NOFS, NULL); + cached_state, GFP_NOFS, &extra_opts); if (err == -EEXIST) { - wait_extent_bit(tree, failed_start, end, EXTENT_LOCKED); - start = failed_start; + wait_extent_bit(tree, extra_opts.excl_failed_start, end, + EXTENT_LOCKED); + start = extra_opts.excl_failed_start; } else break; WARN_ON(start > end); @@ -1479,14 +1489,17 @@ int lock_extent_bits(struct extent_io_tree *tree, u64 start, u64 end, int try_lock_extent(struct extent_io_tree *tree, u64 start, u64 end) { + struct extent_io_extra_options extra_opts = { + .excl_bits = EXTENT_LOCKED, + }; int err; - u64 failed_start; - err = __set_extent_bit(tree, start, end, EXTENT_LOCKED, EXTENT_LOCKED, - &failed_start, NULL, GFP_NOFS, NULL); + err = __set_extent_bit(tree, start, end, EXTENT_LOCKED, + NULL, GFP_NOFS, &extra_opts); if (err == -EEXIST) { - if (failed_start > start) - clear_extent_bit(tree, start, failed_start - 1, + if (extra_opts.excl_failed_start > start) + clear_extent_bit(tree, start, + extra_opts.excl_failed_start - 1, EXTENT_LOCKED, 1, 0, NULL); return 0; } From patchwork Tue Nov 3 13:30:45 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Qu Wenruo X-Patchwork-Id: 11877499 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_PATCH, MAILING_LIST_MULTI,SIGNED_OFF_BY,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED, USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id DA823C388F9 for ; Tue, 3 Nov 2020 13:31:48 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 867EB20786 for ; Tue, 3 Nov 2020 13:31:48 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (1024-bit key) header.d=suse.com header.i=@suse.com header.b="alVkLHY4" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729354AbgKCNbr (ORCPT ); Tue, 3 Nov 2020 08:31:47 -0500 Received: from mx2.suse.de ([195.135.220.15]:44648 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1729352AbgKCNbr (ORCPT ); Tue, 3 Nov 2020 08:31:47 -0500 X-Virus-Scanned: by amavisd-new at test-mx.suse.de DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.com; s=susede1; t=1604410305; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=Nj76KXWRIDgrDfVECzfT4sVyBtulhuIYpAYKfCfpWTw=; b=alVkLHY428NdeAMdRssb9p3e5GK6Pm6eBuFxz8hnEfOia0S4OJJsz9q/oK87ERK3ryCACB 
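The design principle at work here, rare parameters moving into an
optional struct where NULL means "all defaults", is easy to model
outside the kernel. A sketch with invented names, not the btrfs API:

#include <stddef.h>
#include <stdio.h>

/* Rarely used knobs live in one optional struct. */
struct extra_options {
	unsigned int excl_bits;
	unsigned long long excl_failed_start;
};

static int set_bits(unsigned int bits, struct extra_options *opts)
{
	static struct extra_options defaults;	/* all-zero defaults */

	if (!opts)
		opts = &defaults;
	if (opts->excl_bits & bits) {
		opts->excl_failed_start = 4096;	/* report conflict start */
		return -17;			/* -EEXIST-style failure */
	}
	return 0;
}

int main(void)
{
	struct extra_options opts = { .excl_bits = 0x1 };

	printf("%d\n", set_bits(0x2, NULL));	/* common caller: 0 */
	printf("%d\n", set_bits(0x1, &opts));	/* locking caller: -17 */
	return 0;
}

The common-caller path passes NULL and never touches the struct; only
the locking wrappers pay for the extra stack slot.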
From patchwork Tue Nov 3 13:30:45 2020
X-Patchwork-Submitter: Qu Wenruo
X-Patchwork-Id: 11877499
From: Qu Wenruo
To: linux-btrfs@vger.kernel.org
Subject: [PATCH 09/32] btrfs: extent_io: sink less common parameters for __clear_extent_bit()
Date: Tue, 3 Nov 2020 21:30:45 +0800
Message-Id: <20201103133108.148112-10-wqu@suse.com>
In-Reply-To: <20201103133108.148112-1-wqu@suse.com>
References: <20201103133108.148112-1-wqu@suse.com>

The following parameters are less commonly used for
__clear_extent_bit():

- wake
  To wake up the waiters.

- delete
  For cleanup cases, to remove the extent state regardless of its
  state.

- changeset
  Only utilized for qgroup.

Sink them into the extent_io_extra_options structure. For most callers,
who don't care about these options, we obviously pass fewer parameters,
without any impact. For callers who do care about these options, we
slightly increase the stack usage, as extent_io_extra_options also has
members only used by __set_extent_bit().

Signed-off-by: Qu Wenruo
---
 fs/btrfs/extent-io-tree.h | 30 +++++++++++++++++++-------
 fs/btrfs/extent_io.c      | 45 ++++++++++++++++++++++++++++-----------
 fs/btrfs/extent_map.c     |  2 +-
 3 files changed, 56 insertions(+), 21 deletions(-)

diff --git a/fs/btrfs/extent-io-tree.h b/fs/btrfs/extent-io-tree.h
index c93065794567..b5dab64d5f85 100644
--- a/fs/btrfs/extent-io-tree.h
+++ b/fs/btrfs/extent-io-tree.h
@@ -102,6 +102,15 @@ struct extent_io_extra_options {
 	 * For qgroup related functions.
 	 */
 	struct extent_changeset *changeset;
+
+	/*
+	 * For __clear_extent_bit().
+	 * @wake:	Wake up the waiters. Mostly for EXTENT_LOCKED case.
+	 * @delete:	Delete the extent regardless of its state. Mostly for
+	 *		cleanup.
+	 */
+	bool wake;
+	bool delete;
 };
 
 int __init extent_state_cache_init(void);
@@ -139,9 +148,8 @@ int clear_extent_bit(struct extent_io_tree *tree, u64 start, u64 end,
 		     unsigned bits, int wake, int delete,
 		     struct extent_state **cached);
 int __clear_extent_bit(struct extent_io_tree *tree, u64 start, u64 end,
-		       unsigned bits, int wake, int delete,
-		       struct extent_state **cached, gfp_t mask,
-		       struct extent_changeset *changeset);
+		       unsigned bits, struct extent_state **cached_state,
+		       gfp_t mask, struct extent_io_extra_options *extra_opts);
 
 static inline int unlock_extent(struct extent_io_tree *tree, u64 start, u64 end)
 {
@@ -151,15 +159,21 @@ static inline int unlock_extent(struct extent_io_tree *tree, u64 start, u64 end)
 static inline int unlock_extent_cached(struct extent_io_tree *tree, u64 start,
 				       u64 end, struct extent_state **cached)
 {
-	return __clear_extent_bit(tree, start, end, EXTENT_LOCKED, 1, 0, cached,
-				  GFP_NOFS, NULL);
+	struct extent_io_extra_options extra_opts = {
+		.wake = true,
+	};
+	return __clear_extent_bit(tree, start, end, EXTENT_LOCKED, cached,
+				  GFP_NOFS, &extra_opts);
 }
 
 static inline int unlock_extent_cached_atomic(struct extent_io_tree *tree,
 		u64 start, u64 end, struct extent_state **cached)
 {
-	return __clear_extent_bit(tree, start, end, EXTENT_LOCKED, 1, 0, cached,
-				  GFP_ATOMIC, NULL);
+	struct extent_io_extra_options extra_opts = {
+		.wake = true,
+	};
+	return __clear_extent_bit(tree, start, end, EXTENT_LOCKED, cached,
+				  GFP_ATOMIC, &extra_opts);
 }
 
 static inline int clear_extent_bits(struct extent_io_tree *tree, u64 start,
@@ -189,7 +203,7 @@ static inline int set_extent_bits(struct extent_io_tree *tree, u64 start,
 static inline int clear_extent_uptodate(struct extent_io_tree *tree, u64 start,
 					u64 end, struct extent_state **cached_state)
 {
-	return __clear_extent_bit(tree, start, end, EXTENT_UPTODATE, 0, 0,
+	return __clear_extent_bit(tree, start, end, EXTENT_UPTODATE,
				  cached_state, GFP_NOFS, NULL);
 }
 
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 1fd92815553d..614759ad02b3 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -688,26 +688,38 @@ static void extent_io_tree_panic(struct extent_io_tree *tree, int err)
 * or inserting elements in the tree, so the gfp mask is used to
 * indicate which allocations or sleeping are allowed.
 *
- * pass 'wake' == 1 to kick any sleepers, and 'delete' == 1 to remove
- * the given range from the tree regardless of state (ie for truncate).
+ * extra_opts::wake:	To wake up any sleepers
+ * extra_opts::delete:	To remove the given range regardless of state
+ *			(ie for truncate)
+ * extra_opts::changeset: To record how many bytes are modified and
+ *			which ranges are modified (for qgroup)
 *
- * the range [start, end] is inclusive.
+ * The range [start, end] is inclusive.
 *
- * This takes the tree lock, and returns 0 on success and < 0 on error.
+ * Returns 0 on success.
+ * No error can be returned yet, as ENOMEM is handled by BUG_ON().
 */
int __clear_extent_bit(struct extent_io_tree *tree, u64 start, u64 end,
-		       unsigned bits, int wake, int delete,
-		       struct extent_state **cached_state,
-		       gfp_t mask, struct extent_changeset *changeset)
+		       unsigned bits, struct extent_state **cached_state,
+		       gfp_t mask, struct extent_io_extra_options *extra_opts)
{
+	struct extent_changeset *changeset;
	struct extent_state *state;
	struct extent_state *cached;
	struct extent_state *prealloc = NULL;
	struct rb_node *node;
+	bool wake;
+	bool delete;
	u64 last_end;
	int err;
	int clear = 0;
 
+	if (!extra_opts)
+		extra_opts = &default_opts;
+	changeset = extra_opts->changeset;
+	wake = extra_opts->wake;
+	delete = extra_opts->delete;
+
	btrfs_debug_check_extent_io_range(tree, start, end);
	trace_btrfs_clear_extent_bit(tree, start, end - start + 1, bits);
 
@@ -1444,21 +1456,30 @@ int clear_extent_bit(struct extent_io_tree *tree, u64 start, u64 end,
		    unsigned bits, int wake, int delete,
		    struct extent_state **cached)
{
-	return __clear_extent_bit(tree, start, end, bits, wake, delete,
-				  cached, GFP_NOFS, NULL);
+	struct extent_io_extra_options extra_opts = {
+		.wake = wake,
+		.delete = delete,
+	};
+
+	return __clear_extent_bit(tree, start, end, bits,
+				  cached, GFP_NOFS, &extra_opts);
}
 
int clear_record_extent_bits(struct extent_io_tree *tree, u64 start, u64 end,
			     unsigned bits, struct extent_changeset *changeset)
{
+	struct extent_io_extra_options extra_opts = {
+		.changeset = changeset,
+	};
+
	/*
	 * Don't support EXTENT_LOCKED case, same reason as
	 * set_record_extent_bits().
	 */
	BUG_ON(bits & EXTENT_LOCKED);
 
-	return __clear_extent_bit(tree, start, end, bits, 0, 0, NULL, GFP_NOFS,
-				  changeset);
+	return __clear_extent_bit(tree, start, end, bits, NULL, GFP_NOFS,
+				  &extra_opts);
}
 
/*
@@ -4454,7 +4475,7 @@ static int try_release_extent_state(struct extent_io_tree *tree,
		 */
		ret = __clear_extent_bit(tree, start, end,
					 ~(EXTENT_LOCKED | EXTENT_NODATASUM),
-					 0, 0, NULL, mask, NULL);
+					 NULL, mask, NULL);
 
		/* if clear_extent_bit failed for enomem reasons,
		 * we can't allow the release to continue.
diff --git a/fs/btrfs/extent_map.c b/fs/btrfs/extent_map.c
index bd6229fb2b6f..95651ddbb3a7 100644
--- a/fs/btrfs/extent_map.c
+++ b/fs/btrfs/extent_map.c
@@ -380,7 +380,7 @@ static void extent_map_device_clear_bits(struct extent_map *em, unsigned bits)
		__clear_extent_bit(&device->alloc_state, stripe->physical,
				   stripe->physical + stripe_size - 1, bits,
-				   0, 0, NULL, GFP_NOWAIT, NULL);
+				   NULL, GFP_NOWAIT, NULL);
	}
}
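The clear-side wrappers compose their options the same way, setting
only the fields that differ from the defaults on the stack. A sketch
mirroring how unlock_extent_cached() now builds its options (names
invented, not btrfs API):

#include <stdbool.h>
#include <stdio.h>

struct extra_options { bool wake; bool delete; };

static int clear_bits(unsigned int bits, struct extra_options *opts)
{
	struct extra_options defaults = { 0 };

	if (!opts)
		opts = &defaults;
	printf("clear 0x%x wake=%d delete=%d\n",
	       bits, opts->wake, opts->delete);
	return 0;
}

static int unlock_range(void)
{
	/* Only the non-default field is spelled out. */
	struct extra_options opts = { .wake = true };

	return clear_bits(0x4 /* LOCKED */, &opts);
}

int main(void)
{
	clear_bits(0x8, NULL);	/* common caller, all defaults */
	return unlock_range();	/* locking caller, wakes waiters */
}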
Signed-off-by: Qu Wenruo
Reviewed-by: Nikolay Borisov
---
 fs/btrfs/disk-io.c | 5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index c70a52b44ceb..1b527b2d16d8 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -4191,8 +4191,7 @@ int btrfs_buffer_uptodate(struct extent_buffer *buf, u64 parent_transid, void btrfs_mark_buffer_dirty(struct extent_buffer *buf) { - struct btrfs_fs_info *fs_info; - struct btrfs_root *root; + struct btrfs_fs_info *fs_info = buf->fs_info; u64 transid = btrfs_header_generation(buf); int was_dirty; @@ -4205,8 +4204,6 @@ void btrfs_mark_buffer_dirty(struct extent_buffer *buf) if (unlikely(test_bit(EXTENT_BUFFER_UNMAPPED, &buf->bflags))) return; #endif - root = BTRFS_I(buf->pages[0]->mapping->host)->root; - fs_info = root->fs_info; btrfs_assert_tree_locked(buf); if (transid != fs_info->generation) WARN(1, KERN_CRIT "btrfs transid mismatch buffer %llu, found %llu running %llu\n",

From patchwork Tue Nov 3 13:30:47 2020
From: Qu Wenruo
To: linux-btrfs@vger.kernel.org
Cc: Goldwyn Rodrigues , Nikolay Borisov
Subject: [PATCH 11/32] btrfs: disk-io: make csum_tree_block() handle sectorsize smaller than page size
Date: Tue, 3 Nov 2020 21:30:47 +0800
Message-Id: <20201103133108.148112-12-wqu@suse.com>
For subpage size support, we only need to handle the first page. To make the code work for both cases, we modify the following behaviors:

- num_pages calculation
  Instead of "nodesize >> PAGE_SHIFT", we use "DIV_ROUND_UP(nodesize, PAGE_SIZE)". This ensures we get at least one page for subpage support, while still getting the same result for the regular page size.

- The length of the first run
  Instead of PAGE_SIZE - BTRFS_CSUM_SIZE, we use min(PAGE_SIZE, nodesize) - BTRFS_CSUM_SIZE, which handles both cases.

- The start location of the first run
  Instead of always using BTRFS_CSUM_SIZE as the csum start position, add offset_in_page(eb->start) to get the proper offset for both cases.

Signed-off-by: Goldwyn Rodrigues
Signed-off-by: Qu Wenruo
Reviewed-by: Nikolay Borisov
---
 fs/btrfs/disk-io.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index 1b527b2d16d8..9a72cb5ef31e 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -211,16 +211,16 @@ void btrfs_set_buffer_lockdep_class(u64 objectid, struct extent_buffer *eb, static void csum_tree_block(struct extent_buffer *buf, u8 *result) { struct btrfs_fs_info *fs_info = buf->fs_info; - const int num_pages = fs_info->nodesize >> PAGE_SHIFT; + const int num_pages = DIV_ROUND_UP(fs_info->nodesize, PAGE_SIZE); SHASH_DESC_ON_STACK(shash, fs_info->csum_shash); char *kaddr; int i; shash->tfm = fs_info->csum_shash; crypto_shash_init(shash); - kaddr = page_address(buf->pages[0]); + kaddr = page_address(buf->pages[0]) + offset_in_page(buf->start); crypto_shash_update(shash, kaddr + BTRFS_CSUM_SIZE, - PAGE_SIZE - BTRFS_CSUM_SIZE); + min_t(u32, PAGE_SIZE, fs_info->nodesize) - BTRFS_CSUM_SIZE); for (i = 1; i < num_pages; i++) { kaddr = page_address(buf->pages[i]);

From patchwork Tue Nov 3 13:30:48 2020
From: Qu Wenruo
To: linux-btrfs@vger.kernel.org
Subject: [PATCH 12/32] btrfs: disk-io: extract the extent buffer verification from btrfs_validate_metadata_buffer()
Date: Tue, 3 Nov 2020 21:30:48 +0800
Message-Id: <20201103133108.148112-13-wqu@suse.com>

Currently btrfs_validate_metadata_buffer() only needs to handle one extent buffer, since one page maps to at most one extent buffer. But for the incoming subpage support, one page can be mapped to multiple extent buffers, so the current code will no longer work. This refactor allows us to call validate_extent_buffer() on all involved extent buffers from btrfs_validate_metadata_buffer() and other locations.

Signed-off-by: Qu Wenruo
Reviewed-by: Nikolay Borisov
---
 fs/btrfs/disk-io.c | 78 +++++++++++++++++++++++++---------------------
 1 file changed, 43 insertions(+), 35 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index 9a72cb5ef31e..de9132564f10 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -524,60 +524,35 @@ static int check_tree_block_fsid(struct extent_buffer *eb) return 1; } -int btrfs_validate_metadata_buffer(struct btrfs_io_bio *io_bio, u64 phy_offset, - struct page *page, u64 start, u64 end, - int mirror) +/* Do basic extent buffer check at read time */ +static int validate_extent_buffer(struct extent_buffer *eb) { + struct btrfs_fs_info *fs_info = eb->fs_info; u64 found_start; - int found_level; - struct extent_buffer *eb; - struct btrfs_fs_info *fs_info; - u16 csum_size; - int ret = 0; + u32 csum_size = fs_info->csum_size; + u8 found_level; u8 result[BTRFS_CSUM_SIZE]; - int reads_done; - - if (!page->private) - goto out; - - eb = (struct extent_buffer *)page->private; - fs_info = eb->fs_info; - csum_size = fs_info->csum_size; - - /* the pending IO might have been the only thing that kept this buffer - * in memory.
Make sure we have a ref for all this other checks */ - atomic_inc(&eb->refs); - - reads_done = atomic_dec_and_test(&eb->io_pages); - if (!reads_done) - goto err; - - eb->read_mirror = mirror; - if (test_bit(EXTENT_BUFFER_READ_ERR, &eb->bflags)) { - ret = -EIO; - goto err; - } + int ret = 0; found_start = btrfs_header_bytenr(eb); if (found_start != eb->start) { btrfs_err_rl(fs_info, "bad tree block start, want %llu have %llu", eb->start, found_start); ret = -EIO; - goto err; + goto out; } if (check_tree_block_fsid(eb)) { btrfs_err_rl(fs_info, "bad fsid on block %llu", eb->start); ret = -EIO; - goto err; + goto out; } found_level = btrfs_header_level(eb); if (found_level >= BTRFS_MAX_LEVEL) { btrfs_err(fs_info, "bad tree block level %d on %llu", (int)btrfs_header_level(eb), eb->start); ret = -EIO; - goto err; + goto out; } btrfs_set_buffer_lockdep_class(btrfs_header_owner(eb), @@ -596,7 +571,7 @@ int btrfs_validate_metadata_buffer(struct btrfs_io_bio *io_bio, u64 phy_offset, CSUM_FMT_VALUE(csum_size, result), btrfs_header_level(eb)); ret = -EUCLEAN; - goto err; + goto out; } /* @@ -618,6 +593,39 @@ int btrfs_validate_metadata_buffer(struct btrfs_io_bio *io_bio, u64 phy_offset, btrfs_err(fs_info, "block=%llu read time tree block corruption detected", eb->start); +out: + return ret; +} + +int btrfs_validate_metadata_buffer(struct btrfs_io_bio *io_bio, u64 phy_offset, + struct page *page, u64 start, u64 end, + int mirror) +{ + struct extent_buffer *eb; + int ret = 0; + int reads_done; + + if (!page->private) + goto out; + + eb = (struct extent_buffer *)page->private; + + /* + * The pending IO might have been the only thing that kept this buffer + * in memory. Make sure we have a ref for all these other checks + */ + atomic_inc(&eb->refs); + + reads_done = atomic_dec_and_test(&eb->io_pages); + if (!reads_done) + goto err; + + eb->read_mirror = mirror; + if (test_bit(EXTENT_BUFFER_READ_ERR, &eb->bflags)) { + ret = -EIO; + goto err; + } + ret = validate_extent_buffer(eb); err: if (reads_done && test_and_clear_bit(EXTENT_BUFFER_READAHEAD, &eb->bflags))

From patchwork Tue Nov 3 13:30:49 2020
From: Qu Wenruo
To: linux-btrfs@vger.kernel.org
Subject: [PATCH 13/32] btrfs: disk-io: accept bvec directly for csum_dirty_buffer()
Date: Tue, 3 Nov 2020 21:30:49 +0800
Message-Id: <20201103133108.148112-14-wqu@suse.com>

Currently csum_dirty_buffer() uses the page to grab the extent buffer, but that only works for the regular sectorsize == PAGE_SIZE case. For subpage we need page + page_offset to grab the extent buffer. This patch changes csum_dirty_buffer() to accept a bvec directly, so that we can extract both the page and the page_offset for later subpage support.

Signed-off-by: Qu Wenruo
Reviewed-by: Nikolay Borisov
---
 fs/btrfs/disk-io.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index de9132564f10..3259a5b32caf 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -449,8 +449,9 @@ static int btree_read_extent_buffer_pages(struct extent_buffer *eb, * we only fill in the checksum field in the first page of a multi-page block */ -static int csum_dirty_buffer(struct btrfs_fs_info *fs_info, struct page *page) +static int csum_dirty_buffer(struct btrfs_fs_info *fs_info, struct bio_vec *bvec) { + struct page *page = bvec->bv_page; u64 start = page_offset(page); u64 found_start; u8 result[BTRFS_CSUM_SIZE]; @@ -794,7 +795,7 @@ static blk_status_t btree_csum_one_bio(struct bio *bio) ASSERT(!bio_flagged(bio, BIO_CLONED)); bio_for_each_segment_all(bvec, bio, iter_all) { root = BTRFS_I(bvec->bv_page->mapping->host)->root; - ret = csum_dirty_buffer(root->fs_info, bvec->bv_page); + ret = csum_dirty_buffer(root->fs_info, bvec); if (ret) break; }

From patchwork Tue Nov 3 13:30:50 2020
From: Qu Wenruo
To: linux-btrfs@vger.kernel.org
Cc: Goldwyn Rodrigues
Subject: [PATCH 14/32] btrfs: inode: make btrfs_readpage_end_io_hook() follow sector size
Date: Tue, 3 Nov 2020 21:30:50 +0800
Message-Id: <20201103133108.148112-15-wqu@suse.com>

Currently btrfs_readpage_end_io_hook() just passes the whole page to check_data_csum(), which is fine since we only support sectorsize == PAGE_SIZE. To support subpage, we need to properly honor per-sector checksum verification, just like what we do in the dio read path.

This patch does the csum verification in a for loop, starting with pg_off == start - page_offset(page) and incrementing by sectorsize on each iteration. For the sectorsize == PAGE_SIZE case, pg_off will always be 0 and we finish after a single iteration. For the subpage case, the loop iterates over each sector, and if any sector fails verification we return an error.
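A worked example may help (a sketch under assumed values, not the exact kernel code): consider a 64K page backing the file range [1M, 1M + 64K), sectorsize 4K, and a read bio covering [1M + 16K, 1M + 32K):

	size_t pg_off;
	size_t offset = start - page_offset(page);	/* 16K in this example */

	for (pg_off = offset; pg_off < end - page_offset(page);
	     pg_off += sectorsize, phy_offset++) {
		/* pg_off visits 16K, 20K, 24K, 28K: one csum check per sector */
		if (check_data_csum(inode, io_bio, phy_offset, page, pg_off) < 0)
			return -EIO;
	}

With sectorsize == PAGE_SIZE the same loop degenerates to a single iteration with pg_off == 0, matching the old single call.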
Signed-off-by: Goldwyn Rodrigues
Signed-off-by: Qu Wenruo
---
 fs/btrfs/inode.c | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index c54e0ed0b938..0432ca58eade 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -2888,9 +2888,11 @@ int btrfs_verify_data_csum(struct btrfs_io_bio *io_bio, u64 phy_offset, struct page *page, u64 start, u64 end, int mirror) { size_t offset = start - page_offset(page); + size_t pg_off; struct inode *inode = page->mapping->host; struct extent_io_tree *io_tree = &BTRFS_I(inode)->io_tree; struct btrfs_root *root = BTRFS_I(inode)->root; + u32 sectorsize = root->fs_info->sectorsize; if (PageChecked(page)) { ClearPageChecked(page); @@ -2910,7 +2912,15 @@ int btrfs_verify_data_csum(struct btrfs_io_bio *io_bio, u64 phy_offset, } phy_offset >>= root->fs_info->sectorsize_bits; - return check_data_csum(inode, io_bio, phy_offset, page, offset); + for (pg_off = offset; pg_off < end - page_offset(page); + pg_off += sectorsize, phy_offset++) { + int ret; + + ret = check_data_csum(inode, io_bio, phy_offset, page, pg_off); + if (ret < 0) + return -EIO; + } + return 0; } /*

From patchwork Tue Nov 3 13:30:51 2020
From: Qu Wenruo
To: linux-btrfs@vger.kernel.org
Subject: [PATCH 15/32] btrfs: introduce a helper to determine if the sectorsize is smaller than PAGE_SIZE
Date: Tue, 3 Nov 2020 21:30:51 +0800
Message-Id: <20201103133108.148112-16-wqu@suse.com>
Just to save us several letters in the incoming patches.

Signed-off-by: Qu Wenruo
---
 fs/btrfs/ctree.h | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h index b46eecf882a1..a08cf6545a82 100644 --- a/fs/btrfs/ctree.h +++ b/fs/btrfs/ctree.h @@ -3607,6 +3607,11 @@ static inline int btrfs_defrag_cancelled(struct btrfs_fs_info *fs_info) return signal_pending(current); } +static inline bool btrfs_is_subpage(struct btrfs_fs_info *fs_info) +{ + return (fs_info->sectorsize < PAGE_SIZE); +} + #define in_range(b, first, len) ((b) >= (first) && (b) < (first) + (len)) /* Sanity test specific functions */

From patchwork Tue Nov 3 13:30:52 2020
From: Qu Wenruo
To: linux-btrfs@vger.kernel.org
Subject: [PATCH 16/32] btrfs: extent_io: allow find_first_extent_bit() to find a range with exact bits match
Date: Tue, 3 Nov 2020 21:30:52 +0800
Message-Id: <20201103133108.148112-17-wqu@suse.com>

Currently if we pass multiple @bits to find_first_extent_bit(), it will return the first range with one or more bits matching @bits.
This is fine for the current code, since most callers just do their own extra checks, and all existing callers only call it with 1 or 2 bits. But for the incoming subpage support, we want the ability to return a range with an exact match, so that the caller can skip some extra checks.

So this patch adds a new bool parameter, @exact_match, to find_first_extent_bit() and its callees. Currently all callers just pass 'false' for the new parameter, so no functional change is introduced.

Signed-off-by: Qu Wenruo
---
 fs/btrfs/block-group.c | 2 +-
 fs/btrfs/dev-replace.c | 2 +-
 fs/btrfs/disk-io.c | 4 ++--
 fs/btrfs/extent-io-tree.h | 2 +-
 fs/btrfs/extent-tree.c | 2 +-
 fs/btrfs/extent_io.c | 42 +++++++++++++++++++++++++------------
 fs/btrfs/free-space-cache.c | 2 +-
 fs/btrfs/relocation.c | 2 +-
 fs/btrfs/transaction.c | 4 ++--
 fs/btrfs/volumes.c | 2 +-
 10 files changed, 40 insertions(+), 24 deletions(-)

diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c index bb6685711824..19d84766568c 100644 --- a/fs/btrfs/block-group.c +++ b/fs/btrfs/block-group.c @@ -461,7 +461,7 @@ u64 add_new_free_space(struct btrfs_block_group *block_group, u64 start, u64 end ret = find_first_extent_bit(&info->excluded_extents, start, &extent_start, &extent_end, EXTENT_DIRTY | EXTENT_UPTODATE, - NULL); + false, NULL); if (ret) break;
diff --git a/fs/btrfs/dev-replace.c b/fs/btrfs/dev-replace.c index 5b9e3f3ace22..c102a704ead2 100644 --- a/fs/btrfs/dev-replace.c +++ b/fs/btrfs/dev-replace.c @@ -612,7 +612,7 @@ static int btrfs_set_target_alloc_state(struct btrfs_device *srcdev, while (!find_first_extent_bit(&srcdev->alloc_state, start, &found_start, &found_end, - CHUNK_ALLOCATED, &cached_state)) { + CHUNK_ALLOCATED, false, &cached_state)) { ret = set_extent_bits(&tgtdev->alloc_state, found_start, found_end, CHUNK_ALLOCATED); if (ret)
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index 3259a5b32caf..7a847513708d 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -4515,7 +4515,7 @@ static int btrfs_destroy_marked_extents(struct btrfs_fs_info *fs_info, while (1) { ret = find_first_extent_bit(dirty_pages, start, &start, &end, - mark, NULL); + mark, false, NULL); if (ret) break; @@ -4555,7 +4555,7 @@ static int btrfs_destroy_pinned_extent(struct btrfs_fs_info *fs_info, */ mutex_lock(&fs_info->unused_bg_unpin_mutex); ret = find_first_extent_bit(unpin, 0, &start, &end, - EXTENT_DIRTY, &cached_state); + EXTENT_DIRTY, false, &cached_state); if (ret) { mutex_unlock(&fs_info->unused_bg_unpin_mutex); break;
diff --git a/fs/btrfs/extent-io-tree.h b/fs/btrfs/extent-io-tree.h index b5dab64d5f85..516e76c806d7 100644 --- a/fs/btrfs/extent-io-tree.h +++ b/fs/btrfs/extent-io-tree.h @@ -257,7 +257,7 @@ static inline int set_extent_uptodate(struct extent_io_tree *tree, u64 start, int find_first_extent_bit(struct extent_io_tree *tree, u64 start, u64 *start_ret, u64 *end_ret, unsigned bits, - struct extent_state **cached_state); + bool exact_match, struct extent_state **cached_state); void find_first_clear_extent_bit(struct extent_io_tree *tree, u64 start, u64 *start_ret, u64 *end_ret, unsigned bits); int find_contiguous_extent_bit(struct extent_io_tree *tree, u64 start,
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c index a27caa47aa62..06630bd7ae04 100644 --- a/fs/btrfs/extent-tree.c +++ b/fs/btrfs/extent-tree.c @@ -2877,7 +2877,7 @@ int btrfs_finish_extent_commit(struct btrfs_trans_handle *trans) mutex_lock(&fs_info->unused_bg_unpin_mutex); ret = find_first_extent_bit(unpin, 0, &start, &end, - EXTENT_DIRTY,
&cached_state); + EXTENT_DIRTY, false, &cached_state); if (ret) { mutex_unlock(&fs_info->unused_bg_unpin_mutex); break;
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c index 614759ad02b3..30768e49cf47 100644 --- a/fs/btrfs/extent_io.c +++ b/fs/btrfs/extent_io.c @@ -1558,13 +1558,27 @@ void extent_range_redirty_for_io(struct inode *inode, u64 start, u64 end) } } -/* find the first state struct with 'bits' set after 'start', and - * return it. tree->lock must be held. NULL will returned if - * nothing was found after 'start' +static bool match_extent_state(struct extent_state *state, unsigned bits, + bool exact_match) +{ + if (exact_match) + return ((state->state & bits) == bits); + return (state->state & bits); +} + +/* + * Find the first state struct with @bits set after @start. + * + * NOTE: tree->lock must be held. + * + * @exact_match: Do we need to have all @bits set, or just any of + * the @bits. + * + * Return NULL if we can't find a match. */ static struct extent_state * find_first_extent_bit_state(struct extent_io_tree *tree, - u64 start, unsigned bits) + u64 start, unsigned bits, bool exact_match) { struct rb_node *node; struct extent_state *state; @@ -1579,7 +1593,8 @@ find_first_extent_bit_state(struct extent_io_tree *tree, while (1) { state = rb_entry(node, struct extent_state, rb_node); - if (state->end >= start && (state->state & bits)) + if (state->end >= start && + match_extent_state(state, bits, exact_match)) return state; node = rb_next(node); @@ -1600,7 +1615,7 @@ find_first_extent_bit_state(struct extent_io_tree *tree, */ int find_first_extent_bit(struct extent_io_tree *tree, u64 start, u64 *start_ret, u64 *end_ret, unsigned bits, - struct extent_state **cached_state) + bool exact_match, struct extent_state **cached_state) { struct extent_state *state; int ret = 1; @@ -1610,7 +1625,8 @@ int find_first_extent_bit(struct extent_io_tree *tree, u64 start, state = *cached_state; if (state->end == start - 1 && extent_state_in_tree(state)) { while ((state = next_state(state)) != NULL) { - if (state->state & bits) + if (match_extent_state(state, bits, + exact_match)) goto got_it; } free_extent_state(*cached_state); @@ -1621,7 +1637,7 @@ int find_first_extent_bit(struct extent_io_tree *tree, u64 start, *cached_state = NULL; } - state = find_first_extent_bit_state(tree, start, bits); + state = find_first_extent_bit_state(tree, start, bits, exact_match); got_it: if (state) { cache_state_if_flags(state, cached_state, 0); @@ -1656,7 +1672,7 @@ int find_contiguous_extent_bit(struct extent_io_tree *tree, u64 start, int ret = 1; spin_lock(&tree->lock); - state = find_first_extent_bit_state(tree, start, bits); + state = find_first_extent_bit_state(tree, start, bits, false); if (state) { *start_ret = state->start; *end_ret = state->end; @@ -2432,9 +2448,8 @@ int clean_io_failure(struct btrfs_fs_info *fs_info, goto out; spin_lock(&io_tree->lock); - state = find_first_extent_bit_state(io_tree, - failrec->start, - EXTENT_LOCKED); + state = find_first_extent_bit_state(io_tree, failrec->start, + EXTENT_LOCKED, false); spin_unlock(&io_tree->lock); if (state && state->start <= failrec->start && @@ -2470,7 +2485,8 @@ void btrfs_free_io_failure_record(struct btrfs_inode *inode, u64 start, u64 end) return; spin_lock(&failure_tree->lock); - state = find_first_extent_bit_state(failure_tree, start, EXTENT_DIRTY); + state = find_first_extent_bit_state(failure_tree, start, EXTENT_DIRTY, + false); while (state) { if (state->start > end) break;
diff --git a/fs/btrfs/free-space-cache.c
b/fs/btrfs/free-space-cache.c index 5ea36a06e514..2fcc685ac8eb 100644 --- a/fs/btrfs/free-space-cache.c +++ b/fs/btrfs/free-space-cache.c @@ -1090,7 +1090,7 @@ static noinline_for_stack int write_pinned_extent_entries( while (start < block_group->start + block_group->length) { ret = find_first_extent_bit(unpin, start, &extent_start, &extent_end, - EXTENT_DIRTY, NULL); + EXTENT_DIRTY, false, NULL); if (ret) return 0;
diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c index 3d4618a01ef3..206e9c8dc269 100644 --- a/fs/btrfs/relocation.c +++ b/fs/btrfs/relocation.c @@ -3158,7 +3158,7 @@ int find_next_extent(struct reloc_control *rc, struct btrfs_path *path, ret = find_first_extent_bit(&rc->processed_blocks, key.objectid, &start, &end, - EXTENT_DIRTY, NULL); + EXTENT_DIRTY, false, NULL); if (ret == 0 && start <= key.objectid) { btrfs_release_path(path);
diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c index 8f70d7135497..3894be14bf57 100644 --- a/fs/btrfs/transaction.c +++ b/fs/btrfs/transaction.c @@ -976,7 +976,7 @@ int btrfs_write_marked_extents(struct btrfs_fs_info *fs_info, atomic_inc(&BTRFS_I(fs_info->btree_inode)->sync_writers); while (!find_first_extent_bit(dirty_pages, start, &start, &end, - mark, &cached_state)) { + mark, false, &cached_state)) { bool wait_writeback = false; err = convert_extent_bit(dirty_pages, start, end, @@ -1031,7 +1031,7 @@ static int __btrfs_wait_marked_extents(struct btrfs_fs_info *fs_info, u64 end; while (!find_first_extent_bit(dirty_pages, start, &start, &end, - EXTENT_NEED_WAIT, &cached_state)) { + EXTENT_NEED_WAIT, false, &cached_state)) { /* * Ignore -ENOMEM errors returned by clear_extent_bit(). * When committing the transaction, we'll remove any entries
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index eb9ee7c2998f..a4ee38a47b1f 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -1391,7 +1391,7 @@ static bool contains_pending_extent(struct btrfs_device *device, u64 *start, if (!find_first_extent_bit(&device->alloc_state, *start, &physical_start, &physical_end, - CHUNK_ALLOCATED, NULL)) { + CHUNK_ALLOCATED, false, NULL)) { if (in_range(physical_start, *start, len) || in_range(*start, physical_start,

From patchwork Tue Nov 3 13:30:53 2020
From: Qu Wenruo
To: linux-btrfs@vger.kernel.org
Subject: [PATCH 17/32] btrfs: extent_io: don't allow tree block to cross page boundary for subpage support
Date: Tue, 3 Nov 2020 21:30:53 +0800
Message-Id: <20201103133108.148112-18-wqu@suse.com>

As a preparation for subpage sector size support (allowing filesystems with a sector size smaller than the page size to be mounted), if the sector size is smaller than the page size we don't allow a tree block to be read if it crosses a 64K(*) boundary.

The 64K is selected because:
- We are only going to support 64K page size for subpage for now
- 64K is also the max node size btrfs supports

This ensures that tree blocks are always contained in one page on a system with 64K page size, which greatly simplifies the handling; otherwise we would need complex multi-page handling for tree blocks. Currently the only way to create such tree blocks crossing the 64K boundary is by btrfs-convert, which will be fixed soon and is not in widespread use.
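The rejection condition itself is cheap; a sketch of the check, mirroring the hunk below (the helper name is hypothetical, the patch open-codes the comparison), with two example tree blocks on a 64K page system:

	/* True if [start, start + len) touches more than one page. */
	static bool eb_crosses_page(u64 start, u32 len)
	{
		return round_down(start, PAGE_SIZE) !=
		       round_down(start + len - 1, PAGE_SIZE);
	}

	/* start = 60K, len = 16K: spans pages 0 and 1 -> rejected */
	/* start = 64K, len = 16K: fully inside page 1 -> accepted */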
Signed-off-by: Qu Wenruo
---
 fs/btrfs/extent_io.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c index 30768e49cf47..30bbaeaa129a 100644 --- a/fs/btrfs/extent_io.c +++ b/fs/btrfs/extent_io.c @@ -5261,6 +5261,13 @@ struct extent_buffer *alloc_extent_buffer(struct btrfs_fs_info *fs_info, btrfs_err(fs_info, "bad tree block start %llu", start); return ERR_PTR(-EINVAL); } + if (btrfs_is_subpage(fs_info) && round_down(start, PAGE_SIZE) != + round_down(start + len - 1, PAGE_SIZE)) { + btrfs_err(fs_info, + "tree block crosses page boundary, start %llu nodesize %lu", + start, len); + return ERR_PTR(-EINVAL); + } eb = find_extent_buffer(fs_info, start); if (eb)

From patchwork Tue Nov 3 13:30:54 2020
From: Qu Wenruo
To: linux-btrfs@vger.kernel.org
Subject: [PATCH 18/32] btrfs: extent_io: update num_extent_pages() to support subpage sized extent buffer
Date: Tue, 3 Nov 2020 21:30:54 +0800
Message-Id: <20201103133108.148112-19-wqu@suse.com>

For subpage sized extent buffers, we have ensured no extent buffer will cross a page boundary, thus we only need one page for any extent buffer. This patch updates num_extent_pages() to handle this case.
Now num_extent_pages() returns 1 for subpage sized extent buffers.

Signed-off-by: Qu Wenruo
Reviewed-by: Nikolay Borisov
---
 fs/btrfs/extent_io.h | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h index 123c3947be49..24131478289d 100644 --- a/fs/btrfs/extent_io.h +++ b/fs/btrfs/extent_io.h @@ -203,8 +203,15 @@ void wait_on_extent_buffer_writeback(struct extent_buffer *eb); static inline int num_extent_pages(const struct extent_buffer *eb) { - return (round_up(eb->start + eb->len, PAGE_SIZE) >> PAGE_SHIFT) - - (eb->start >> PAGE_SHIFT); + /* + * For sectorsize == PAGE_SIZE case, since eb is always aligned to + * sectorsize, it's just round_up(eb->len, PAGE_SIZE) >> PAGE_SHIFT. + * + * For sectorsize < PAGE_SIZE case, we only want to support 64K + * PAGE_SIZE, and have ensured all tree blocks won't cross a page + * boundary. So in that case we always get 1 page. + */ + return (round_up(eb->len, PAGE_SIZE) >> PAGE_SHIFT); } static inline int extent_buffer_uptodate(const struct extent_buffer *eb)

From patchwork Tue Nov 3 13:30:55 2020
From: Qu Wenruo
To: linux-btrfs@vger.kernel.org
Cc: Goldwyn Rodrigues
Subject: [PATCH 19/32] btrfs: handle sectorsize < PAGE_SIZE case for extent buffer accessors
Date: Tue, 3 Nov 2020 21:30:55 +0800
Message-Id: <20201103133108.148112-20-wqu@suse.com>
To support the sectorsize < PAGE_SIZE case, we need to take extra care with the extent buffer accessors.

Since sectorsize is smaller than PAGE_SIZE, one page can contain multiple tree blocks, so we must use eb->start to determine the real offset to read/write for the extent buffer accessors.

This patch introduces two helpers to do this:

- get_eb_page_index()
  This calculates the index to access extent_buffer::pages. It's just a simple wrapper around "start >> PAGE_SHIFT". For the sectorsize == PAGE_SIZE case, nothing is changed. For the sectorsize < PAGE_SIZE case, we always get index 0, and the existing page shift also works fine.

- get_eb_page_offset()
  This calculates the offset to access extent_buffer::pages. It needs to take extent_buffer::start into consideration. For the sectorsize == PAGE_SIZE case, extent_buffer::start is always aligned to PAGE_SIZE, thus adding extent_buffer::start to offset_in_page() won't change the result. For the sectorsize < PAGE_SIZE case, adding extent_buffer::start gives us the correct offset to access.

This patch touches the following parts to cover all extent buffer accessors:

- BTRFS_SETGET_HEADER_FUNCS()
- read_extent_buffer()
- read_extent_buffer_to_user()
- memcmp_extent_buffer()
- write_extent_buffer_chunk_tree_uuid()
- write_extent_buffer_fsid()
- write_extent_buffer()
- memzero_extent_buffer()
- copy_extent_buffer_full()
- copy_extent_buffer()
- memcpy_extent_buffer()
- memmove_extent_buffer()
- btrfs_get_token_##bits()
- btrfs_get_##bits()
- btrfs_set_token_##bits()
- btrfs_set_##bits()
- generic_bin_search()

Signed-off-by: Goldwyn Rodrigues
Signed-off-by: Qu Wenruo
---
 fs/btrfs/ctree.c | 5 ++--
 fs/btrfs/ctree.h | 38 ++++++++++++++++++++++--
 fs/btrfs/extent_io.c | 66 ++++++++++++++++++++++++-----------------
 fs/btrfs/struct-funcs.c | 18 ++++++-----
 4 files changed, 88 insertions(+), 39 deletions(-)

diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c index 113da62dc17f..664a24728162 100644 --- a/fs/btrfs/ctree.c +++ b/fs/btrfs/ctree.c @@ -1723,10 +1723,11 @@ static noinline int generic_bin_search(struct extent_buffer *eb, oip = offset_in_page(offset); if (oip + key_size <= PAGE_SIZE) { - const unsigned long idx = offset >> PAGE_SHIFT; + const unsigned long idx = get_eb_page_index(offset); char *kaddr = page_address(eb->pages[idx]); - tmp = (struct btrfs_disk_key *)(kaddr + oip); + tmp = (struct btrfs_disk_key *)(kaddr + + get_eb_page_offset(eb, offset)); } else { read_extent_buffer(eb, &unaligned, offset, key_size); tmp = &unaligned;
diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h index a08cf6545a82..10226f250274 100644 --- a/fs/btrfs/ctree.h +++ b/fs/btrfs/ctree.h @@ -1494,13 +1494,14 @@ static inline void btrfs_set_token_##name(struct btrfs_map_token *token,\ #define BTRFS_SETGET_HEADER_FUNCS(name, type, member, bits) \ static inline u##bits btrfs_##name(const struct extent_buffer *eb) \ { \ - const type *p = page_address(eb->pages[0]); \ + const type *p = page_address(eb->pages[0]) + \ + offset_in_page(eb->start); \ return get_unaligned_le##bits(&p->member); \ } \ static inline void btrfs_set_##name(const struct extent_buffer *eb, \ u##bits val) \ { \ - type *p = page_address(eb->pages[0]); \ + type *p = page_address(eb->pages[0]) + offset_in_page(eb->start); \ put_unaligned_le##bits(val, &p->member); \ } @@ -3314,6 +3315,39 @@ static inline void assertfail(const char *expr, const char* file, int line) { } #define ASSERT(expr) (void)(expr) #endif +/* + * Get the correct
offset inside the page of extent buffer. + * + * Will handle both sectorsize == PAGE_SIZE and sectorsize < PAGE_SIZE cases. + * + * @eb: The target extent buffer + * @offset_in_eb: The offset inside the extent buffer + */ +static inline size_t get_eb_page_offset(const struct extent_buffer *eb, + unsigned long offset_in_eb) +{ + /* + * For sectorsize == PAGE_SIZE case, eb->start will always be aligned + * to PAGE_SIZE, thus adding it won't cause any difference. + * + * For sectorsize < PAGE_SIZE, we must only read the data that belongs + * to the eb, thus we have to take the eb->start into consideration. + */ + return offset_in_page(offset_in_eb + eb->start); +} + +static inline unsigned long get_eb_page_index(unsigned long offset_in_eb) +{ + /* + * For sectorsize == PAGE_SIZE case, plain >> PAGE_SHIFT is enough. + * + * For sectorsize < PAGE_SIZE case, we only support 64K PAGE_SIZE, + * and have ensured all tree blocks are contained in one page, thus + * we always get index == 0. + */ + return offset_in_eb >> PAGE_SHIFT; +} + /* * Use that for functions that are conditionally exported for sanity tests but * otherwise static
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c index 30bbaeaa129a..c7adcd99451a 100644 --- a/fs/btrfs/extent_io.c +++ b/fs/btrfs/extent_io.c @@ -5695,12 +5695,12 @@ void read_extent_buffer(const struct extent_buffer *eb, void *dstv, struct page *page; char *kaddr; char *dst = (char *)dstv; - unsigned long i = start >> PAGE_SHIFT; + unsigned long i = get_eb_page_index(start); if (check_eb_range(eb, start, len)) return; - offset = offset_in_page(start); + offset = get_eb_page_offset(eb, start); while (len > 0) { page = eb->pages[i]; @@ -5725,13 +5725,13 @@ int read_extent_buffer_to_user_nofault(const struct extent_buffer *eb, struct page *page; char *kaddr; char __user *dst = (char __user *)dstv; - unsigned long i = start >> PAGE_SHIFT; + unsigned long i = get_eb_page_index(start); int ret = 0; WARN_ON(start > eb->len); WARN_ON(start + len > eb->start + eb->len); - offset = offset_in_page(start); + offset = get_eb_page_offset(eb, start); while (len > 0) { page = eb->pages[i]; @@ -5760,13 +5760,13 @@ int memcmp_extent_buffer(const struct extent_buffer *eb, const void *ptrv, struct page *page; char *kaddr; char *ptr = (char *)ptrv; - unsigned long i = start >> PAGE_SHIFT; + unsigned long i = get_eb_page_index(start); int ret = 0; if (check_eb_range(eb, start, len)) return -EINVAL; - offset = offset_in_page(start); + offset = get_eb_page_offset(eb, start); while (len > 0) { page = eb->pages[i]; @@ -5792,7 +5792,7 @@ void write_extent_buffer_chunk_tree_uuid(const struct extent_buffer *eb, char *kaddr; WARN_ON(!PageUptodate(eb->pages[0])); - kaddr = page_address(eb->pages[0]); + kaddr = page_address(eb->pages[0]) + get_eb_page_offset(eb, 0); memcpy(kaddr + offsetof(struct btrfs_header, chunk_tree_uuid), srcv, BTRFS_FSID_SIZE); } @@ -5802,7 +5802,7 @@ void write_extent_buffer_fsid(const struct extent_buffer *eb, const void *srcv) char *kaddr; WARN_ON(!PageUptodate(eb->pages[0])); - kaddr = page_address(eb->pages[0]); + kaddr = page_address(eb->pages[0]) + get_eb_page_offset(eb, 0); memcpy(kaddr + offsetof(struct btrfs_header, fsid), srcv, BTRFS_FSID_SIZE); } @@ -5815,12 +5815,12 @@ void write_extent_buffer(const struct extent_buffer *eb, const void *srcv, struct page *page; char *kaddr; char *src = (char *)srcv; - unsigned long i = start >> PAGE_SHIFT; + unsigned long i = get_eb_page_index(start); if (check_eb_range(eb, start, len)) return; - offset = offset_in_page(start); +
offset = get_eb_page_offset(eb, start); while (len > 0) { page = eb->pages[i]; @@ -5844,12 +5844,12 @@ void memzero_extent_buffer(const struct extent_buffer *eb, unsigned long start, size_t offset; struct page *page; char *kaddr; - unsigned long i = start >> PAGE_SHIFT; + unsigned long i = get_eb_page_index(start); if (check_eb_range(eb, start, len)) return; - offset = offset_in_page(start); + offset = get_eb_page_offset(eb, start); while (len > 0) { page = eb->pages[i]; @@ -5873,10 +5873,22 @@ void copy_extent_buffer_full(const struct extent_buffer *dst, ASSERT(dst->len == src->len); - num_pages = num_extent_pages(dst); - for (i = 0; i < num_pages; i++) - copy_page(page_address(dst->pages[i]), - page_address(src->pages[i])); + if (dst->fs_info->sectorsize == PAGE_SIZE) { + num_pages = num_extent_pages(dst); + for (i = 0; i < num_pages; i++) + copy_page(page_address(dst->pages[i]), + page_address(src->pages[i])); + } else { + unsigned long src_index = get_eb_page_index(0); + unsigned long dst_index = get_eb_page_index(0); + size_t src_offset = get_eb_page_offset(src, 0); + size_t dst_offset = get_eb_page_offset(dst, 0); + + ASSERT(src_index == 0 && dst_index == 0); + memcpy(page_address(dst->pages[dst_index]) + dst_offset, + page_address(src->pages[src_index]) + src_offset, + src->len); + } } void copy_extent_buffer(const struct extent_buffer *dst, @@ -5889,7 +5901,7 @@ void copy_extent_buffer(const struct extent_buffer *dst, size_t offset; struct page *page; char *kaddr; - unsigned long i = dst_offset >> PAGE_SHIFT; + unsigned long i = get_eb_page_index(dst_offset); if (check_eb_range(dst, dst_offset, len) || check_eb_range(src, src_offset, len)) @@ -5897,7 +5909,7 @@ void copy_extent_buffer(const struct extent_buffer *dst, WARN_ON(src->len != dst_len); - offset = offset_in_page(dst_offset); + offset = get_eb_page_offset(dst, dst_offset); while (len > 0) { page = dst->pages[i]; @@ -5941,7 +5953,7 @@ static inline void eb_bitmap_offset(const struct extent_buffer *eb, * the bitmap item in the extent buffer + the offset of the byte in the * bitmap item. 
*/ - offset = start + byte_offset; + offset = start + offset_in_page(eb->start) + byte_offset; *page_index = offset >> PAGE_SHIFT; *page_offset = offset_in_page(offset); @@ -6095,11 +6107,11 @@ void memcpy_extent_buffer(const struct extent_buffer *dst, return; while (len > 0) { - dst_off_in_page = offset_in_page(dst_offset); - src_off_in_page = offset_in_page(src_offset); + dst_off_in_page = get_eb_page_offset(dst, dst_offset); + src_off_in_page = get_eb_page_offset(dst, src_offset); - dst_i = dst_offset >> PAGE_SHIFT; - src_i = src_offset >> PAGE_SHIFT; + dst_i = get_eb_page_index(dst_offset); + src_i = get_eb_page_index(src_offset); cur = min(len, (unsigned long)(PAGE_SIZE - src_off_in_page)); @@ -6135,11 +6147,11 @@ void memmove_extent_buffer(const struct extent_buffer *dst, return; } while (len > 0) { - dst_i = dst_end >> PAGE_SHIFT; - src_i = src_end >> PAGE_SHIFT; + dst_i = get_eb_page_index(dst_end); + src_i = get_eb_page_index(src_end); - dst_off_in_page = offset_in_page(dst_end); - src_off_in_page = offset_in_page(src_end); + dst_off_in_page = get_eb_page_offset(dst, dst_end); + src_off_in_page = get_eb_page_offset(dst, src_end); cur = min_t(unsigned long, len, src_off_in_page + 1); cur = min(cur, dst_off_in_page + 1); diff --git a/fs/btrfs/struct-funcs.c b/fs/btrfs/struct-funcs.c index c46be27be700..8faf93340917 100644 --- a/fs/btrfs/struct-funcs.c +++ b/fs/btrfs/struct-funcs.c @@ -57,8 +57,9 @@ u##bits btrfs_get_token_##bits(struct btrfs_map_token *token, \ const void *ptr, unsigned long off) \ { \ const unsigned long member_offset = (unsigned long)ptr + off; \ - const unsigned long idx = member_offset >> PAGE_SHIFT; \ - const unsigned long oip = offset_in_page(member_offset); \ + const unsigned long idx = get_eb_page_index(member_offset); \ + const unsigned long oip = get_eb_page_offset(token->eb, \ + member_offset); \ const int size = sizeof(u##bits); \ u8 lebytes[sizeof(u##bits)]; \ const int part = PAGE_SIZE - oip; \ @@ -85,8 +86,8 @@ u##bits btrfs_get_##bits(const struct extent_buffer *eb, \ const void *ptr, unsigned long off) \ { \ const unsigned long member_offset = (unsigned long)ptr + off; \ - const unsigned long oip = offset_in_page(member_offset); \ - const unsigned long idx = member_offset >> PAGE_SHIFT; \ + const unsigned long oip = get_eb_page_offset(eb, member_offset);\ + const unsigned long idx = get_eb_page_index(member_offset); \ char *kaddr = page_address(eb->pages[idx]); \ const int size = sizeof(u##bits); \ const int part = PAGE_SIZE - oip; \ @@ -106,8 +107,9 @@ void btrfs_set_token_##bits(struct btrfs_map_token *token, \ u##bits val) \ { \ const unsigned long member_offset = (unsigned long)ptr + off; \ - const unsigned long idx = member_offset >> PAGE_SHIFT; \ - const unsigned long oip = offset_in_page(member_offset); \ + const unsigned long idx = get_eb_page_index(member_offset); \ + const unsigned long oip = get_eb_page_offset(token->eb, \ + member_offset); \ const int size = sizeof(u##bits); \ u8 lebytes[sizeof(u##bits)]; \ const int part = PAGE_SIZE - oip; \ @@ -136,8 +138,8 @@ void btrfs_set_##bits(const struct extent_buffer *eb, void *ptr, \ unsigned long off, u##bits val) \ { \ const unsigned long member_offset = (unsigned long)ptr + off; \ - const unsigned long oip = offset_in_page(member_offset); \ - const unsigned long idx = member_offset >> PAGE_SHIFT; \ + const unsigned long oip = get_eb_page_offset(eb, member_offset);\ + const unsigned long idx = get_eb_page_index(member_offset); \ char *kaddr = page_address(eb->pages[idx]); \ const int size = 
sizeof(u##bits); \ const int part = PAGE_SIZE - oip; \ From patchwork Tue Nov 3 13:30:56 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Qu Wenruo X-Patchwork-Id: 11877519 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_PATCH, MAILING_LIST_MULTI,SIGNED_OFF_BY,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED, USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id B140CC388F9 for ; Tue, 3 Nov 2020 13:32:17 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 4EA7D20870 for ; Tue, 3 Nov 2020 13:32:17 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (1024-bit key) header.d=suse.com header.i=@suse.com header.b="nrnRzhgx" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729389AbgKCNcQ (ORCPT ); Tue, 3 Nov 2020 08:32:16 -0500 Received: from mx2.suse.de ([195.135.220.15]:45158 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1729313AbgKCNcP (ORCPT ); Tue, 3 Nov 2020 08:32:15 -0500 X-Virus-Scanned: by amavisd-new at test-mx.suse.de DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.com; s=susede1; t=1604410334; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=cdGlFzzmGiRUNyLXqZHNFRYt6BtGGuysE+jzZGWlzLs=; b=nrnRzhgx4yBrBKd8XGwfSEXI5viWBx7DU1jq8T4OqFZkzdmuh2I4gGCX8lj5wsFeMH/sf3 w1m7ca4yFTIbwDd+7y7MQ1tKUAC4DXfZC8GA6ou7gUuNPNqLRifsSOLcYzEWu/A9kKRv81 2W6yRlVPYETTAU5BMHhSRqMudj+NpaY= Received: from relay2.suse.de (unknown [195.135.221.27]) by mx2.suse.de (Postfix) with ESMTP id 3BC14ABF4 for ; Tue, 3 Nov 2020 13:32:14 +0000 (UTC) From: Qu Wenruo To: linux-btrfs@vger.kernel.org Subject: [PATCH 20/32] btrfs: disk-io: only clear EXTENT_LOCK bit for extent_invalidatepage() Date: Tue, 3 Nov 2020 21:30:56 +0800 Message-Id: <20201103133108.148112-21-wqu@suse.com> X-Mailer: git-send-email 2.29.2 In-Reply-To: <20201103133108.148112-1-wqu@suse.com> References: <20201103133108.148112-1-wqu@suse.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org

extent_invalidatepage() calls clear_extent_bit() with delete == 1, which tries to clear all existing bits in the range.

This is currently fine, since the btree io tree only utilizes the EXTENT_LOCKED bit. But it could become a problem for the later subpage support, which will utilize extra io tree bits to represent extra info.

So convert that clear_extent_bit() call to unlock_extent_cached(). Since only the EXTENT_LOCKED bit is utilized in the current code, this doesn't change the behavior, but it provides a much cleaner basis for the incoming subpage support.
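For reference, a minimal sketch of what unlock_extent_cached() boils down to, assuming the helper keeps the shape it has in extent-io-tree.h (simplified here, not the literal definition):

    static inline int unlock_extent_cached(struct extent_io_tree *tree, u64 start,
                                           u64 end, struct extent_state **cached)
    {
            /* Clear only EXTENT_LOCKED: wake == 1 to wake waiters, delete == 0 */
            return clear_extent_bit(tree, start, end, EXTENT_LOCKED, 1, 0, cached);
    }

Since EXTENT_LOCKED is the only bit the btree io tree sets today, clearing just that bit still frees the extent state, while any future subpage-only bits in the range would be left untouched.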
Signed-off-by: Qu Wenruo Reviewed-by: Nikolay Borisov --- fs/btrfs/extent_io.c | 12 ++++++++++-- 1 file changed, 10 insertions(+), 2 deletions(-) diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c index c7adcd99451a..b770ac039b96 100644 --- a/fs/btrfs/extent_io.c +++ b/fs/btrfs/extent_io.c @@ -4459,14 +4459,22 @@ int extent_invalidatepage(struct extent_io_tree *tree, u64 end = start + PAGE_SIZE - 1; size_t blocksize = page->mapping->host->i_sb->s_blocksize; + /* This function is only called for btree */ + ASSERT(tree->owner == IO_TREE_BTREE_INODE_IO); + start += ALIGN(offset, blocksize); if (start > end) return 0; lock_extent_bits(tree, start, end, &cached_state); wait_on_page_writeback(page); - clear_extent_bit(tree, start, end, EXTENT_LOCKED | EXTENT_DELALLOC | - EXTENT_DO_ACCOUNTING, 1, 1, &cached_state); + + /* + * Currently for btree io tree, only EXTENT_LOCKED is utilized, + * so here we only need to unlock the extent range to free any + * existing extent state. + */ + unlock_extent_cached(tree, start, end, &cached_state); return 0; } From patchwork Tue Nov 3 13:30:57 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Qu Wenruo X-Patchwork-Id: 11877521 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_PATCH, MAILING_LIST_MULTI,SIGNED_OFF_BY,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED, USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 06695C2D0A3 for ; Tue, 3 Nov 2020 13:32:20 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 9C10620870 for ; Tue, 3 Nov 2020 13:32:19 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (1024-bit key) header.d=suse.com header.i=@suse.com header.b="T+fMMxJc" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729393AbgKCNcT (ORCPT ); Tue, 3 Nov 2020 08:32:19 -0500 Received: from mx2.suse.de ([195.135.220.15]:45210 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1729313AbgKCNcS (ORCPT ); Tue, 3 Nov 2020 08:32:18 -0500 X-Virus-Scanned: by amavisd-new at test-mx.suse.de DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.com; s=susede1; t=1604410336; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=hd4TywnXcxBhD+uAq2cSj3jYM76qGt6q/E6tKagTQjY=; b=T+fMMxJcpl+2C9oj4fGwdh+dbI1Zmy8W+MHWTTs93HR9DPEmUVmoDKqwYS7YJv5NG2sR89 OQEYsvsMF5YCuQaxurM0XFYjnUahgmSCt/UT0kcDA6/pnHqYuzOyn3kekfdSM2ugFS2Cmm dQmCsQ3XaXeHNW4jdO0XXIsJiE8x7qo= Received: from relay2.suse.de (unknown [195.135.221.27]) by mx2.suse.de (Postfix) with ESMTP id 0ED10AFAA for ; Tue, 3 Nov 2020 13:32:16 +0000 (UTC) From: Qu Wenruo To: linux-btrfs@vger.kernel.org Subject: [PATCH 21/32] btrfs: extent-io: make type of extent_state::state to be at least 32 bits Date: Tue, 3 Nov 2020 21:30:57 +0800 Message-Id: <20201103133108.148112-22-wqu@suse.com> X-Mailer: git-send-email 2.29.2 In-Reply-To: <20201103133108.148112-1-wqu@suse.com> References: <20201103133108.148112-1-wqu@suse.com> MIME-Version: 1.0 Precedence: 
bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org

Currently we use 'unsigned' for extent_state::state, which the C standard only guarantees to be at least 16 bits. But for the incoming subpage support, we are going to introduce more bits, at least enough to match the following page bits:

- PageUptodate
- PagePrivate2

This series already assigns bit 15 (EXTENT_HAS_TREE_BLOCK), so any further bit would go beyond 16 bits. To support this, make extent_state::state at least 32 bits, and to be more explicit, use "u32" so the maximum number of supported bits is clear.

This doesn't increase the memory usage for x86_64, but may affect other architectures.

Signed-off-by: Qu Wenruo Reviewed-by: Nikolay Borisov --- fs/btrfs/extent-io-tree.h | 36 +++++++++++++++------------- fs/btrfs/extent_io.c | 49 +++++++++++++++++++-------------------- fs/btrfs/extent_io.h | 2 +- 3 files changed, 45 insertions(+), 42 deletions(-) diff --git a/fs/btrfs/extent-io-tree.h b/fs/btrfs/extent-io-tree.h index 516e76c806d7..59c9139f40cc 100644 --- a/fs/btrfs/extent-io-tree.h +++ b/fs/btrfs/extent-io-tree.h @@ -22,6 +22,10 @@ struct io_failure_record; #define EXTENT_QGROUP_RESERVED (1U << 12) #define EXTENT_CLEAR_DATA_RESV (1U << 13) #define EXTENT_DELALLOC_NEW (1U << 14) + +/* For subpage btree io tree, to indicate there is an extent buffer */ +#define EXTENT_HAS_TREE_BLOCK (1U << 15) + #define EXTENT_DO_ACCOUNTING (EXTENT_CLEAR_META_RESV | \ EXTENT_CLEAR_DATA_RESV) #define EXTENT_CTLBITS (EXTENT_DO_ACCOUNTING) @@ -73,7 +77,7 @@ struct extent_state { /* ADD NEW ELEMENTS AFTER THIS */ wait_queue_head_t wq; refcount_t refs; - unsigned state; + u32 state; struct io_failure_record *failrec; @@ -136,19 +140,19 @@ void __cold extent_io_exit(void); u64 count_range_bits(struct extent_io_tree *tree, u64 *start, u64 search_end, - u64 max_bytes, unsigned bits, int contig); + u64 max_bytes, u32 bits, int contig); void free_extent_state(struct extent_state *state); int test_range_bit(struct extent_io_tree *tree, u64 start, u64 end, - unsigned bits, int filled, + u32 bits, int filled, struct extent_state *cached_state); int clear_record_extent_bits(struct extent_io_tree *tree, u64 start, u64 end, - unsigned bits, struct extent_changeset *changeset); + u32 bits, struct extent_changeset *changeset); int clear_extent_bit(struct extent_io_tree *tree, u64 start, u64 end, - unsigned bits, int wake, int delete, + u32 bits, int wake, int delete, struct extent_state **cached); int __clear_extent_bit(struct extent_io_tree *tree, u64 start, u64 end, - unsigned bits, struct extent_state **cached_state, + u32 bits, struct extent_state **cached_state, gfp_t mask, struct extent_io_extra_options *extra_opts); static inline int unlock_extent(struct extent_io_tree *tree, u64 start, u64 end) @@ -177,7 +181,7 @@ static inline int unlock_extent_cached_atomic(struct extent_io_tree *tree, } static inline int clear_extent_bits(struct extent_io_tree *tree, u64 start, - u64 end, unsigned bits) + u64 end, u32 bits) { int wake = 0; @@ -188,14 +192,14 @@ static inline int clear_extent_bits(struct extent_io_tree *tree, u64 start, } int set_record_extent_bits(struct extent_io_tree *tree, u64 start, u64 end, - unsigned bits, struct extent_changeset *changeset); + u32 bits, struct extent_changeset *changeset); int set_extent_bit(struct extent_io_tree *tree, u64 start, u64 end, - unsigned bits, struct extent_state **cached_state, gfp_t mask); + u32 bits, struct extent_state **cached_state, gfp_t mask); int set_extent_bits_nowait(struct extent_io_tree *tree, u64 start, u64 end, - unsigned bits); + u32 bits); static inline int set_extent_bits(struct extent_io_tree
*tree, u64 start, - u64 end, unsigned bits) + u64 end, u32 bits) { return set_extent_bit(tree, start, end, bits, NULL, GFP_NOFS); } @@ -222,11 +226,11 @@ static inline int clear_extent_dirty(struct extent_io_tree *tree, u64 start, } int convert_extent_bit(struct extent_io_tree *tree, u64 start, u64 end, - unsigned bits, unsigned clear_bits, + u32 bits, u32 clear_bits, struct extent_state **cached_state); static inline int set_extent_delalloc(struct extent_io_tree *tree, u64 start, - u64 end, unsigned int extra_bits, + u64 end, u32 extra_bits, struct extent_state **cached_state) { return set_extent_bit(tree, start, end, @@ -256,12 +260,12 @@ static inline int set_extent_uptodate(struct extent_io_tree *tree, u64 start, } int find_first_extent_bit(struct extent_io_tree *tree, u64 start, - u64 *start_ret, u64 *end_ret, unsigned bits, + u64 *start_ret, u64 *end_ret, u32 bits, bool exact_match, struct extent_state **cached_state); void find_first_clear_extent_bit(struct extent_io_tree *tree, u64 start, - u64 *start_ret, u64 *end_ret, unsigned bits); + u64 *start_ret, u64 *end_ret, u32 bits); int find_contiguous_extent_bit(struct extent_io_tree *tree, u64 start, - u64 *start_ret, u64 *end_ret, unsigned bits); + u64 *start_ret, u64 *end_ret, u32 bits); int extent_invalidatepage(struct extent_io_tree *tree, struct page *page, unsigned long offset); bool btrfs_find_delalloc_range(struct extent_io_tree *tree, u64 *start, diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c index b770ac039b96..a0c01bea7c54 100644 --- a/fs/btrfs/extent_io.c +++ b/fs/btrfs/extent_io.c @@ -143,7 +143,7 @@ struct extent_page_data { unsigned int sync_io:1; }; -static int add_extent_changeset(struct extent_state *state, unsigned bits, +static int add_extent_changeset(struct extent_state *state, u32 bits, struct extent_changeset *changeset, int set) { @@ -531,7 +531,7 @@ static void merge_state(struct extent_io_tree *tree, } static void set_state_bits(struct extent_io_tree *tree, - struct extent_state *state, unsigned *bits, + struct extent_state *state, u32 *bits, struct extent_changeset *changeset); /* @@ -548,7 +548,7 @@ static int insert_state(struct extent_io_tree *tree, struct extent_state *state, u64 start, u64 end, struct rb_node ***p, struct rb_node **parent, - unsigned *bits, struct extent_changeset *changeset) + u32 *bits, struct extent_changeset *changeset) { struct rb_node *node; @@ -629,11 +629,11 @@ static struct extent_state *next_state(struct extent_state *state) */ static struct extent_state *clear_state_bit(struct extent_io_tree *tree, struct extent_state *state, - unsigned *bits, int wake, + u32 *bits, int wake, struct extent_changeset *changeset) { struct extent_state *next; - unsigned bits_to_clear = *bits & ~EXTENT_CTLBITS; + u32 bits_to_clear = *bits & ~EXTENT_CTLBITS; int ret; if ((bits_to_clear & EXTENT_DIRTY) && (state->state & EXTENT_DIRTY)) { @@ -700,7 +700,7 @@ static void extent_io_tree_panic(struct extent_io_tree *tree, int err) * No error can be returned yet, the ENOMEM for memory is handled by BUG_ON(). 
*/ int __clear_extent_bit(struct extent_io_tree *tree, u64 start, u64 end, - unsigned bits, struct extent_state **cached_state, + u32 bits, struct extent_state **cached_state, gfp_t mask, struct extent_io_extra_options *extra_opts) { struct extent_changeset *changeset; @@ -881,7 +881,7 @@ static void wait_on_state(struct extent_io_tree *tree, * The tree lock is taken by this function */ static void wait_extent_bit(struct extent_io_tree *tree, u64 start, u64 end, - unsigned long bits) + u32 bits) { struct extent_state *state; struct rb_node *node; @@ -928,9 +928,9 @@ static void wait_extent_bit(struct extent_io_tree *tree, u64 start, u64 end, static void set_state_bits(struct extent_io_tree *tree, struct extent_state *state, - unsigned *bits, struct extent_changeset *changeset) + u32 *bits, struct extent_changeset *changeset) { - unsigned bits_to_set = *bits & ~EXTENT_CTLBITS; + u32 bits_to_set = *bits & ~EXTENT_CTLBITS; int ret; if (tree->private_data && is_data_inode(tree->private_data)) @@ -977,7 +977,7 @@ static void cache_state(struct extent_state *state, static int __must_check __set_extent_bit(struct extent_io_tree *tree, u64 start, u64 end, - unsigned bits, struct extent_state **cached_state, + u32 bits, struct extent_state **cached_state, gfp_t mask, struct extent_io_extra_options *extra_opts) { struct extent_state *state; @@ -1201,7 +1201,7 @@ __set_extent_bit(struct extent_io_tree *tree, u64 start, u64 end, } int set_extent_bit(struct extent_io_tree *tree, u64 start, u64 end, - unsigned bits, struct extent_state **cached_state, gfp_t mask) + u32 bits, struct extent_state **cached_state, gfp_t mask) { return __set_extent_bit(tree, start, end, bits, cached_state, mask, NULL); @@ -1227,7 +1227,7 @@ int set_extent_bit(struct extent_io_tree *tree, u64 start, u64 end, * All allocations are done with GFP_NOFS. 
*/ int convert_extent_bit(struct extent_io_tree *tree, u64 start, u64 end, - unsigned bits, unsigned clear_bits, + u32 bits, u32 clear_bits, struct extent_state **cached_state) { struct extent_state *state; @@ -1428,7 +1428,7 @@ int convert_extent_bit(struct extent_io_tree *tree, u64 start, u64 end, /* wrappers around set/clear extent bit */ int set_record_extent_bits(struct extent_io_tree *tree, u64 start, u64 end, - unsigned bits, struct extent_changeset *changeset) + u32 bits, struct extent_changeset *changeset) { struct extent_io_extra_options extra_opts = { .changeset = changeset, @@ -1447,13 +1447,13 @@ int set_record_extent_bits(struct extent_io_tree *tree, u64 start, u64 end, } int set_extent_bits_nowait(struct extent_io_tree *tree, u64 start, u64 end, - unsigned bits) + u32 bits) { return __set_extent_bit(tree, start, end, bits, NULL, GFP_NOWAIT, NULL); } int clear_extent_bit(struct extent_io_tree *tree, u64 start, u64 end, - unsigned bits, int wake, int delete, + u32 bits, int wake, int delete, struct extent_state **cached) { struct extent_io_extra_options extra_opts = { @@ -1466,7 +1466,7 @@ int clear_extent_bit(struct extent_io_tree *tree, u64 start, u64 end, } int clear_record_extent_bits(struct extent_io_tree *tree, u64 start, u64 end, - unsigned bits, struct extent_changeset *changeset) + u32 bits, struct extent_changeset *changeset) { struct extent_io_extra_options extra_opts = { .changeset = changeset, @@ -1558,7 +1558,7 @@ void extent_range_redirty_for_io(struct inode *inode, u64 start, u64 end) } } -static bool match_extent_state(struct extent_state *state, unsigned bits, +static bool match_extent_state(struct extent_state *state, u32 bits, bool exact_match) { if (exact_match) @@ -1578,7 +1578,7 @@ static bool match_extent_state(struct extent_state *state, unsigned bits, */ static struct extent_state * find_first_extent_bit_state(struct extent_io_tree *tree, - u64 start, unsigned bits, bool exact_match) + u64 start, u32 bits, bool exact_match) { struct rb_node *node; struct extent_state *state; @@ -1614,7 +1614,7 @@ find_first_extent_bit_state(struct extent_io_tree *tree, * Return 1 if we found nothing. */ int find_first_extent_bit(struct extent_io_tree *tree, u64 start, - u64 *start_ret, u64 *end_ret, unsigned bits, + u64 *start_ret, u64 *end_ret, u32 bits, bool exact_match, struct extent_state **cached_state) { struct extent_state *state; @@ -1666,7 +1666,7 @@ int find_first_extent_bit(struct extent_io_tree *tree, u64 start, * returned will be the full contiguous area with the bits set. */ int find_contiguous_extent_bit(struct extent_io_tree *tree, u64 start, - u64 *start_ret, u64 *end_ret, unsigned bits) + u64 *start_ret, u64 *end_ret, u32 bits) { struct extent_state *state; int ret = 1; @@ -1703,7 +1703,7 @@ int find_contiguous_extent_bit(struct extent_io_tree *tree, u64 start, * trim @end_ret to the appropriate size. 
*/ void find_first_clear_extent_bit(struct extent_io_tree *tree, u64 start, - u64 *start_ret, u64 *end_ret, unsigned bits) + u64 *start_ret, u64 *end_ret, u32 bits) { struct extent_state *state; struct rb_node *node, *prev = NULL, *next; @@ -2074,8 +2074,7 @@ static int __process_pages_contig(struct address_space *mapping, void extent_clear_unlock_delalloc(struct btrfs_inode *inode, u64 start, u64 end, struct page *locked_page, - unsigned clear_bits, - unsigned long page_ops) + u32 clear_bits, unsigned long page_ops) { clear_extent_bit(&inode->io_tree, start, end, clear_bits, 1, 0, NULL); @@ -2091,7 +2090,7 @@ void extent_clear_unlock_delalloc(struct btrfs_inode *inode, u64 start, u64 end, */ u64 count_range_bits(struct extent_io_tree *tree, u64 *start, u64 search_end, u64 max_bytes, - unsigned bits, int contig) + u32 bits, int contig) { struct rb_node *node; struct extent_state *state; @@ -2211,7 +2210,7 @@ struct io_failure_record *get_state_failrec(struct extent_io_tree *tree, u64 sta * range is found set. */ int test_range_bit(struct extent_io_tree *tree, u64 start, u64 end, - unsigned bits, int filled, struct extent_state *cached) + u32 bits, int filled, struct extent_state *cached) { struct extent_state *state = NULL; struct rb_node *node; diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h index 24131478289d..6b9d7e8c3a31 100644 --- a/fs/btrfs/extent_io.h +++ b/fs/btrfs/extent_io.h @@ -262,7 +262,7 @@ void extent_range_clear_dirty_for_io(struct inode *inode, u64 start, u64 end); void extent_range_redirty_for_io(struct inode *inode, u64 start, u64 end); void extent_clear_unlock_delalloc(struct btrfs_inode *inode, u64 start, u64 end, struct page *locked_page, - unsigned bits_to_clear, + u32 bits_to_clear, unsigned long page_ops); struct bio *btrfs_bio_alloc(u64 first_byte); struct bio *btrfs_io_bio_alloc(unsigned int nr_iovecs); From patchwork Tue Nov 3 13:30:58 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Qu Wenruo X-Patchwork-Id: 11877515 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_PATCH, MAILING_LIST_MULTI,SIGNED_OFF_BY,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED, USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id C39A7C388F9 for ; Tue, 3 Nov 2020 13:32:20 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 6B98020870 for ; Tue, 3 Nov 2020 13:32:20 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (1024-bit key) header.d=suse.com header.i=@suse.com header.b="LCT2qFEK" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729395AbgKCNcT (ORCPT ); Tue, 3 Nov 2020 08:32:19 -0500 Received: from mx2.suse.de ([195.135.220.15]:45220 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1729390AbgKCNcT (ORCPT ); Tue, 3 Nov 2020 08:32:19 -0500 X-Virus-Scanned: by amavisd-new at test-mx.suse.de DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.com; s=susede1; t=1604410338; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: 
in-reply-to:in-reply-to:references:references; bh=g1ulKSmOKA6d57hB4MS+x8tRPpb2vDmRWp+msi4XznY=; b=LCT2qFEKf1URrBDH1uRP3EJd1HqMgqlARAnaoxWnYOPlhD6oXNYGj3bEvTylxQSWYk5ICz xU8Tzu75JWaBD94RHQznp4VSlAd/BzcbYnMrWUT/Z6bBKOWR4TTVM+dl5h/bnFDXde+VUM ejR/txLhZWc7Gec3Bz003D8e7o5RUQQ= Received: from relay2.suse.de (unknown [195.135.221.27]) by mx2.suse.de (Postfix) with ESMTP id 067F9ABF4 for ; Tue, 3 Nov 2020 13:32:18 +0000 (UTC) From: Qu Wenruo To: linux-btrfs@vger.kernel.org Subject: [PATCH 22/32] btrfs: file-item: use nodesize to determine whether we need readahead for btrfs_lookup_bio_sums() Date: Tue, 3 Nov 2020 21:30:58 +0800 Message-Id: <20201103133108.148112-23-wqu@suse.com> X-Mailer: git-send-email 2.29.2 In-Reply-To: <20201103133108.148112-1-wqu@suse.com> References: <20201103133108.148112-1-wqu@suse.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org

In btrfs_lookup_bio_sums(), if the bio is pretty large, we want to readahead the csum tree.

However the threshold is a magic number, (PAGE_SIZE * 8), dating back to the initial btrfs merge. The meaning of the value is pretty hard to guess, especially since it comes from the age when 4K sectorsize was the default and CRC32 was the only supported csum type. For the most common btrfs setup, CRC32 csum and 4K sectorsize, it means a read of just 32K would already kick readahead, while the csums for it are only 32 bytes in size.

Now let's be more reasonable by taking both csum size and node size into consideration: if the csums needed for the bio take more than one leaf, then we kick the readahead. For the current default btrfs (16K nodesize, 4-byte CRC32 csum, 4K sectorsize) one leaf holds csums covering roughly 16M of data, so that becomes the new threshold.

This change should not observably affect performance, thus this is mostly a readability enhancement.

Signed-off-by: Qu Wenruo Reviewed-by: Nikolay Borisov --- fs/btrfs/file-item.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/fs/btrfs/file-item.c b/fs/btrfs/file-item.c index 5f3096ea69af..4bf139983282 100644 --- a/fs/btrfs/file-item.c +++ b/fs/btrfs/file-item.c @@ -298,7 +298,11 @@ blk_status_t btrfs_lookup_bio_sums(struct inode *inode, struct bio *bio, csum = dst; } - if (bio->bi_iter.bi_size > PAGE_SIZE * 8) + /* + * If needed number of sectors is larger than one leaf can contain, + * kick the readahead for csum tree would be a good idea.
+ */ + if (nblocks > fs_info->csums_per_leaf) path->reada = READA_FORWARD; /* From patchwork Tue Nov 3 13:30:59 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Qu Wenruo X-Patchwork-Id: 11877523 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_PATCH, MAILING_LIST_MULTI,SIGNED_OFF_BY,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED, USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 3CFFEC388F7 for ; Tue, 3 Nov 2020 13:32:23 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id D5C5B20870 for ; Tue, 3 Nov 2020 13:32:22 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (1024-bit key) header.d=suse.com header.i=@suse.com header.b="VNxADvjp" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729397AbgKCNcW (ORCPT ); Tue, 3 Nov 2020 08:32:22 -0500 Received: from mx2.suse.de ([195.135.220.15]:45246 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1729312AbgKCNcV (ORCPT ); Tue, 3 Nov 2020 08:32:21 -0500 X-Virus-Scanned: by amavisd-new at test-mx.suse.de DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.com; s=susede1; t=1604410339; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=sBl1vXFyYdHCzFvyKINDDyJWt05bHKcDbkxWRqOt57Y=; b=VNxADvjpWyPZx5t6ySwKTFQkpKYfkNYprrBXwxFPZlK1xBsdloD3KStAwBh8l7Zyl+Fd+l 650WqYVvxmCg2Sbcj2K09I2JXqZ4TJKBrnVzx/ebxKUheVip2NVjCngMr+kOV9TJBJ8kvf uq0nbEt28ypPnGVc8I0oYmdLLIMhOcE= Received: from relay2.suse.de (unknown [195.135.221.27]) by mx2.suse.de (Postfix) with ESMTP id DBA13ABF4 for ; Tue, 3 Nov 2020 13:32:19 +0000 (UTC) From: Qu Wenruo To: linux-btrfs@vger.kernel.org Subject: [PATCH 23/32] btrfs: file-item: remove the btrfs_find_ordered_sum() call in btrfs_lookup_bio_sums() Date: Tue, 3 Nov 2020 21:30:59 +0800 Message-Id: <20201103133108.148112-24-wqu@suse.com> X-Mailer: git-send-email 2.29.2 In-Reply-To: <20201103133108.148112-1-wqu@suse.com> References: <20201103133108.148112-1-wqu@suse.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org

The function btrfs_lookup_bio_sums() is only called for read bios, while btrfs_find_ordered_sum() searches ordered extent sums, which exist only on the write path. This means the btrfs_find_ordered_sum() call there in fact makes no sense.

So remove the btrfs_find_ordered_sum() call in btrfs_lookup_bio_sums(). And since btrfs_lookup_bio_sums() is the only caller of btrfs_find_ordered_sum(), also remove its implementation.
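For context, a rough sketch of where write-path csums live before they reach the csum tree (real function names from the existing code base, arguments elided and flow simplified):

    /*
     * Write path (simplified): csums are generated into btrfs_ordered_sum
     * entries attached to the ordered extent, and only get inserted into
     * the csum tree when the ordered extent finishes.
     */
    btrfs_csum_one_bio(...);          /* attach csums to the ordered extent */
    /* ... write completes ... */
    btrfs_finish_ordered_io(...);     /* pending csums inserted into csum tree */

So for a read bio, the csum tree is the only place worth searching, which is why the lookup can restrict itself to the csum tree.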
Signed-off-by: Qu Wenruo Reviewed-by: Nikolay Borisov --- fs/btrfs/file-item.c | 16 ++++++++++----- fs/btrfs/ordered-data.c | 44 ----------------------------------------- fs/btrfs/ordered-data.h | 2 -- 3 files changed, 11 insertions(+), 51 deletions(-) diff --git a/fs/btrfs/file-item.c b/fs/btrfs/file-item.c index 4bf139983282..ecb6a1f9945f 100644 --- a/fs/btrfs/file-item.c +++ b/fs/btrfs/file-item.c @@ -240,7 +240,8 @@ int btrfs_lookup_file_extent(struct btrfs_trans_handle *trans, } /** - * btrfs_lookup_bio_sums - Look up checksums for a bio. + * btrfs_lookup_bio_sums - Look up checksums for a read bio. + * * @inode: inode that the bio is for. * @bio: bio to look up. * @offset: Unless (u64)-1, look up checksums for this offset in the file. @@ -275,6 +276,15 @@ blk_status_t btrfs_lookup_bio_sums(struct inode *inode, struct bio *bio, if (!fs_info->csum_root || (BTRFS_I(inode)->flags & BTRFS_INODE_NODATASUM)) return BLK_STS_OK; + /* + * This function is only called for read bio. + * + * This means several things: + * - All of our csums should only be in csum tree + * No ordered extents csums. As ordered extents are only for write + * path. + */ + ASSERT(bio_op(bio) == REQ_OP_READ); path = btrfs_alloc_path(); if (!path) return BLK_STS_RESOURCE; @@ -325,10 +335,6 @@ blk_status_t btrfs_lookup_bio_sums(struct inode *inode, struct bio *bio, if (page_offsets) offset = page_offset(bvec.bv_page) + bvec.bv_offset; - count = btrfs_find_ordered_sum(BTRFS_I(inode), offset, - disk_bytenr, csum, nblocks); - if (count) - goto found; if (!item || disk_bytenr < item_start_offset || disk_bytenr >= item_last_offset) { diff --git a/fs/btrfs/ordered-data.c b/fs/btrfs/ordered-data.c index 0d61f9fefc02..79d366a36223 100644 --- a/fs/btrfs/ordered-data.c +++ b/fs/btrfs/ordered-data.c @@ -854,50 +854,6 @@ btrfs_lookup_first_ordered_extent(struct btrfs_inode *inode, u64 file_offset) return entry; } -/* - * search the ordered extents for one corresponding to 'offset' and - * try to find a checksum. This is used because we allow pages to - * be reclaimed before their checksum is actually put into the btree - */ -int btrfs_find_ordered_sum(struct btrfs_inode *inode, u64 offset, - u64 disk_bytenr, u8 *sum, int len) -{ - struct btrfs_fs_info *fs_info = inode->root->fs_info; - struct btrfs_ordered_sum *ordered_sum; - struct btrfs_ordered_extent *ordered; - struct btrfs_ordered_inode_tree *tree = &inode->ordered_tree; - unsigned long num_sectors; - unsigned long i; - const u32 csum_size = fs_info->csum_size; - int index = 0; - - ordered = btrfs_lookup_ordered_extent(inode, offset); - if (!ordered) - return 0; - - spin_lock_irq(&tree->lock); - list_for_each_entry_reverse(ordered_sum, &ordered->list, list) { - if (disk_bytenr >= ordered_sum->bytenr && - disk_bytenr < ordered_sum->bytenr + ordered_sum->len) { - i = (disk_bytenr - ordered_sum->bytenr) >> - fs_info->sectorsize_bits; - num_sectors = ordered_sum->len >> fs_info->sectorsize_bits; - num_sectors = min_t(int, len - index, num_sectors - i); - memcpy(sum + index, ordered_sum->sums + i * csum_size, - num_sectors * csum_size); - - index += (int)num_sectors * csum_size; - if (index == len) - goto out; - disk_bytenr += num_sectors * fs_info->sectorsize; - } - } -out: - spin_unlock_irq(&tree->lock); - btrfs_put_ordered_extent(ordered); - return index; -} - /* * btrfs_flush_ordered_range - Lock the passed range and ensures all pending * ordered extents in it are run to completion. 
diff --git a/fs/btrfs/ordered-data.h b/fs/btrfs/ordered-data.h index 367269effd6a..0bfa82b58e23 100644 --- a/fs/btrfs/ordered-data.h +++ b/fs/btrfs/ordered-data.h @@ -183,8 +183,6 @@ struct btrfs_ordered_extent *btrfs_lookup_ordered_range( u64 len); void btrfs_get_ordered_extents_for_logging(struct btrfs_inode *inode, struct list_head *list); -int btrfs_find_ordered_sum(struct btrfs_inode *inode, u64 offset, - u64 disk_bytenr, u8 *sum, int len); u64 btrfs_wait_ordered_extents(struct btrfs_root *root, u64 nr, const u64 range_start, const u64 range_len); void btrfs_wait_ordered_roots(struct btrfs_fs_info *fs_info, u64 nr, From patchwork Tue Nov 3 13:31:00 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Qu Wenruo X-Patchwork-Id: 11877525 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_PATCH, MAILING_LIST_MULTI,SIGNED_OFF_BY,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED, USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id A8BB8C2D0A3 for ; Tue, 3 Nov 2020 13:32:28 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 44F1D20870 for ; Tue, 3 Nov 2020 13:32:28 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (1024-bit key) header.d=suse.com header.i=@suse.com header.b="XFw2RBpg" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729402AbgKCNc1 (ORCPT ); Tue, 3 Nov 2020 08:32:27 -0500 Received: from mx2.suse.de ([195.135.220.15]:45292 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1729312AbgKCNcY (ORCPT ); Tue, 3 Nov 2020 08:32:24 -0500 X-Virus-Scanned: by amavisd-new at test-mx.suse.de DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.com; s=susede1; t=1604410342; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=ILuD6MSx2tZBMk/28SJES+NmEd7eftkRx4O4OLGW/wc=; b=XFw2RBpgfAJ50mJEmS/arF8X12lykJpUN+0tblMKE6OVoLpwdsF6bKcoQaLdx7FuvY2CL4 3zYNTIe8i/hgmXSxinGuUbMv7SEE6TY78QqIMBpezJSX5jrWbV0zHJwF3kJSSqnH/pnH9y ohhJzwKgSCVlocyBLhwTgjHxG3z9S1w= Received: from relay2.suse.de (unknown [195.135.221.27]) by mx2.suse.de (Postfix) with ESMTP id 655E4ABF4 for ; Tue, 3 Nov 2020 13:32:22 +0000 (UTC) From: Qu Wenruo To: linux-btrfs@vger.kernel.org Subject: [PATCH 24/32] btrfs: file-item: refactor btrfs_lookup_bio_sums() to handle out-of-order bvecs Date: Tue, 3 Nov 2020 21:31:00 +0800 Message-Id: <20201103133108.148112-25-wqu@suse.com> X-Mailer: git-send-email 2.29.2 In-Reply-To: <20201103133108.148112-1-wqu@suse.com> References: <20201103133108.148112-1-wqu@suse.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org

Refactor btrfs_lookup_bio_sums() by:

- Remove the @file_offset parameter

  There are two factors making the @file_offset parameter useless:

  * For csum lookup in the csum tree, file offset makes no sense.
    We only need disk_bytenr, which is unrelated to file_offset.
  * page_offset (file offset) of each bvec is not contiguous.
    Pages can be added to the same bio as long as their on-disk bytenr
    is contiguous, meaning we could have pages at different file offsets
    in the same bio.

  Thus passing file_offset makes no sense any more. The only user of
  file_offset is the data reloc inode, where we will use a new function,
  search_file_offset_in_bio(), to handle it.

- Extract the csum tree lookup into search_csum_tree()

  The new function will handle the csum search in the csum tree. The
  return value is the same as btrfs_find_ordered_sum(), returning the
  number of sectors that have a checksum.

- Change how we do the main loop

  The only info needed from the bio is:

  * the on-disk bytenr
  * the length

  After extracting the above info, we can do the search without the bio
  at all, which makes the main loop much simpler, as shown in the sketch
  after this list:

  for (cur_disk_bytenr = orig_disk_bytenr;
       cur_disk_bytenr < orig_disk_bytenr + orig_len;
       cur_disk_bytenr += count * sectorsize) {
          /* Lookup csum tree */
          count = search_csum_tree(fs_info, path, cur_disk_bytenr,
                                   search_len, csum_dst);
          if (!count) {
                  /* Csum hole handling */
          }
  }

- Use a single variable as the core to calculate all other offsets

  Instead of all different types of variables, we use only one core
  variable, cur_disk_bytenr, which represents the current disk bytenr.
  All involved values can be calculated from that core variable, and
  all those values are only visible in the inner loop:

  diff_sectors = div_u64(cur_disk_bytenr - orig_disk_bytenr, sectorsize);
  cur_disk_bytenr = orig_disk_bytenr + diff_sectors * sectorsize;
  csum_dst = csum + diff_sectors * csum_size;

All the above refactoring makes btrfs_lookup_bio_sums() way more robust
than it used to be, especially related to the file offset lookup. Now
the file_offset lookup is only related to the data reloc inode;
otherwise we don't need to bother with file_offset at all.
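To make the single-core-variable arithmetic concrete, a worked example with made-up numbers (assuming 4K sectorsize and 4-byte crc32c csums):

    /* Bio covers disk bytenr [1M, 1M + 64K): orig_len = 64K, nblocks = 16 */
    u64 orig_disk_bytenr = SZ_1M;
    u64 cur_disk_bytenr = orig_disk_bytenr;

    /*
     * Iteration 1: diff_sectors = 0, csum_dst = csum + 0 * csum_size.
     * Suppose the csum tree search finds csums for the first 8 sectors,
     * so count = 8 and the loop advances by 8 * sectorsize:
     */
    cur_disk_bytenr += 8 * SZ_4K;     /* now at 1M + 32K */

    /*
     * Iteration 2: diff_sectors = (32K / 4K) = 8,
     * csum_dst = csum + 8 * csum_size, remaining search length = 32K.
     */

Every per-iteration value is derived from cur_disk_bytenr alone, so the loop never has to care where bvec boundaries fall or what file offsets the pages have.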
Signed-off-by: Qu Wenruo --- fs/btrfs/compression.c | 5 +- fs/btrfs/ctree.h | 2 +- fs/btrfs/file-item.c | 236 +++++++++++++++++++++++++++-------------- fs/btrfs/inode.c | 5 +- 4 files changed, 159 insertions(+), 89 deletions(-) diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c index 4e022ed72d2f..3fb6fde2ca13 100644 --- a/fs/btrfs/compression.c +++ b/fs/btrfs/compression.c @@ -719,8 +719,7 @@ blk_status_t btrfs_submit_compressed_read(struct inode *inode, struct bio *bio, */ refcount_inc(&cb->pending_bios); - ret = btrfs_lookup_bio_sums(inode, comp_bio, (u64)-1, - sums); + ret = btrfs_lookup_bio_sums(inode, comp_bio, sums); BUG_ON(ret); /* -ENOMEM */ nr_sectors = DIV_ROUND_UP(comp_bio->bi_iter.bi_size, @@ -746,7 +745,7 @@ blk_status_t btrfs_submit_compressed_read(struct inode *inode, struct bio *bio, ret = btrfs_bio_wq_end_io(fs_info, comp_bio, BTRFS_WQ_ENDIO_DATA); BUG_ON(ret); /* -ENOMEM */ - ret = btrfs_lookup_bio_sums(inode, comp_bio, (u64)-1, sums); + ret = btrfs_lookup_bio_sums(inode, comp_bio, sums); BUG_ON(ret); /* -ENOMEM */ ret = btrfs_map_bio(fs_info, comp_bio, mirror_num); diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h index 10226f250274..b5909eaef231 100644 --- a/fs/btrfs/ctree.h +++ b/fs/btrfs/ctree.h @@ -2957,7 +2957,7 @@ struct btrfs_dio_private; int btrfs_del_csums(struct btrfs_trans_handle *trans, struct btrfs_root *root, u64 bytenr, u64 len); blk_status_t btrfs_lookup_bio_sums(struct inode *inode, struct bio *bio, - u64 offset, u8 *dst); + u8 *dst); int btrfs_insert_file_extent(struct btrfs_trans_handle *trans, struct btrfs_root *root, u64 objectid, u64 pos, diff --git a/fs/btrfs/file-item.c b/fs/btrfs/file-item.c index ecb6a1f9945f..74bc34488a6d 100644 --- a/fs/btrfs/file-item.c +++ b/fs/btrfs/file-item.c @@ -239,13 +239,115 @@ int btrfs_lookup_file_extent(struct btrfs_trans_handle *trans, return ret; } +/* + * Helper to find csums for logical bytenr range + * [disk_bytenr, disk_bytenr + len) and restore the result to @dst. + * + * Return >0 for the number of sectors we found. + * Return 0 for the range [disk_bytenr, disk_bytenr + sectorsize) has no csum + * for it. Caller may want to try next sector until one range is hit. + * Return <0 for fatal error. 
+ */ +static int search_csum_tree(struct btrfs_fs_info *fs_info, + struct btrfs_path *path, u64 disk_bytenr, + u64 len, u8 *dst) +{ + struct btrfs_csum_item *item = NULL; + struct btrfs_key key; + u32 csum_size = btrfs_super_csum_size(fs_info->super_copy); + u32 sectorsize = fs_info->sectorsize; + int ret; + u64 csum_start; + u64 csum_len; + + ASSERT(IS_ALIGNED(disk_bytenr, sectorsize) && + IS_ALIGNED(len, sectorsize)); + + /* Check if the current csum item covers disk_bytenr */ + if (path->nodes[0]) { + item = btrfs_item_ptr(path->nodes[0], path->slots[0], + struct btrfs_csum_item); + btrfs_item_key_to_cpu(path->nodes[0], &key, path->slots[0]); + csum_start = key.offset; + csum_len = (btrfs_item_size_nr(path->nodes[0], path->slots[0]) / + csum_size) * sectorsize; + + if (in_range(disk_bytenr, csum_start, csum_len)) + goto found; + } + + /* Current item doesn't contain the desired range, re-search */ + btrfs_release_path(path); + item = btrfs_lookup_csum(NULL, fs_info->csum_root, path, + disk_bytenr, 0); + if (IS_ERR(item)) { + ret = PTR_ERR(item); + goto out; + } + btrfs_item_key_to_cpu(path->nodes[0], &key, path->slots[0]); + csum_start = key.offset; + csum_len = (btrfs_item_size_nr(path->nodes[0], path->slots[0]) / + csum_size) * sectorsize; + ASSERT(in_range(disk_bytenr, csum_start, csum_len)); + +found: + ret = (min(csum_start + csum_len, disk_bytenr + len) - + disk_bytenr) >> fs_info->sectorsize_bits; + read_extent_buffer(path->nodes[0], dst, (unsigned long)item, + ret * csum_size); +out: + if (ret == -ENOENT) + ret = 0; + return ret; +} + +/* + * A helper to locate the file_offset of @cur_disk_bytenr of a @bio. + * + * Bio of btrfs represents read range of + * [bi_sector << 9, bi_sector << 9 + bi_size). + * Knowing this, we can iterate through each bvec to locate the page belong to + * @cur_disk_bytenr and get the file offset. + * + * @inode is used to determine the bvec page really belongs to @inode. + * + * Return 0 if we can't find the file offset; + * Return >0 if we find the file offset and restore it to @file_offset_ret + */ +static int search_file_offset_in_bio(struct bio *bio, struct inode *inode, + u64 disk_bytenr, u64 *file_offset_ret) +{ + struct bvec_iter iter; + struct bio_vec bvec; + u64 cur = bio->bi_iter.bi_sector << 9; + int ret = 0; + + bio_for_each_segment(bvec, bio, iter) { + struct page *page = bvec.bv_page; + + if (cur > disk_bytenr) + break; + if (cur + bvec.bv_len <= disk_bytenr) { + cur += bvec.bv_len; + continue; + } + ASSERT(in_range(disk_bytenr, cur, bvec.bv_len)); + if (page->mapping && page->mapping->host && + page->mapping->host == inode) { + ret = 1; + *file_offset_ret = page_offset(page) + bvec.bv_offset + + disk_bytenr - cur; + break; + } + } + return ret; +} + /** - * btrfs_lookup_bio_sums - Look up checksums for a read bio. + * Lookup the csum for the read bio in csum tree. * * @inode: inode that the bio is for. * @bio: bio to look up. - * @offset: Unless (u64)-1, look up checksums for this offset in the file. - * If (u64)-1, use the page offsets from the bio instead. * @dst: Buffer of size nblocks * btrfs_super_csum_size() used to return * checksum (nblocks = bio->bi_iter.bi_size / fs_info->sectorsize). If * NULL, the checksum buffer is allocated and returned in @@ -254,22 +356,17 @@ int btrfs_lookup_file_extent(struct btrfs_trans_handle *trans, * Return: BLK_STS_RESOURCE if allocating memory fails, BLK_STS_OK otherwise. 
*/ blk_status_t btrfs_lookup_bio_sums(struct inode *inode, struct bio *bio, - u64 offset, u8 *dst) + u8 *dst) { struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb); - struct bio_vec bvec; - struct bvec_iter iter; - struct btrfs_csum_item *item = NULL; struct extent_io_tree *io_tree = &BTRFS_I(inode)->io_tree; struct btrfs_path *path; - const bool page_offsets = (offset == (u64)-1); + u32 sectorsize = fs_info->sectorsize; + u64 orig_len = bio->bi_iter.bi_size; + u64 orig_disk_bytenr = bio->bi_iter.bi_sector << 9; + u64 cur_disk_bytenr; u8 *csum; - u64 item_start_offset = 0; - u64 item_last_offset = 0; - u64 disk_bytenr; - u64 page_bytes_left; - u32 diff; - int nblocks; + int nblocks = orig_len >> fs_info->sectorsize_bits; int count = 0; const u32 csum_size = fs_info->csum_size; @@ -283,13 +380,16 @@ blk_status_t btrfs_lookup_bio_sums(struct inode *inode, struct bio *bio, * - All of our csums should only be in csum tree * No ordered extents csums. As ordered extents are only for write * path. + * - No need to bother any other info from bvec + * Since we're looking up csums, the only important info is the + * disk_bytenr and the length, which can all be extracted from + * bi_iter directly. */ ASSERT(bio_op(bio) == REQ_OP_READ); path = btrfs_alloc_path(); if (!path) return BLK_STS_RESOURCE; - nblocks = bio->bi_iter.bi_size >> fs_info->sectorsize_bits; if (!dst) { struct btrfs_io_bio *btrfs_bio = btrfs_io_bio(bio); @@ -326,81 +426,53 @@ blk_status_t btrfs_lookup_bio_sums(struct inode *inode, struct bio *bio, path->skip_locking = 1; } - disk_bytenr = (u64)bio->bi_iter.bi_sector << 9; + for (cur_disk_bytenr = orig_disk_bytenr; + cur_disk_bytenr < orig_disk_bytenr + orig_len; + cur_disk_bytenr += (count << fs_info->sectorsize_bits)) { + int search_len = orig_disk_bytenr + orig_len - cur_disk_bytenr; + int sector_offset; + u8 *csum_dst; - bio_for_each_segment(bvec, bio, iter) { - page_bytes_left = bvec.bv_len; - if (count) - goto next; + sector_offset = (cur_disk_bytenr - orig_disk_bytenr) >> + fs_info->sectorsize_bits; + csum_dst = csum + sector_offset * csum_size; - if (page_offsets) - offset = page_offset(bvec.bv_page) + bvec.bv_offset; + count = search_csum_tree(fs_info, path, cur_disk_bytenr, + search_len, csum_dst); + if (count <= 0) { + /* + * Either we hit a critical error or we didn't find + * the csum. + * Either way, we put zero into the csums dst, and just + * skip to next sector for a better luck. + */ + memset(csum_dst, 0, csum_size); + count = 1; - if (!item || disk_bytenr < item_start_offset || - disk_bytenr >= item_last_offset) { - struct btrfs_key found_key; - u32 item_size; - - if (item) - btrfs_release_path(path); - item = btrfs_lookup_csum(NULL, fs_info->csum_root, - path, disk_bytenr, 0); - if (IS_ERR(item)) { - count = 1; - memset(csum, 0, csum_size); - if (BTRFS_I(inode)->root->root_key.objectid == - BTRFS_DATA_RELOC_TREE_OBJECTID) { - set_extent_bits(io_tree, offset, - offset + fs_info->sectorsize - 1, + /* + * For data reloc inode, we need to mark the + * range NODATASUM so that balance won't report + * false csum error. 
+ */ + if (BTRFS_I(inode)->root->root_key.objectid == + BTRFS_DATA_RELOC_TREE_OBJECTID) { + u64 file_offset; + int ret; + + ret = search_file_offset_in_bio(bio, inode, + cur_disk_bytenr, &file_offset); + if (ret) + set_extent_bits(io_tree, file_offset, + file_offset + sectorsize - 1, EXTENT_NODATASUM); - } else { - btrfs_info_rl(fs_info, - "no csum found for inode %llu start %llu", - btrfs_ino(BTRFS_I(inode)), offset); - } - item = NULL; - btrfs_release_path(path); - goto found; + } else { + btrfs_warn_rl(fs_info, + "csum hole found for disk bytenr range [%llu, %llu)", + cur_disk_bytenr, cur_disk_bytenr + sectorsize); } - btrfs_item_key_to_cpu(path->nodes[0], &found_key, - path->slots[0]); - - item_start_offset = found_key.offset; - item_size = btrfs_item_size_nr(path->nodes[0], - path->slots[0]); - item_last_offset = item_start_offset + - (item_size / csum_size) * - fs_info->sectorsize; - item = btrfs_item_ptr(path->nodes[0], path->slots[0], - struct btrfs_csum_item); - } - /* - * this byte range must be able to fit inside - * a single leaf so it will also fit inside a u32 - */ - diff = disk_bytenr - item_start_offset; - diff = diff >> fs_info->sectorsize_bits; - diff = diff * csum_size; - count = min_t(int, nblocks, (item_last_offset - disk_bytenr) >> - fs_info->sectorsize_bits); - read_extent_buffer(path->nodes[0], csum, - ((unsigned long)item) + diff, - csum_size * count); -found: - csum += count * csum_size; - nblocks -= count; -next: - while (count > 0) { - count--; - disk_bytenr += fs_info->sectorsize; - offset += fs_info->sectorsize; - page_bytes_left -= fs_info->sectorsize; - if (!page_bytes_left) - break; /* move to next bio */ } } - WARN_ON_ONCE(count); btrfs_free_path(path); return BLK_STS_OK; } diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index 0432ca58eade..50e80db2aed8 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -2251,7 +2251,7 @@ blk_status_t btrfs_submit_data_bio(struct inode *inode, struct bio *bio, * need to csum or not, which is why we ignore skip_sum * here. */ - ret = btrfs_lookup_bio_sums(inode, bio, (u64)-1, NULL); + ret = btrfs_lookup_bio_sums(inode, bio, NULL); if (ret) goto out; } @@ -7859,8 +7859,7 @@ static blk_qc_t btrfs_submit_direct(struct inode *inode, struct iomap *iomap, * * If we have csums disabled this will do nothing. 
*/ - status = btrfs_lookup_bio_sums(inode, dio_bio, file_offset, - dip->csums); + status = btrfs_lookup_bio_sums(inode, dio_bio, dip->csums); if (status != BLK_STS_OK) goto out_err; } From patchwork Tue Nov 3 13:31:01 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Qu Wenruo X-Patchwork-Id: 11877527 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_PATCH, MAILING_LIST_MULTI,SIGNED_OFF_BY,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED, USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id C219AC55179 for ; Tue, 3 Nov 2020 13:32:30 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 5E38620870 for ; Tue, 3 Nov 2020 13:32:30 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (1024-bit key) header.d=suse.com header.i=@suse.com header.b="Z/+OVKX0" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729404AbgKCNc3 (ORCPT ); Tue, 3 Nov 2020 08:32:29 -0500 Received: from mx2.suse.de ([195.135.220.15]:45334 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1729401AbgKCNc2 (ORCPT ); Tue, 3 Nov 2020 08:32:28 -0500 X-Virus-Scanned: by amavisd-new at test-mx.suse.de DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.com; s=susede1; t=1604410346; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=bAs9LxMJNqAXDF3QJ3PqQi/OSVqC/sJ8GRgC9JOUHT4=; b=Z/+OVKX0GGuob7G79oI8l+T/CYG8J2i80bktSvnli/XVp+wnOk86WdhCwl0EW9DlF5mVrB lkTLMfWUhERDYWdr4mney6Exw+sen9BD21shtCydErLnvDylVSmUQJQEptJv2X/rGrNChL O1m5twcyEd5Y7qpaS1KHPdM/l01TreM= Received: from relay2.suse.de (unknown [195.135.221.27]) by mx2.suse.de (Postfix) with ESMTP id 3CE1AABF4 for ; Tue, 3 Nov 2020 13:32:26 +0000 (UTC) From: Qu Wenruo To: linux-btrfs@vger.kernel.org Subject: [PATCH 25/32] btrfs: scrub: distinguish scrub_page from regular page Date: Tue, 3 Nov 2020 21:31:01 +0800 Message-Id: <20201103133108.148112-26-wqu@suse.com> X-Mailer: git-send-email 2.29.2 In-Reply-To: <20201103133108.148112-1-wqu@suse.com> References: <20201103133108.148112-1-wqu@suse.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org

There are several call sites where we declare something like "struct scrub_page *page". This is just asking for trouble when reading the code, as we also have the scrub_page::page member. To avoid confusion, use "spage" for scrub_page structure pointers.
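A one-line illustration of the ambiguity being removed, taken from the pattern the diff below rewrites:

    struct scrub_page *page = sblock->pagev[page_num];
    bio_add_page(bio, page->page, PAGE_SIZE, 0);    /* which "page" is which? */

    struct scrub_page *spage = sblock->pagev[page_num];
    bio_add_page(bio, spage->page, PAGE_SIZE, 0);   /* wrapper vs struct page is obvious */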
Signed-off-by: Qu Wenruo --- fs/btrfs/scrub.c | 102 +++++++++++++++++++++++------------------------ 1 file changed, 51 insertions(+), 51 deletions(-) diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c index 58cd3278fbfe..42d1d5258e83 100644 --- a/fs/btrfs/scrub.c +++ b/fs/btrfs/scrub.c @@ -255,10 +255,10 @@ static void __scrub_blocked_if_needed(struct btrfs_fs_info *fs_info); static void scrub_blocked_if_needed(struct btrfs_fs_info *fs_info); static void scrub_put_ctx(struct scrub_ctx *sctx); -static inline int scrub_is_page_on_raid56(struct scrub_page *page) +static inline int scrub_is_page_on_raid56(struct scrub_page *spage) { - return page->recover && - (page->recover->bbio->map_type & BTRFS_BLOCK_GROUP_RAID56_MASK); + return spage->recover && + (spage->recover->bbio->map_type & BTRFS_BLOCK_GROUP_RAID56_MASK); } static void scrub_pending_bio_inc(struct scrub_ctx *sctx) @@ -1090,11 +1090,11 @@ static int scrub_handle_errored_block(struct scrub_block *sblock_to_check) success = 1; for (page_num = 0; page_num < sblock_bad->page_count; page_num++) { - struct scrub_page *page_bad = sblock_bad->pagev[page_num]; + struct scrub_page *spage_bad = sblock_bad->pagev[page_num]; struct scrub_block *sblock_other = NULL; /* skip no-io-error page in scrub */ - if (!page_bad->io_error && !sctx->is_dev_replace) + if (!spage_bad->io_error && !sctx->is_dev_replace) continue; if (scrub_is_page_on_raid56(sblock_bad->pagev[0])) { @@ -1106,7 +1106,7 @@ static int scrub_handle_errored_block(struct scrub_block *sblock_to_check) * sblock_for_recheck array to target device. */ sblock_other = NULL; - } else if (page_bad->io_error) { + } else if (spage_bad->io_error) { /* try to find no-io-error page in mirrors */ for (mirror_index = 0; mirror_index < BTRFS_MAX_MIRRORS && @@ -1145,7 +1145,7 @@ static int scrub_handle_errored_block(struct scrub_block *sblock_to_check) sblock_other, page_num, 0); if (0 == ret) - page_bad->io_error = 0; + spage_bad->io_error = 0; else success = 0; } @@ -1323,13 +1323,13 @@ static int scrub_setup_recheck_block(struct scrub_block *original_sblock, for (mirror_index = 0; mirror_index < nmirrors; mirror_index++) { struct scrub_block *sblock; - struct scrub_page *page; + struct scrub_page *spage; sblock = sblocks_for_recheck + mirror_index; sblock->sctx = sctx; - page = kzalloc(sizeof(*page), GFP_NOFS); - if (!page) { + spage = kzalloc(sizeof(*spage), GFP_NOFS); + if (!spage) { leave_nomem: spin_lock(&sctx->stat_lock); sctx->stat.malloc_errors++; @@ -1337,15 +1337,15 @@ static int scrub_setup_recheck_block(struct scrub_block *original_sblock, scrub_put_recover(fs_info, recover); return -ENOMEM; } - scrub_page_get(page); - sblock->pagev[page_index] = page; - page->sblock = sblock; - page->flags = flags; - page->generation = generation; - page->logical = logical; - page->have_csum = have_csum; + scrub_page_get(spage); + sblock->pagev[page_index] = spage; + spage->sblock = sblock; + spage->flags = flags; + spage->generation = generation; + spage->logical = logical; + spage->have_csum = have_csum; if (have_csum) - memcpy(page->csum, + memcpy(spage->csum, original_sblock->pagev[0]->csum, sctx->fs_info->csum_size); @@ -1358,23 +1358,23 @@ static int scrub_setup_recheck_block(struct scrub_block *original_sblock, mirror_index, &stripe_index, &stripe_offset); - page->physical = bbio->stripes[stripe_index].physical + + spage->physical = bbio->stripes[stripe_index].physical + stripe_offset; - page->dev = bbio->stripes[stripe_index].dev; + spage->dev = bbio->stripes[stripe_index].dev; BUG_ON(page_index 
>= original_sblock->page_count); - page->physical_for_dev_replace = + spage->physical_for_dev_replace = original_sblock->pagev[page_index]-> physical_for_dev_replace; /* for missing devices, dev->bdev is NULL */ - page->mirror_num = mirror_index + 1; + spage->mirror_num = mirror_index + 1; sblock->page_count++; - page->page = alloc_page(GFP_NOFS); - if (!page->page) + spage->page = alloc_page(GFP_NOFS); + if (!spage->page) goto leave_nomem; scrub_get_recover(recover); - page->recover = recover; + spage->recover = recover; } scrub_put_recover(fs_info, recover); length -= sublen; @@ -1392,19 +1392,19 @@ static void scrub_bio_wait_endio(struct bio *bio) static int scrub_submit_raid56_bio_wait(struct btrfs_fs_info *fs_info, struct bio *bio, - struct scrub_page *page) + struct scrub_page *spage) { DECLARE_COMPLETION_ONSTACK(done); int ret; int mirror_num; - bio->bi_iter.bi_sector = page->logical >> 9; + bio->bi_iter.bi_sector = spage->logical >> 9; bio->bi_private = &done; bio->bi_end_io = scrub_bio_wait_endio; - mirror_num = page->sblock->pagev[0]->mirror_num; - ret = raid56_parity_recover(fs_info, bio, page->recover->bbio, - page->recover->map_length, + mirror_num = spage->sblock->pagev[0]->mirror_num; + ret = raid56_parity_recover(fs_info, bio, spage->recover->bbio, + spage->recover->map_length, mirror_num, 0); if (ret) return ret; @@ -1429,10 +1429,10 @@ static void scrub_recheck_block_on_raid56(struct btrfs_fs_info *fs_info, bio_set_dev(bio, first_page->dev->bdev); for (page_num = 0; page_num < sblock->page_count; page_num++) { - struct scrub_page *page = sblock->pagev[page_num]; + struct scrub_page *spage = sblock->pagev[page_num]; - WARN_ON(!page->page); - bio_add_page(bio, page->page, PAGE_SIZE, 0); + WARN_ON(!spage->page); + bio_add_page(bio, spage->page, PAGE_SIZE, 0); } if (scrub_submit_raid56_bio_wait(fs_info, bio, first_page)) { @@ -1473,24 +1473,24 @@ static void scrub_recheck_block(struct btrfs_fs_info *fs_info, for (page_num = 0; page_num < sblock->page_count; page_num++) { struct bio *bio; - struct scrub_page *page = sblock->pagev[page_num]; + struct scrub_page *spage = sblock->pagev[page_num]; - if (page->dev->bdev == NULL) { - page->io_error = 1; + if (spage->dev->bdev == NULL) { + spage->io_error = 1; sblock->no_io_error_seen = 0; continue; } - WARN_ON(!page->page); + WARN_ON(!spage->page); bio = btrfs_io_bio_alloc(1); - bio_set_dev(bio, page->dev->bdev); + bio_set_dev(bio, spage->dev->bdev); - bio_add_page(bio, page->page, PAGE_SIZE, 0); - bio->bi_iter.bi_sector = page->physical >> 9; + bio_add_page(bio, spage->page, PAGE_SIZE, 0); + bio->bi_iter.bi_sector = spage->physical >> 9; bio->bi_opf = REQ_OP_READ; if (btrfsic_submit_bio_wait(bio)) { - page->io_error = 1; + spage->io_error = 1; sblock->no_io_error_seen = 0; } @@ -1546,36 +1546,36 @@ static int scrub_repair_page_from_good_copy(struct scrub_block *sblock_bad, struct scrub_block *sblock_good, int page_num, int force_write) { - struct scrub_page *page_bad = sblock_bad->pagev[page_num]; - struct scrub_page *page_good = sblock_good->pagev[page_num]; + struct scrub_page *spage_bad = sblock_bad->pagev[page_num]; + struct scrub_page *spage_good = sblock_good->pagev[page_num]; struct btrfs_fs_info *fs_info = sblock_bad->sctx->fs_info; - BUG_ON(page_bad->page == NULL); - BUG_ON(page_good->page == NULL); + BUG_ON(spage_bad->page == NULL); + BUG_ON(spage_good->page == NULL); if (force_write || sblock_bad->header_error || - sblock_bad->checksum_error || page_bad->io_error) { + sblock_bad->checksum_error || spage_bad->io_error) { 
		struct bio *bio;
		int ret;

-		if (!page_bad->dev->bdev) {
+		if (!spage_bad->dev->bdev) {
			btrfs_warn_rl(fs_info,
				"scrub_repair_page_from_good_copy(bdev == NULL) is unexpected");
			return -EIO;
		}

		bio = btrfs_io_bio_alloc(1);
-		bio_set_dev(bio, page_bad->dev->bdev);
-		bio->bi_iter.bi_sector = page_bad->physical >> 9;
+		bio_set_dev(bio, spage_bad->dev->bdev);
+		bio->bi_iter.bi_sector = spage_bad->physical >> 9;
		bio->bi_opf = REQ_OP_WRITE;

-		ret = bio_add_page(bio, page_good->page, PAGE_SIZE, 0);
+		ret = bio_add_page(bio, spage_good->page, PAGE_SIZE, 0);
		if (PAGE_SIZE != ret) {
			bio_put(bio);
			return -EIO;
		}

		if (btrfsic_submit_bio_wait(bio)) {
-			btrfs_dev_stat_inc_and_print(page_bad->dev,
+			btrfs_dev_stat_inc_and_print(spage_bad->dev,
				BTRFS_DEV_STAT_WRITE_ERRS);
			atomic64_inc(&fs_info->dev_replace.num_write_errors);
			bio_put(bio);

From patchwork Tue Nov 3 13:31:02 2020
From: Qu Wenruo
To: linux-btrfs@vger.kernel.org
Subject: [PATCH 26/32] btrfs: scrub: remove the @force parameter of scrub_pages()
Date: Tue, 3 Nov 2020 21:31:02 +0800
Message-Id: <20201103133108.148112-27-wqu@suse.com>
In-Reply-To: <20201103133108.148112-1-wqu@suse.com>
References: <20201103133108.148112-1-wqu@suse.com>

The @force parameter for scrub_pages() indicates whether we want to force bio submission. Currently it is only used for super block scrub, and it can easily be determined from the @flags.
So remove the parameter to make the parameter list a little shorter.

Signed-off-by: Qu Wenruo
---
 fs/btrfs/scrub.c | 14 +++++++++-----
 1 file changed, 9 insertions(+), 5 deletions(-)

diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index 42d1d5258e83..7e6ed0b79006 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -236,7 +236,7 @@ static int scrub_add_page_to_rd_bio(struct scrub_ctx *sctx,
				    struct scrub_page *spage);
 static int scrub_pages(struct scrub_ctx *sctx, u64 logical, u64 len,
		       u64 physical, struct btrfs_device *dev, u64 flags,
-		       u64 gen, int mirror_num, u8 *csum, int force,
+		       u64 gen, int mirror_num, u8 *csum,
		       u64 physical_for_dev_replace);
 static void scrub_bio_end_io(struct bio *bio);
 static void scrub_bio_end_io_worker(struct btrfs_work *work);
@@ -2150,12 +2150,16 @@ static void scrub_missing_raid56_pages(struct scrub_block *sblock)

 static int scrub_pages(struct scrub_ctx *sctx, u64 logical, u64 len,
		       u64 physical, struct btrfs_device *dev, u64 flags,
-		       u64 gen, int mirror_num, u8 *csum, int force,
+		       u64 gen, int mirror_num, u8 *csum,
		       u64 physical_for_dev_replace)
 {
	struct scrub_block *sblock;
+	bool force_submit = false;
	int index;

+	if (flags & BTRFS_EXTENT_FLAG_SUPER)
+		force_submit = true;
+
	sblock = kzalloc(sizeof(*sblock), GFP_KERNEL);
	if (!sblock) {
		spin_lock(&sctx->stat_lock);
@@ -2229,7 +2233,7 @@ static int scrub_pages(struct scrub_ctx *sctx, u64 logical, u64 len,
		}
	}

-	if (force)
+	if (force_submit)
		scrub_submit(sctx);
 }

@@ -2441,7 +2445,7 @@ static int scrub_extent(struct scrub_ctx *sctx, struct map_lookup *map,
			++sctx->stat.no_csum;
		}
		ret = scrub_pages(sctx, logical, l, physical, dev, flags, gen,
-				  mirror_num, have_csum ? csum : NULL, 0,
+				  mirror_num, have_csum ? csum : NULL,
				  physical_for_dev_replace);
		if (ret)
			return ret;
@@ -3710,7 +3714,7 @@ static noinline_for_stack int scrub_supers(struct scrub_ctx *sctx,

		ret = scrub_pages(sctx, bytenr, BTRFS_SUPER_INFO_SIZE, bytenr,
				  scrub_dev, BTRFS_EXTENT_FLAG_SUPER, gen, i,
-				  NULL, 1, bytenr);
+				  NULL, bytenr);
		if (ret)
			return ret;
	}
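The pattern in the diff above generalizes: replace a caller-supplied boolean with a value the function can derive from state it already receives. A minimal sketch of the idea in plain C, with invented names (process_range, DEMO_FLAG_SUPER) standing in for the kernel's:

/* Invented stand-in for BTRFS_EXTENT_FLAG_SUPER */
#define DEMO_FLAG_SUPER (1UL << 0)

/*
 * Before: int process_range(unsigned long flags, int force);
 * the caller had to remember to pass force=1 for super blocks.
 * After: the decision is derived from @flags, so there is one
 * fewer parameter to pass wrongly.
 */
static int process_range(unsigned long flags)
{
	int force_submit = (flags & DEMO_FLAG_SUPER) ? 1 : 0;

	/* ... fill the block, then submit immediately when forced ... */
	return force_submit;
}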
From patchwork Tue Nov 3 13:31:03 2020
From: Qu Wenruo
To: linux-btrfs@vger.kernel.org
Subject: [PATCH 27/32] btrfs: scrub: use flexible array for scrub_page::csums
Date: Tue, 3 Nov 2020 21:31:03 +0800
Message-Id: <20201103133108.148112-28-wqu@suse.com>
In-Reply-To: <20201103133108.148112-1-wqu@suse.com>
References: <20201103133108.148112-1-wqu@suse.com>

There are several factors affecting how many checksum bytes are needed for one scrub_page:

- Sector size and page size
  For the subpage case, one page can contain several sectors, thus the csum size will differ.

- Checksum size
  Btrfs now supports different csum sizes, varying from 4 bytes for CRC32 to 32 bytes for SHA256.

So instead of using the fixed BTRFS_CSUM_SIZE, use a flexible array for scrub_page::csums, and determine the size at scrub_page allocation time. This not only provides the basis for later subpage scrub support, but also reduces the memory usage for default btrfs on x86_64. As the default CRC32 only uses 4 bytes, we can save 28 bytes for each scrub page.

Signed-off-by: Qu Wenruo
---
 fs/btrfs/scrub.c | 41 ++++++++++++++++++++++++++++++-----------
 1 file changed, 30 insertions(+), 11 deletions(-)

diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index 7e6ed0b79006..cabc030d4bf9 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -76,9 +76,14 @@ struct scrub_page {
		unsigned int	have_csum:1;
		unsigned int	io_error:1;
	};
-	u8			csum[BTRFS_CSUM_SIZE];
-
	struct scrub_recover	*recover;
+
+	/*
+	 * The csums size for the page is determined by page size,
+	 * sector size and csum size.
+	 * Thus the length has to be determined at runtime.
+	 */
+	u8			csums[];
 };

 struct scrub_bio {
@@ -206,6 +211,19 @@ struct full_stripe_lock {
	struct mutex mutex;
 };

+static struct scrub_page *alloc_scrub_page(struct scrub_ctx *sctx, gfp_t mask)
+{
+	u32 sectorsize = sctx->fs_info->sectorsize;
+	size_t size;
+
+	/* No support for multi-page sector size yet */
+	ASSERT(PAGE_SIZE >= sectorsize && IS_ALIGNED(PAGE_SIZE, sectorsize));
+
+	size = sizeof(struct scrub_page);
+	size += (PAGE_SIZE / sectorsize) * sctx->fs_info->csum_size;
+	return kzalloc(size, mask);
+}
+
 static void scrub_pending_bio_inc(struct scrub_ctx *sctx);
 static void scrub_pending_bio_dec(struct scrub_ctx *sctx);
 static int scrub_handle_errored_block(struct scrub_block *sblock_to_check);
@@ -1328,7 +1346,7 @@ static int scrub_setup_recheck_block(struct scrub_block *original_sblock,
			sblock = sblocks_for_recheck + mirror_index;
			sblock->sctx = sctx;

-			spage = kzalloc(sizeof(*spage), GFP_NOFS);
+			spage = alloc_scrub_page(sctx, GFP_NOFS);
			if (!spage) {
 leave_nomem:
				spin_lock(&sctx->stat_lock);
@@ -1345,8 +1363,8 @@ static int scrub_setup_recheck_block(struct scrub_block *original_sblock,
			spage->logical = logical;
			spage->have_csum = have_csum;
			if (have_csum)
-				memcpy(spage->csum,
-				       original_sblock->pagev[0]->csum,
+				memcpy(spage->csums,
+				       original_sblock->pagev[0]->csums,
				       sctx->fs_info->csum_size);

			scrub_stripe_index_and_offset(logical,
@@ -1798,7 +1816,7 @@ static int scrub_checksum_data(struct scrub_block *sblock)
	crypto_shash_init(shash);
	crypto_shash_digest(shash, kaddr, PAGE_SIZE, csum);

-	if (memcmp(csum, spage->csum, sctx->fs_info->csum_size))
+	if (memcmp(csum, spage->csums, sctx->fs_info->csum_size))
		sblock->checksum_error = 1;

	return sblock->checksum_error;
@@ -2178,7 +2196,7 @@ static int scrub_pages(struct scrub_ctx *sctx, u64 logical, u64 len,
		struct scrub_page *spage;
		u64 l = min_t(u64, len, PAGE_SIZE);

-		spage = kzalloc(sizeof(*spage), GFP_KERNEL);
+		spage = alloc_scrub_page(sctx, GFP_KERNEL);
		if (!spage) {
 leave_nomem:
			spin_lock(&sctx->stat_lock);
@@ -2200,7 +2218,7 @@ static int scrub_pages(struct scrub_ctx *sctx, u64 logical, u64 len,
		spage->mirror_num = mirror_num;
		if (csum) {
			spage->have_csum = 1;
-			memcpy(spage->csum, csum, sctx->fs_info->csum_size);
+			memcpy(spage->csums, csum, sctx->fs_info->csum_size);
		} else {
			spage->have_csum = 0;
		}
@@ -2486,7 +2504,9 @@ static int scrub_pages_for_parity(struct scrub_parity *sparity,
		struct scrub_page *spage;
		u64 l = min_t(u64, len, PAGE_SIZE);

-		spage = kzalloc(sizeof(*spage), GFP_KERNEL);
+		BUG_ON(index >= SCRUB_MAX_PAGES_PER_BLOCK);
+
+		spage = alloc_scrub_page(sctx, GFP_KERNEL);
		if (!spage) {
 leave_nomem:
			spin_lock(&sctx->stat_lock);
@@ -2495,7 +2515,6 @@ static int scrub_pages_for_parity(struct scrub_parity *sparity,
			scrub_block_put(sblock);
			return -ENOMEM;
		}
-		BUG_ON(index >= SCRUB_MAX_PAGES_PER_BLOCK);
		/* For scrub block */
		scrub_page_get(spage);
		sblock->pagev[index] = spage;
@@ -2511,7 +2530,7 @@ static int scrub_pages_for_parity(struct scrub_parity *sparity,
		spage->mirror_num = mirror_num;
		if (csum) {
			spage->have_csum = 1;
-			memcpy(spage->csum, csum, sctx->fs_info->csum_size);
+			memcpy(spage->csums, csum, sctx->fs_info->csum_size);
		} else {
			spage->have_csum = 0;
		}
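For readers less familiar with flexible array members, here is a minimal, self-contained userspace sketch of the allocation scheme alloc_scrub_page() uses above; demo_page and demo_alloc are invented names, and calloc() stands in for kzalloc():

#include <stdlib.h>

struct demo_page {
	unsigned int have_csum;
	/* Flexible array member: must be last, adds nothing to sizeof() */
	unsigned char csums[];
};

static struct demo_page *demo_alloc(size_t page_size, size_t sector_size,
				    size_t csum_size)
{
	/* One csum slot per sector that fits in the page */
	size_t size = sizeof(struct demo_page) +
		      (page_size / sector_size) * csum_size;

	return calloc(1, size);	/* zeroed, like kzalloc() */
}

With CRC32 (4-byte csums) on a 4K-sector, 4K-page system this allocates just 4 bytes of csum space per scrub_page, which is where the 28-byte saving over the fixed 32-byte array comes from.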
From patchwork Tue Nov 3 13:31:04 2020
From: Qu Wenruo
To: linux-btrfs@vger.kernel.org
Subject: [PATCH 28/32] btrfs: scrub: refactor scrub_find_csum()
Date: Tue, 3 Nov 2020 21:31:04 +0800
Message-Id: <20201103133108.148112-29-wqu@suse.com>
In-Reply-To: <20201103133108.148112-1-wqu@suse.com>
References: <20201103133108.148112-1-wqu@suse.com>

Function scrub_find_csum() locates the csum for bytenr @logical from sctx->csum_list. However it lacks comments explaining things like how the csum_list is organized and why we need to drop the csum ranges which are before us.

Refactor the function by:

- Adding more comments explaining the behavior
- Adding a comment explaining why we need to drop the csum range
- Putting the csum copy in the main loop

This is mostly for the incoming patches to make scrub_find_csum() able to find multiple checksums.

Signed-off-by: Qu Wenruo
---
 fs/btrfs/scrub.c | 71 ++++++++++++++++++++++++++++++++++--------------
 1 file changed, 51 insertions(+), 20 deletions(-)

diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index cabc030d4bf9..e4f73dfc3516 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -2384,38 +2384,69 @@ static void scrub_block_complete(struct scrub_block *sblock)
	}
 }

+static void drop_csum_range(struct scrub_ctx *sctx,
+			    struct btrfs_ordered_sum *sum)
+{
+	u32 sectorsize = sctx->fs_info->sectorsize;
+
+	sctx->stat.csum_discards += sum->len / sectorsize;
+	list_del(&sum->list);
+	kfree(sum);
+}
+
+/*
+ * Find the desired csum for range [@logical, @logical + sectorsize), and
+ * store the csum into @csum.
+ *
+ * The search source is sctx->csum_list, which is a pre-populated list
+ * storing bytenr ordered csum ranges.
+ * We're responsible for cleaning up any range that is before @logical.
+ *
+ * Return 0 if there is no csum for the range.
+ * Return 1 if there is a csum for the range and it is copied to @csum.
+ */
 static int scrub_find_csum(struct scrub_ctx *sctx, u64 logical, u8 *csum)
 {
-	struct btrfs_ordered_sum *sum = NULL;
-	unsigned long index;
-	unsigned long num_sectors;
+	bool found = false;

	while (!list_empty(&sctx->csum_list)) {
+		struct btrfs_ordered_sum *sum = NULL;
+		unsigned long index;
+		unsigned long num_sectors;
+
		sum = list_first_entry(&sctx->csum_list,
				       struct btrfs_ordered_sum, list);
+		/* The current csum range is beyond our range, no csum found */
		if (sum->bytenr > logical)
-			return 0;
-		if (sum->bytenr + sum->len > logical)
			break;
-		++sctx->stat.csum_discards;
-		list_del(&sum->list);
-		kfree(sum);
-		sum = NULL;
-	}
-	if (!sum)
-		return 0;
+		/*
+		 * The current sum is before our bytenr; since scrub is
+		 * always done in bytenr order, the csum will never be used
+		 * anymore. Clean it up so that later calls won't bother with
+		 * the range, and continue searching the next range.
+		 */
+		if (sum->bytenr + sum->len <= logical) {
+			drop_csum_range(sctx, sum);
+			continue;
+		}

-	index = (logical - sum->bytenr) >> sctx->fs_info->sectorsize_bits;
-	ASSERT(index < UINT_MAX);
+		/* Now the csum range covers our bytenr, copy the csum */
+		found = true;
+		index = (logical - sum->bytenr) >>
+			sctx->fs_info->sectorsize_bits;
+		num_sectors = sum->len >> sctx->fs_info->sectorsize_bits;

-	num_sectors = sum->len >> sctx->fs_info->sectorsize_bits;
-	memcpy(csum, sum->sums + index * sctx->fs_info->csum_size,
-	       sctx->fs_info->csum_size);
-	if (index == num_sectors - 1) {
-		list_del(&sum->list);
-		kfree(sum);
+		memcpy(csum, sum->sums + index * sctx->fs_info->csum_size,
+		       sctx->fs_info->csum_size);
+
+		/* Cleanup the range if we're at the end of the csum range */
+		if (index == num_sectors - 1)
+			drop_csum_range(sctx, sum);
+		break;
	}
+	if (!found)
+		return 0;
	return 1;
 }
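The control flow of the refactored scrub_find_csum() can be exercised in isolation. Below is a rough userspace sketch of the same walk over an ordered range list, using a plain array and an advancing head index in place of the kernel's list_head and kfree(); all names (demo_sum, demo_find) are illustrative:

#include <stdbool.h>
#include <stdint.h>

struct demo_sum {
	uint64_t bytenr;	/* start of the csum range */
	uint64_t len;		/* length of the range */
};

/* Returns true and sets *out when a range covering @logical is found.
 * Ranges entirely before @logical are consumed (the kernel frees them,
 * since scrub proceeds in bytenr order and will never look back). */
static bool demo_find(const struct demo_sum *sums, int *head, int count,
		      uint64_t logical, struct demo_sum *out)
{
	while (*head < count) {
		const struct demo_sum *sum = &sums[*head];

		if (sum->bytenr > logical)	/* range beyond us: no csum */
			return false;
		if (sum->bytenr + sum->len <= logical) {
			(*head)++;		/* stale range: drop, go on */
			continue;
		}
		*out = *sum;			/* range covers @logical */
		return true;
	}
	return false;
}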
From patchwork Tue Nov 3 13:31:05 2020
From: Qu Wenruo
To: linux-btrfs@vger.kernel.org
Subject: [PATCH 29/32] btrfs: scrub: introduce scrub_page::page_len for subpage support
Date: Tue, 3 Nov 2020 21:31:05 +0800
Message-Id: <20201103133108.148112-30-wqu@suse.com>
In-Reply-To: <20201103133108.148112-1-wqu@suse.com>
References: <20201103133108.148112-1-wqu@suse.com>

Currently scrub_page only has one csum for each page. This is fine if page size == sector size; then each page has one csum for it. But for subpage support, we could have cases where only part of the page is utilized, e.g. one 4K sector read into a 64K page. In that case, we need a way to determine which range is really utilized.

This patch introduces scrub_page::page_len so that we can know where the utilized range ends. This is especially important for subpage, as a write bio can overwrite existing data if we just submit a full page bio.

Signed-off-by: Qu Wenruo
---
 fs/btrfs/scrub.c | 36 +++++++++++++++++++++++++-----------
 1 file changed, 25 insertions(+), 11 deletions(-)

diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index e4f73dfc3516..9f380009890f 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -72,9 +72,15 @@ struct scrub_page {
	u64			physical_for_dev_replace;
	atomic_t		refs;
	struct {
-		unsigned int	mirror_num:8;
-		unsigned int	have_csum:1;
-		unsigned int	io_error:1;
+		/*
+		 * For the subpage case, only part of the page is utilized.
+		 * Note that 16 bits can only represent up to 65535, not
+		 * 65536, thus we have to use 17 bits here.
+		 */
+		u32		page_len:17;
+		u32		mirror_num:8;
+		u32		have_csum:1;
+		u32		io_error:1;
	};

	struct scrub_recover	*recover;
@@ -216,6 +222,11 @@ static struct scrub_page *alloc_scrub_page(struct scrub_ctx *sctx, gfp_t mask)
	u32 sectorsize = sctx->fs_info->sectorsize;
	size_t size;

+	/*
+	 * The bits in scrub_page::page_len only support up to 64K page size.
+	 */
+	BUILD_BUG_ON(PAGE_SIZE > SZ_64K);
+
	/* No support for multi-page sector size yet */
	ASSERT(PAGE_SIZE >= sectorsize && IS_ALIGNED(PAGE_SIZE, sectorsize));

@@ -1357,6 +1368,7 @@ static int scrub_setup_recheck_block(struct scrub_block *original_sblock,
			}
			scrub_page_get(spage);
			sblock->pagev[page_index] = spage;
+			spage->page_len = sublen;
			spage->sblock = sblock;
			spage->flags = flags;
			spage->generation = generation;
@@ -1450,7 +1462,7 @@ static void scrub_recheck_block_on_raid56(struct btrfs_fs_info *fs_info,
		struct scrub_page *spage = sblock->pagev[page_num];

		WARN_ON(!spage->page);
-		bio_add_page(bio, spage->page, PAGE_SIZE, 0);
+		bio_add_page(bio, spage->page, spage->page_len, 0);
	}

	if (scrub_submit_raid56_bio_wait(fs_info, bio, first_page)) {
@@ -1503,7 +1515,7 @@ static void scrub_recheck_block(struct btrfs_fs_info *fs_info,
		bio = btrfs_io_bio_alloc(1);
		bio_set_dev(bio, spage->dev->bdev);

-		bio_add_page(bio, spage->page, PAGE_SIZE, 0);
+		bio_add_page(bio, spage->page, spage->page_len, 0);
		bio->bi_iter.bi_sector = spage->physical >> 9;
		bio->bi_opf = REQ_OP_READ;

@@ -1586,8 +1598,8 @@ static int scrub_repair_page_from_good_copy(struct scrub_block *sblock_bad,
		bio->bi_iter.bi_sector = spage_bad->physical >> 9;
		bio->bi_opf = REQ_OP_WRITE;

-		ret = bio_add_page(bio, spage_good->page, PAGE_SIZE, 0);
-		if (PAGE_SIZE != ret) {
+		ret = bio_add_page(bio, spage_good->page, spage_good->page_len, 0);
+		if (ret != spage_good->page_len) {
			bio_put(bio);
			return -EIO;
		}
@@ -1683,8 +1695,8 @@ static int scrub_add_page_to_wr_bio(struct scrub_ctx *sctx,
		goto again;
	}

-	ret = bio_add_page(sbio->bio, spage->page, PAGE_SIZE, 0);
-	if (ret != PAGE_SIZE) {
+	ret = bio_add_page(sbio->bio, spage->page, spage->page_len, 0);
+	if (ret != spage->page_len) {
		if (sbio->page_count < 1) {
			bio_put(sbio->bio);
			sbio->bio = NULL;
@@ -2031,8 +2043,8 @@ static int scrub_add_page_to_rd_bio(struct scrub_ctx *sctx,
	}

	sbio->pagev[sbio->page_count] = spage;
-	ret = bio_add_page(sbio->bio, spage->page, PAGE_SIZE, 0);
-	if (ret != PAGE_SIZE) {
+	ret = bio_add_page(sbio->bio, spage->page, spage->page_len, 0);
+	if (ret != spage->page_len) {
		if (sbio->page_count < 1) {
			bio_put(sbio->bio);
			sbio->bio = NULL;
@@ -2208,6 +2220,7 @@ static int scrub_pages(struct scrub_ctx *sctx, u64 logical, u64 len,
		BUG_ON(index >= SCRUB_MAX_PAGES_PER_BLOCK);
		scrub_page_get(spage);
		sblock->pagev[index] = spage;
+		spage->page_len = l;
		spage->sblock = sblock;
		spage->dev = dev;
		spage->flags = flags;
@@ -2552,6 +2565,7 @@ static int scrub_pages_for_parity(struct scrub_parity *sparity,
		/* For scrub parity */
		scrub_page_get(spage);
		list_add_tail(&spage->list, &sparity->spages);
+		spage->page_len = l;
		spage->sblock = sblock;
		spage->dev = dev;
		spage->flags = flags;
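The 17-bit sizing above is easy to sanity-check. A small sketch with C11 static assertions; struct demo_len is a hypothetical layout, not the kernel's:

#include <assert.h>

struct demo_len {
	unsigned int page_len:17;	/* must hold 65536, a full 64K page */
	unsigned int mirror_num:8;
	unsigned int have_csum:1;
	unsigned int io_error:1;
};

/* 16 bits top out at 65535, one short of a full 64K length */
static_assert((1u << 16) - 1 == 65535, "16-bit max is 65535");
static_assert((1u << 17) - 1 >= 65536, "17 bits can represent 65536");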
From patchwork Tue Nov 3 13:31:06 2020
From: Qu Wenruo
To: linux-btrfs@vger.kernel.org
Subject: [PATCH 30/32] btrfs: scrub: always allocate one full page for one sector for RAID56
Date: Tue, 3 Nov 2020 21:31:06 +0800
Message-Id: <20201103133108.148112-31-wqu@suse.com>
In-Reply-To: <20201103133108.148112-1-wqu@suse.com>
References: <20201103133108.148112-1-wqu@suse.com>

For scrub_pages() and scrub_pages_for_parity(), we currently allocate one scrub_page structure for one page. This is fine if we only read/write one sector at a time. But for cases like scrubbing RAID56, we need to read/write the full stripe, which is 64K in size.

For subpage, we would submit the read in just one page, which is normally a good thing, but the RAID56 code only expects to see one sector, not the full stripe, in its endio function. This could lead to wrong parity checksums for RAID56 on subpage.

To make the existing code work well for the subpage case, here we take a shortcut by always allocating a full page for one sector. This should provide the basis to make RAID56 work for subpage.

The cost is pretty obvious now: for one RAID56 stripe we always need 16 pages. For the subpage situation (64K page size, 4K sector size), this means we need a full megabyte to scrub just one RAID56 stripe. And for data scrub, each 4K sector will also need one 64K page.

This is mostly just a workaround; the proper fix is a much larger project: using scrub_block to replace scrub_page, and allowing scrub_block to handle multiple pages, csums, and a csum_bitmap, to avoid allocating one page for each sector.
Signed-off-by: Qu Wenruo
---
 fs/btrfs/scrub.c | 17 +++++++++++++++--
 1 file changed, 15 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index 9f380009890f..230ba24a4fdf 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -2184,6 +2184,7 @@ static int scrub_pages(struct scrub_ctx *sctx, u64 logical, u64 len,
		       u64 physical_for_dev_replace)
 {
	struct scrub_block *sblock;
+	u32 sectorsize = sctx->fs_info->sectorsize;
	bool force_submit = false;
	int index;

@@ -2206,7 +2207,15 @@ static int scrub_pages(struct scrub_ctx *sctx, u64 logical, u64 len,

	for (index = 0; len > 0; index++) {
		struct scrub_page *spage;
-		u64 l = min_t(u64, len, PAGE_SIZE);
+		/*
+		 * Here we will allocate one page for one sector to scrub.
+		 * This is fine if PAGE_SIZE == sectorsize, but will cost
+		 * more memory for the PAGE_SIZE > sectorsize case.
+		 *
+		 * TODO: Make scrub_block handle multiple pages and csums,
+		 * so that we don't need the scrub_page structure at all.
+		 */
+		u32 l = min_t(u32, sectorsize, len);

		spage = alloc_scrub_page(sctx, GFP_KERNEL);
		if (!spage) {
@@ -2526,8 +2535,11 @@ static int scrub_pages_for_parity(struct scrub_parity *sparity,
 {
	struct scrub_ctx *sctx = sparity->sctx;
	struct scrub_block *sblock;
+	u32 sectorsize = sctx->fs_info->sectorsize;
	int index;

+	ASSERT(IS_ALIGNED(len, sectorsize));
+
	sblock = kzalloc(sizeof(*sblock), GFP_KERNEL);
	if (!sblock) {
		spin_lock(&sctx->stat_lock);
@@ -2546,7 +2558,8 @@ static int scrub_pages_for_parity(struct scrub_parity *sparity,

	for (index = 0; len > 0; index++) {
		struct scrub_page *spage;
-		u64 l = min_t(u64, len, PAGE_SIZE);
+		/* Check scrub_pages() for the reason why we use sectorsize */
+		u32 l = sectorsize;

		BUG_ON(index >= SCRUB_MAX_PAGES_PER_BLOCK);
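To see the allocation pattern the hunk above creates, here is a short userspace sketch of the per-sector chunking loop; demo_scrub_range() is an invented name and the printout stands in for the real per-chunk page allocation:

#include <stdint.h>
#include <stdio.h>

static void demo_scrub_range(uint64_t len, uint32_t sectorsize)
{
	int index;

	for (index = 0; len > 0; index++) {
		/* One chunk per sector, even when pages are 64K */
		uint32_t l = (len < sectorsize) ? (uint32_t)len : sectorsize;

		printf("chunk %d: %u bytes (one page allocated)\n", index, l);
		len -= l;
	}
}

int main(void)
{
	demo_scrub_range(64 * 1024, 4096);	/* one RAID56 stripe: 16 chunks */
	return 0;
}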
From patchwork Tue Nov 3 13:31:07 2020
From: Qu Wenruo
To: linux-btrfs@vger.kernel.org
Subject: [PATCH 31/32] btrfs: scrub: support subpage tree block scrub
Date: Tue, 3 Nov 2020 21:31:07 +0800
Message-Id: <20201103133108.148112-32-wqu@suse.com>
In-Reply-To: <20201103133108.148112-1-wqu@suse.com>
References: <20201103133108.148112-1-wqu@suse.com>

To support subpage tree block scrub, scrub_checksum_tree_block() only needs to learn 2 new tricks:

- Follow scrub_page::page_len
  Now that scrub_page only represents one sector, we need to follow it properly.

- Run the checksum on all sectors
  Since scrub_page only represents one sector, we need to run the hash on all sectors, no longer just (nodesize >> PAGE_SHIFT) pages.

Signed-off-by: Qu Wenruo
---
 fs/btrfs/scrub.c | 14 ++++++++++----
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index 230ba24a4fdf..deee5c9bd442 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -1839,15 +1839,21 @@ static int scrub_checksum_tree_block(struct scrub_block *sblock)
	struct scrub_ctx *sctx = sblock->sctx;
	struct btrfs_header *h;
	struct btrfs_fs_info *fs_info = sctx->fs_info;
+	u32 sectorsize = sctx->fs_info->sectorsize;
+	u32 nodesize = sctx->fs_info->nodesize;
	SHASH_DESC_ON_STACK(shash, fs_info->csum_shash);
	u8 calculated_csum[BTRFS_CSUM_SIZE];
	u8 on_disk_csum[BTRFS_CSUM_SIZE];
-	const int num_pages = sctx->fs_info->nodesize >> PAGE_SHIFT;
+	const int num_sectors = nodesize / sectorsize;
	int i;
	struct scrub_page *spage;
	char *kaddr;

	BUG_ON(sblock->page_count < 1);
+
+	/* Each pagev[] is in fact just one sector, not a full page */
+	ASSERT(sblock->page_count == num_sectors);
+
	spage = sblock->pagev[0];
	kaddr = page_address(spage->page);
	h = (struct btrfs_header *)kaddr;
@@ -1876,11 +1882,11 @@ static int scrub_checksum_tree_block(struct scrub_block *sblock)
	shash->tfm = fs_info->csum_shash;
	crypto_shash_init(shash);
	crypto_shash_update(shash, kaddr + BTRFS_CSUM_SIZE,
-			    PAGE_SIZE - BTRFS_CSUM_SIZE);
+			    spage->page_len - BTRFS_CSUM_SIZE);

-	for (i = 1; i < num_pages; i++) {
+	for (i = 1; i < num_sectors; i++) {
		kaddr = page_address(sblock->pagev[i]->page);
-		crypto_shash_update(shash, kaddr, PAGE_SIZE);
+		crypto_shash_update(shash, kaddr, sblock->pagev[i]->page_len);
	}

	crypto_shash_final(shash, calculated_csum);
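The hashing shape above, fed sector by sector with the checksum header skipped in the first sector, can be sketched in userspace with a toy incremental checksum in place of crypto_shash; demo_update() and demo_tree_block_csum() are invented names:

#include <stddef.h>
#include <stdint.h>

#define DEMO_CSUM_SIZE 32	/* stand-in for BTRFS_CSUM_SIZE */

static uint32_t demo_update(uint32_t state, const uint8_t *buf, size_t len)
{
	while (len--)
		state = state * 31 + *buf++;	/* toy hash, not a real csum */
	return state;
}

static uint32_t demo_tree_block_csum(uint8_t **sectors, const size_t *lens,
				     int num_sectors)
{
	uint32_t state = 0;
	int i;

	/* First sector: hash everything after the embedded csum */
	state = demo_update(state, sectors[0] + DEMO_CSUM_SIZE,
			    lens[0] - DEMO_CSUM_SIZE);

	/* Remaining sectors: hash their full (per-sector) length */
	for (i = 1; i < num_sectors; i++)
		state = demo_update(state, sectors[i], lens[i]);

	return state;
}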
From patchwork Tue Nov 3 13:31:08 2020
From: Qu Wenruo
To: linux-btrfs@vger.kernel.org
Subject: [PATCH 32/32] btrfs: scrub: support subpage data scrub
Date: Tue, 3 Nov 2020 21:31:08 +0800
Message-Id: <20201103133108.148112-33-wqu@suse.com>
In-Reply-To: <20201103133108.148112-1-wqu@suse.com>
References: <20201103133108.148112-1-wqu@suse.com>

Btrfs scrub is in fact much more flexible than the buffered data write path, as we can read unaligned subpage data into page offset 0. This ability makes subpage support much easier: we just need to check each scrub_page::page_len and ensure we only calculate the hash for [0, page_len) of a page, and we can call it a day for subpage scrub support.

There is a small thing to notice: for the subpage case, we still do sector by sector scrub. This means we will submit a read bio for each sector to scrub, resulting in the same number of read bios as on 4K page systems. This behavior can be considered a good thing, if we want everything to be the same as on 4K page systems. But it also means we're wasting the ability to submit larger bios using the 64K page size. This is another problem to consider in the future.

Signed-off-by: Qu Wenruo
---
 fs/btrfs/scrub.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index deee5c9bd442..d1cbea7a6db0 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -1822,15 +1822,19 @@ static int scrub_checksum_data(struct scrub_block *sblock)
	if (!spage->have_csum)
		return 0;

+	/*
+	 * In scrub_pages() and scrub_pages_for_parity() we ensure
+	 * each spage only contains just one sector of data.
+	 */
+	ASSERT(spage->page_len == sctx->fs_info->sectorsize);
	kaddr = page_address(spage->page);

	shash->tfm = fs_info->csum_shash;
	crypto_shash_init(shash);
-	crypto_shash_digest(shash, kaddr, PAGE_SIZE, csum);
+	crypto_shash_digest(shash, kaddr, spage->page_len, csum);

	if (memcmp(csum, spage->csums, sctx->fs_info->csum_size))
		sblock->checksum_error = 1;
-
	return sblock->checksum_error;
 }