From patchwork Thu Aug 4 08:10:58 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Qu Wenruo X-Patchwork-Id: 12936196 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id E30C1C19F2B for ; Thu, 4 Aug 2022 08:11:25 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S239445AbiHDILX (ORCPT ); Thu, 4 Aug 2022 04:11:23 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:55938 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S237383AbiHDILW (ORCPT ); Thu, 4 Aug 2022 04:11:22 -0400 Received: from smtp-out2.suse.de (smtp-out2.suse.de [IPv6:2001:67c:2178:6::1d]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 4F2E027B23; Thu, 4 Aug 2022 01:11:21 -0700 (PDT) Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by smtp-out2.suse.de (Postfix) with ESMTPS id C1A0C20D2B; Thu, 4 Aug 2022 08:11:19 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.com; s=susede1; t=1659600679; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=hmhMiItqNajn4DU26z+vCdhR3Ew1j8tljr/IseLXZtY=; b=lm7oDaot1In+bArJrob+8PdPiIihGGsjyaCEav5HKyjRsiPP7GBX4eeSoesWi8JnTlMeBG nQXErL8fsksCqiPqV2d3f5+8Ns5LcdBCV80kOhoJvVlEBGm07YvDc04NsF4d2B2JZd1AD9 xfssQ+Ner8hh8xGpoA1pXPI2BqpDEFc= Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by imap2.suse-dmz.suse.de (Postfix) with ESMTPS id A860D13AE1; Thu, 4 Aug 2022 08:11:18 +0000 (UTC) Received: from dovecot-director2.suse.de ([192.168.254.65]) by imap2.suse-dmz.suse.de with ESMTPSA id MCogHCZ/62L1KQAAMHmgww (envelope-from ); Thu, 04 Aug 2022 08:11:18 +0000 From: Qu Wenruo To: linux-btrfs@vger.kernel.org, stable@vger.kernel.org Cc: David Sterba Subject: [PATCH STABLE 5.18 1/2] btrfs: only write the sectors in the vertical stripe which has data stripes Date: Thu, 4 Aug 2022 16:10:58 +0800 Message-Id: X-Mailer: git-send-email 2.37.0 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org commit bd8f7e627703ca5707833d623efcd43f104c7b3f upstream. If we have only 8K partial write at the beginning of a full RAID56 stripe, we will write the following contents: 0 8K 32K 64K Disk 1 (data): |XX| | | Disk 2 (data): | | | Disk 3 (parity): |XXXXXXXXXXXXXXX|XXXXXXXXXXXXXXX| |X| means the sector will be written back to disk. Note that, although we won't write any sectors from disk 2, but we will write the full 64KiB of parity to disk. This behavior is fine for now, but not for the future (especially for RAID56J, as we waste quite some space to journal the unused parity stripes). So here we will also utilize the btrfs_raid_bio::dbitmap, anytime we queue a higher level bio into an rbio, we will update rbio::dbitmap to indicate which vertical stripes we need to writeback. And at finish_rmw(), we also check dbitmap to see if we need to write any sector in the vertical stripe. So after the patch, above example will only lead to the following writeback pattern: 0 8K 32K 64K Disk 1 (data): |XX| | | Disk 2 (data): | | | Disk 3 (parity): |XX| | | Cc: stable@vger.kernel.org # 5.18 Signed-off-by: Qu Wenruo Signed-off-by: David Sterba --- fs/btrfs/raid56.c | 55 +++++++++++++++++++++++++++++++++++++++++++---- 1 file changed, 51 insertions(+), 4 deletions(-) diff --git a/fs/btrfs/raid56.c b/fs/btrfs/raid56.c index 0e239a4c3b26..ca92a20d5c27 100644 --- a/fs/btrfs/raid56.c +++ b/fs/btrfs/raid56.c @@ -323,6 +323,9 @@ static void merge_rbio(struct btrfs_raid_bio *dest, { bio_list_merge(&dest->bio_list, &victim->bio_list); dest->bio_list_bytes += victim->bio_list_bytes; + /* Also inherit the bitmaps from @victim. */ + bitmap_or(dest->dbitmap, victim->dbitmap, dest->dbitmap, + dest->stripe_npages); dest->generic_bio_cnt += victim->generic_bio_cnt; bio_list_init(&victim->bio_list); } @@ -864,6 +867,12 @@ static void rbio_orig_end_io(struct btrfs_raid_bio *rbio, blk_status_t err) if (rbio->generic_bio_cnt) btrfs_bio_counter_sub(rbio->bioc->fs_info, rbio->generic_bio_cnt); + /* + * Clear the data bitmap, as the rbio may be cached for later usage. + * do this before before unlock_stripe() so there will be no new bio + * for this bio. + */ + bitmap_clear(rbio->dbitmap, 0, rbio->stripe_npages); /* * At this moment, rbio->bio_list is empty, however since rbio does not @@ -1195,6 +1204,9 @@ static noinline void finish_rmw(struct btrfs_raid_bio *rbio) else BUG(); + /* We should have at least one data sector. */ + ASSERT(bitmap_weight(rbio->dbitmap, rbio->stripe_npages)); + /* at this point we either have a full stripe, * or we've read the full stripe from the drive. * recalculate the parity and write the new results. @@ -1266,6 +1278,11 @@ static noinline void finish_rmw(struct btrfs_raid_bio *rbio) for (stripe = 0; stripe < rbio->real_stripes; stripe++) { for (pagenr = 0; pagenr < rbio->stripe_npages; pagenr++) { struct page *page; + + /* This vertical stripe has no data, skip it. */ + if (!test_bit(pagenr, rbio->dbitmap)) + continue; + if (stripe < rbio->nr_data) { page = page_in_rbio(rbio, stripe, pagenr, 1); if (!page) @@ -1290,6 +1307,11 @@ static noinline void finish_rmw(struct btrfs_raid_bio *rbio) for (pagenr = 0; pagenr < rbio->stripe_npages; pagenr++) { struct page *page; + + /* This vertical stripe has no data, skip it. */ + if (!test_bit(pagenr, rbio->dbitmap)) + continue; + if (stripe < rbio->nr_data) { page = page_in_rbio(rbio, stripe, pagenr, 1); if (!page) @@ -1713,6 +1735,33 @@ static void btrfs_raid_unplug(struct blk_plug_cb *cb, bool from_schedule) run_plug(plug); } +/* Add the original bio into rbio->bio_list, and update rbio::dbitmap. */ +static void rbio_add_bio(struct btrfs_raid_bio *rbio, struct bio *orig_bio) +{ + const struct btrfs_fs_info *fs_info = rbio->bioc->fs_info; + const u64 orig_logical = orig_bio->bi_iter.bi_sector << SECTOR_SHIFT; + const u64 full_stripe_start = rbio->bioc->raid_map[0]; + const u32 orig_len = orig_bio->bi_iter.bi_size; + const u32 sectorsize = fs_info->sectorsize; + u64 cur_logical; + + ASSERT(orig_logical >= full_stripe_start && + orig_logical + orig_len <= full_stripe_start + + rbio->nr_data * rbio->stripe_len); + + bio_list_add(&rbio->bio_list, orig_bio); + rbio->bio_list_bytes += orig_bio->bi_iter.bi_size; + + /* Update the dbitmap. */ + for (cur_logical = orig_logical; cur_logical < orig_logical + orig_len; + cur_logical += sectorsize) { + int bit = ((u32)(cur_logical - full_stripe_start) >> + fs_info->sectorsize_bits) % rbio->stripe_npages; + + set_bit(bit, rbio->dbitmap); + } +} + /* * our main entry point for writes from the rest of the FS. */ @@ -1730,9 +1779,8 @@ int raid56_parity_write(struct bio *bio, struct btrfs_io_context *bioc, btrfs_put_bioc(bioc); return PTR_ERR(rbio); } - bio_list_add(&rbio->bio_list, bio); - rbio->bio_list_bytes = bio->bi_iter.bi_size; rbio->operation = BTRFS_RBIO_WRITE; + rbio_add_bio(rbio, bio); btrfs_bio_counter_inc_noblocked(fs_info); rbio->generic_bio_cnt = 1; @@ -2134,8 +2182,7 @@ int raid56_parity_recover(struct bio *bio, struct btrfs_io_context *bioc, } rbio->operation = BTRFS_RBIO_READ_REBUILD; - bio_list_add(&rbio->bio_list, bio); - rbio->bio_list_bytes = bio->bi_iter.bi_size; + rbio_add_bio(rbio, bio); rbio->faila = find_logical_bio_stripe(rbio, bio); if (rbio->faila == -1) {