From patchwork Wed Dec 1 05:17:40 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Qu Wenruo
X-Patchwork-Id: 12649053
From: Qu Wenruo
To: linux-btrfs@vger.kernel.org
Cc: linux-block@vger.kernel.org, dm-devel@redhat.com
Subject: [PATCH 01/17] btrfs: update a stale comment on btrfs_submit_bio_hook()
Date: Wed, 1 Dec 2021 13:17:40 +0800
Message-Id: <20211201051756.53742-2-wqu@suse.com>
X-Mailer: git-send-email 2.34.1
In-Reply-To: <20211201051756.53742-1-wqu@suse.com>
References: <20211201051756.53742-1-wqu@suse.com>
Precedence: bulk
X-Mailing-List: linux-block@vger.kernel.org

This function has been renamed to btrfs_submit_data_bio(). Update the
comment and add the reason why btrfs_submit_dio_bio() doesn't completely
follow the rules in btrfs_submit_data_bio().

Signed-off-by: Qu Wenruo
---
 fs/btrfs/inode.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 504cf090fc88..6079d30f83e8 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -8202,7 +8202,13 @@ static inline blk_status_t btrfs_submit_dio_bio(struct bio *bio,
 	bool write = btrfs_op(bio) == BTRFS_MAP_WRITE;
 	blk_status_t ret;
 
-	/* Check btrfs_submit_bio_hook() for rules about async submit. */
+	/*
+	 * Check btrfs_submit_data_bio() for rules about async submit.
+	 *
+	 * The only exception is for RAID56: when there is more than one bio
+	 * to submit, async submit seems to make it harder to collect csums
+	 * for the full stripe.
+	 */
 	if (async_submit)
 		async_submit = !atomic_read(&BTRFS_I(inode)->sync_writers);
 

From patchwork Wed Dec 1 05:17:41 2021
X-Patchwork-Submitter: Qu Wenruo
X-Patchwork-Id: 12649055
From: Qu Wenruo
To: linux-btrfs@vger.kernel.org
Cc: linux-block@vger.kernel.org, dm-devel@redhat.com
Subject: [PATCH 02/17] btrfs: save bio::bi_iter into btrfs_bio::iter before submitting
Date: Wed, 1 Dec 2021 13:17:41 +0800
Message-Id: <20211201051756.53742-3-wqu@suse.com>
In-Reply-To: <20211201051756.53742-1-wqu@suse.com>
References: <20211201051756.53742-1-wqu@suse.com>

Since the block layer advances bio::bi_iter, at endio time we can no
longer rely on bio::bi_iter for a split bio.

But for the incoming btrfs_bio split at btrfs_map_bio() time, we have to
ensure the endio function is only executed for the split range, not the
whole original bio.

Thus this patch introduces a new helper, btrfs_bio_save_iter(), to save
bi_iter into btrfs_bio::iter.

The following call sites need this helper call:

- btrfs_submit_compressed_read()
  For compressed reads. Compressed writes don't need it, as they use
  ordered extents.

- raid56_parity_write()
- raid56_parity_recover()
  For RAID56.

- btrfs_map_bio(), before the stripes are passed to submit_stripe_bio()
  For all other cases.
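
For illustration only, the save-before-submit pattern described above can be modelled in userspace C. This is a minimal sketch, not kernel code: the `model_*` types and functions are hypothetical stand-ins invented for this example, and the real block layer advances bi_iter in a more involved way than the single step modelled here.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for struct bvec_iter (only two fields modelled). */
struct model_iter {
	uint64_t sector;	/* current start sector */
	uint32_t size;		/* bytes still to be processed */
};

/* Hypothetical stand-in for struct bio. */
struct model_bio {
	struct model_iter bi_iter;
};

/* Hypothetical stand-in for struct btrfs_bio: saved iterator plus the bio. */
struct model_btrfs_bio {
	struct model_iter iter;	/* saved copy, expected to start zeroed */
	struct model_bio bio;
};

/* Same rule as the helper in the patch: save only while nothing is saved yet. */
static void model_save_iter(struct model_btrfs_bio *bbio)
{
	if (!bbio->iter.size)
		bbio->iter = bbio->bio.bi_iter;
}

/* Models the lower layers consuming the bio: the live iterator moves forward. */
static void model_advance(struct model_bio *bio, uint32_t bytes)
{
	assert(bytes <= bio->bi_iter.size);
	bio->bi_iter.sector += bytes >> 9;	/* 512-byte sectors */
	bio->bi_iter.size -= bytes;
}

/*
 * Save, then let the whole bio complete. Returns the size an endio handler
 * would see through the saved iterator.
 */
static uint32_t model_submit_and_complete(struct model_btrfs_bio *bbio)
{
	model_save_iter(bbio);
	model_advance(&bbio->bio, bbio->bio.bi_iter.size);
	return bbio->iter.size;
}
```

In this model, a bio of 8192 bytes starting at sector 2048 still reports 8192 bytes through the saved copy after completion, while the live bi_iter has been consumed down to a size of 0.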

Signed-off-by: Qu Wenruo
---
 fs/btrfs/compression.c |  3 +++
 fs/btrfs/raid56.c      |  2 ++
 fs/btrfs/volumes.c     | 14 ++++++++++++++
 fs/btrfs/volumes.h     | 18 ++++++++++++++++++
 4 files changed, 37 insertions(+)

diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
index e776956d5bc9..cc8d13369f53 100644
--- a/fs/btrfs/compression.c
+++ b/fs/btrfs/compression.c
@@ -870,6 +870,9 @@ blk_status_t btrfs_submit_compressed_read(struct inode *inode, struct bio *bio,
 	/* include any pages we added in add_ra-bio_pages */
 	cb->len = bio->bi_iter.bi_size;
 
+	/* Save bi_iter so that end_bio_extent_readpage() won't freak out. */
+	btrfs_bio_save_iter(btrfs_bio(bio));
+
 	while (cur_disk_byte < disk_bytenr + compressed_len) {
 		u64 offset = cur_disk_byte - disk_bytenr;
 		unsigned int index = offset >> PAGE_SHIFT;
diff --git a/fs/btrfs/raid56.c b/fs/btrfs/raid56.c
index 0e239a4c3b26..13e726c88a81 100644
--- a/fs/btrfs/raid56.c
+++ b/fs/btrfs/raid56.c
@@ -1731,6 +1731,7 @@ int raid56_parity_write(struct bio *bio, struct btrfs_io_context *bioc,
 		return PTR_ERR(rbio);
 	}
 	bio_list_add(&rbio->bio_list, bio);
+	btrfs_bio_save_iter(btrfs_bio(bio));
 	rbio->bio_list_bytes = bio->bi_iter.bi_size;
 	rbio->operation = BTRFS_RBIO_WRITE;
 
@@ -2135,6 +2136,7 @@ int raid56_parity_recover(struct bio *bio, struct btrfs_io_context *bioc,
 
 	rbio->operation = BTRFS_RBIO_READ_REBUILD;
 	bio_list_add(&rbio->bio_list, bio);
+	btrfs_bio_save_iter(btrfs_bio(bio));
 	rbio->bio_list_bytes = bio->bi_iter.bi_size;
 
 	rbio->faila = find_logical_bio_stripe(rbio, bio);
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index f38c230111be..b70037cc1a51 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -6829,6 +6829,20 @@ blk_status_t btrfs_map_bio(struct btrfs_fs_info *fs_info, struct bio *bio,
 		BUG();
 	}
 
+	/*
+	 * At endio time, bi_iter is no longer reliable, thus we have to save
+	 * current bi_iter into btrfs_bio so that even for split bio we can
+	 * iterate only the split part.
+	 *
+	 * And this has to be done before any bioc error, as endio functions
+	 * will rely on bbio::iter.
+	 *
+	 * For bios created by btrfs_bio_split() or btrfs_bio_clone*(), it's
+	 * already set, but we can still have an original bio whose iter is
+	 * not initialized.
+	 */
+	btrfs_bio_save_iter(btrfs_bio(bio));
+
 	for (dev_nr = 0; dev_nr < total_devs; dev_nr++) {
 		dev = bioc->stripes[dev_nr].dev;
 		if (!dev || !dev->bdev || test_bit(BTRFS_DEV_STATE_MISSING,
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index 3b8130680749..f9178d2c2fd6 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -334,6 +334,12 @@ struct btrfs_bio {
 	struct btrfs_device *device;
 	u8 *csum;
 	u8 csum_inline[BTRFS_BIO_INLINE_CSUM_SIZE];
+	/*
+	 * Saved bio::bi_iter before submission.
+	 *
+	 * This allows us to iterate the cloned/split bio properly, as at
+	 * endio time bio::bi_iter is no longer reliable.
+	 */
 	struct bvec_iter iter;
 
 	/*
@@ -356,6 +362,18 @@ static inline void btrfs_bio_free_csum(struct btrfs_bio *bbio)
 	}
 }
 
+/*
+ * Save bbio::bio->bi_iter into bbio::iter so that callers who need the
+ * original bi_iter can access the original part of the bio.
+ * This is especially important for the incoming split btrfs_bio, which needs
+ * to call its endio for, and only for, the split range.
+ */
+static inline void btrfs_bio_save_iter(struct btrfs_bio *bbio)
+{
+	if (!bbio->iter.bi_size)
+		bbio->iter = bbio->bio.bi_iter;
+}
+
 struct btrfs_io_stripe {
 	struct btrfs_device *dev;
 	u64 physical;

From patchwork Wed Dec 1 05:17:42 2021
X-Patchwork-Submitter: Qu Wenruo
X-Patchwork-Id: 12649057
From: Qu Wenruo
To: linux-btrfs@vger.kernel.org
Cc: linux-block@vger.kernel.org, dm-devel@redhat.com
Subject: [PATCH 03/17] btrfs: use correct bio size for error message in btrfs_end_dio_bio()
Date: Wed, 1 Dec 2021 13:17:42 +0800
Message-Id: <20211201051756.53742-4-wqu@suse.com>
In-Reply-To: <20211201051756.53742-1-wqu@suse.com>
References: <20211201051756.53742-1-wqu@suse.com>

At endio time, bio->bi_iter is no longer valid (in some cases it still
is, but that is never guaranteed).

Thus if we really want the full size of a bio, we have to iterate its
bvecs.

In btrfs_end_dio_bio(), when we hit an error, we grab the bio size from
bi_iter, which can be wrong.

Fix it by iterating the bvecs and calculating the bio size.
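
The fix can be sketched in userspace C as below. This is only a simplified model under stated assumptions: `struct model_bvec`, `struct model_bio` and the two functions are hypothetical names made up for this example, whereas the patch itself walks the real bio with bio_for_each_segment_all().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct bio_vec: only the length is modelled. */
struct model_bvec {
	uint32_t bv_len;
};

/* Hypothetical stand-in for struct bio. */
struct model_bio {
	const struct model_bvec *vecs;	/* segment array, never advanced */
	size_t nr_vecs;
	uint32_t iter_size;		/* models bio->bi_iter.bi_size */
};

/* Models completion: the live iterator has been fully consumed. */
static void model_bio_complete(struct model_bio *bio)
{
	bio->iter_size = 0;
}

/* Recompute the full bio length from its segments instead of the iterator. */
static uint32_t model_bio_total_len(const struct model_bio *bio)
{
	uint32_t total = 0;
	size_t i;

	assert(bio->vecs != NULL || bio->nr_vecs == 0);
	for (i = 0; i < bio->nr_vecs; i++)
		total += bio->vecs[i].bv_len;
	return total;
}
```

After model_bio_complete(), iter_size reads 0 regardless of how large the bio was, but summing the segment lengths still yields the original size. That is why the error message in the patch reports the summed value.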

Signed-off-by: Qu Wenruo
---
 fs/btrfs/inode.c | 13 ++++++++++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 6079d30f83e8..126d2117954c 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -8175,12 +8175,19 @@ static void btrfs_end_dio_bio(struct bio *bio)
 	struct btrfs_dio_private *dip = bio->bi_private;
 	blk_status_t err = bio->bi_status;
 
-	if (err)
+	if (err) {
+		struct bvec_iter_all iter_all;
+		struct bio_vec *bvec;
+		u32 bi_size = 0;
+
+		bio_for_each_segment_all(bvec, bio, iter_all)
+			bi_size += bvec->bv_len;
+
 		btrfs_warn(BTRFS_I(dip->inode)->root->fs_info,
 			   "direct IO failed ino %llu rw %d,%u sector %#Lx len %u err no %d",
 			   btrfs_ino(BTRFS_I(dip->inode)), bio_op(bio),
-			   bio->bi_opf, bio->bi_iter.bi_sector,
-			   bio->bi_iter.bi_size, err);
+			   bio->bi_opf, bio->bi_iter.bi_sector, bi_size, err);
+	}
 
 	if (bio_op(bio) == REQ_OP_READ)
 		err = btrfs_check_read_dio_bio(dip, btrfs_bio(bio), !err);

From patchwork Wed Dec 1 05:17:43 2021
X-Patchwork-Submitter: Qu Wenruo
X-Patchwork-Id: 12649059
From: Qu Wenruo
To: linux-btrfs@vger.kernel.org
Cc: linux-block@vger.kernel.org, dm-devel@redhat.com
Subject: [PATCH 04/17] btrfs: refactor btrfs_map_bio()
Date: Wed, 1 Dec 2021 13:17:43 +0800
Message-Id: <20211201051756.53742-5-wqu@suse.com>
In-Reply-To: <20211201051756.53742-1-wqu@suse.com>
References: <20211201051756.53742-1-wqu@suse.com>

Currently in btrfs_map_bio() we call __btrfs_map_block(), then use the
returned bioc to submit the real stripes.

This is fine if we're only going to handle one bio at a time.

For the incoming bio split at btrfs_map_bio() time, we want to handle
several different bios. Thus introduce a new helper,
submit_one_mapped_range(), to handle the submission part, making it much
easier to call in a loop.

Signed-off-by: Qu Wenruo
---
 fs/btrfs/volumes.c | 65 ++++++++++++++++++++++++++++------------------
 1 file changed, 40 insertions(+), 25 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index b70037cc1a51..3dc759996f55 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -6777,29 +6777,15 @@ static void bioc_error(struct btrfs_io_context *bioc, struct bio *bio, u64 logic
 	}
 }
 
-blk_status_t btrfs_map_bio(struct btrfs_fs_info *fs_info, struct bio *bio,
-			   int mirror_num)
+static int submit_one_mapped_range(struct btrfs_fs_info *fs_info, struct bio *bio,
+				   struct btrfs_io_context *bioc, u64 map_length,
+				   int mirror_num)
 {
-	struct btrfs_device *dev;
 	struct bio *first_bio = bio;
-	u64 logical = bio->bi_iter.bi_sector << 9;
-	u64 length = 0;
-	u64 map_length;
-	int ret;
-	int dev_nr;
+	u64 logical = bio->bi_iter.bi_sector << SECTOR_SHIFT;
 	int total_devs;
-	struct btrfs_io_context *bioc = NULL;
-
-	length = bio->bi_iter.bi_size;
-	map_length = length;
-
-	btrfs_bio_counter_inc_blocked(fs_info);
-	ret = __btrfs_map_block(fs_info, btrfs_op(bio), logical,
-				&map_length, &bioc, mirror_num, 1);
-	if (ret) {
-		btrfs_bio_counter_dec(fs_info);
-		return errno_to_blk_status(ret);
-	}
+	int dev_nr;
+	int ret;
 
 	total_devs = bioc->num_stripes;
 	bioc->orig_bio = first_bio;
@@ -6818,14 +6804,13 @@ blk_status_t btrfs_map_bio(struct btrfs_fs_info *fs_info, struct bio *bio,
 						    mirror_num, 1);
 		}
 
-		btrfs_bio_counter_dec(fs_info);
-		return errno_to_blk_status(ret);
+		return ret;
 	}
 
-	if (map_length < length) {
+	if (map_length < bio->bi_iter.bi_size) {
 		btrfs_crit(fs_info,
-			   "mapping failed logical %llu bio len %llu len %llu",
-			   logical, length, map_length);
+			   "mapping failed logical %llu bio len %u len %llu",
+			   logical, bio->bi_iter.bi_size, map_length);
 		BUG();
 	}
 
@@ -6844,6 +6829,8 @@ blk_status_t btrfs_map_bio(struct btrfs_fs_info *fs_info, struct bio *bio,
 	btrfs_bio_save_iter(btrfs_bio(bio));
 
 	for (dev_nr = 0; dev_nr < total_devs; dev_nr++) {
+		struct btrfs_device *dev;
+
 		dev = bioc->stripes[dev_nr].dev;
 		if (!dev || !dev->bdev || test_bit(BTRFS_DEV_STATE_MISSING,
 						   &dev->dev_state) ||
@@ -6860,6 +6847,34 @@ blk_status_t btrfs_map_bio(struct btrfs_fs_info *fs_info, struct bio *bio,
 
 		submit_stripe_bio(bioc, bio, bioc->stripes[dev_nr].physical, dev);
 	}
+	return 0;
+}
+
+blk_status_t btrfs_map_bio(struct btrfs_fs_info *fs_info, struct bio *bio,
+			   int mirror_num)
+{
+	u64 logical = bio->bi_iter.bi_sector << 9;
+	u64 length = 0;
+	u64 map_length;
+	int ret;
+	struct btrfs_io_context *bioc = NULL;
+
+	length = bio->bi_iter.bi_size;
+	map_length = length;
+
+	btrfs_bio_counter_inc_blocked(fs_info);
+	ret = __btrfs_map_block(fs_info, btrfs_op(bio), logical,
+				&map_length, &bioc, mirror_num, 1);
+	if (ret) {
+		btrfs_bio_counter_dec(fs_info);
+		return errno_to_blk_status(ret);
+	}
+
+	ret = submit_one_mapped_range(fs_info, bio, bioc, map_length, mirror_num);
+	if (ret < 0) {
+		btrfs_bio_counter_dec(fs_info);
+		return errno_to_blk_status(ret);
+	}
 	btrfs_bio_counter_dec(fs_info);
 	return BLK_STS_OK;
 }

From patchwork Wed Dec 1 05:17:44 2021
X-Patchwork-Submitter: Qu Wenruo
X-Patchwork-Id: 12649061
From: Qu Wenruo
To: linux-btrfs@vger.kernel.org
Cc: linux-block@vger.kernel.org, dm-devel@redhat.com
Subject: [PATCH 05/17] btrfs: move btrfs_bio_wq_end_io() calls into submit_stripe_bio()
Date: Wed, 1 Dec 2021 13:17:44 +0800
Message-Id: <20211201051756.53742-6-wqu@suse.com>
In-Reply-To: <20211201051756.53742-1-wqu@suse.com>
References: <20211201051756.53742-1-wqu@suse.com>

This is a preparation patch
for the incoming chunk mapping layer bio split.

Function btrfs_bio_wq_end_io() remaps bio::bi_private and bio::bi_end_io
so that the real endio function is executed in a workqueue.

The problem is that the remapped bio::bi_private points to newly
allocated memory, which is freed after the original endio function has
executed. This does not work well with split bios.

So this patch moves all btrfs_bio_wq_end_io() calls into one helper
function, btrfs_bio_final_endio_remap(), and calls that helper in
submit_stripe_bio().

This refactor also unifies the behavior of all data bios. Before this
patch, a compressed bio, no matter whether read or write, was always
delayed using a workqueue. However, all data write operations are
already delayed using ordered extents, and metadata writes don't need
any delayed execution. Thus this patch makes compressed bios follow the
same behavior as other data reads/writes.

Signed-off-by: Qu Wenruo
---
 fs/btrfs/compression.c |  4 +---
 fs/btrfs/disk-io.c     |  9 +--------
 fs/btrfs/inode.c       | 20 +++++---------------
 fs/btrfs/volumes.c     | 41 +++++++++++++++++++++++++++++++++++++----
 fs/btrfs/volumes.h     |  9 ++++++++-
 5 files changed, 52 insertions(+), 31 deletions(-)

diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
index cc8d13369f53..8668c5190805 100644
--- a/fs/btrfs/compression.c
+++ b/fs/btrfs/compression.c
@@ -429,10 +429,8 @@ static blk_status_t submit_compressed_bio(struct btrfs_fs_info *fs_info,
 {
 	blk_status_t ret;
 
+	btrfs_bio(bio)->endio_type = BTRFS_WQ_ENDIO_DATA;
 	ASSERT(bio->bi_iter.bi_size);
-	ret = btrfs_bio_wq_end_io(fs_info, bio, BTRFS_WQ_ENDIO_DATA);
-	if (ret)
-		return ret;
 	ret = btrfs_map_bio(fs_info, bio, mirror_num);
 	return ret;
 }
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 5c598e124c25..ce7c5a16df04 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -920,14 +920,7 @@ blk_status_t btrfs_submit_metadata_bio(struct inode *inode, struct bio *bio,
 	blk_status_t ret;
 
 	if (btrfs_op(bio) != BTRFS_MAP_WRITE) {
-		/*
-		 * called for a read, do the setup so that checksum validation
-		 * can happen in the async kernel threads
-		 */
-		ret = btrfs_bio_wq_end_io(fs_info, bio,
-					  BTRFS_WQ_ENDIO_METADATA);
-		if (ret)
-			goto out_w_error;
+		btrfs_bio(bio)->endio_type = BTRFS_WQ_ENDIO_METADATA;
 		ret = btrfs_map_bio(fs_info, bio, mirror_num);
 	} else if (!should_async_write(fs_info, BTRFS_I(inode))) {
 		ret = btree_csum_one_bio(bio);
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 126d2117954c..007a20a9b076 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -2511,7 +2511,7 @@ blk_status_t btrfs_submit_data_bio(struct inode *inode, struct bio *bio,
 {
 	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
 	struct btrfs_root *root = BTRFS_I(inode)->root;
-	enum btrfs_wq_endio_type metadata = BTRFS_WQ_ENDIO_DATA;
+	enum btrfs_wq_endio_type endio_type = BTRFS_WQ_ENDIO_DATA;
 	blk_status_t ret = 0;
 	int skip_sum;
 	int async = !atomic_read(&BTRFS_I(inode)->sync_writers);
@@ -2520,7 +2520,7 @@ blk_status_t btrfs_submit_data_bio(struct inode *inode, struct bio *bio,
 		   test_bit(BTRFS_FS_STATE_NO_CSUMS, &fs_info->fs_state);
 
 	if (btrfs_is_free_space_inode(BTRFS_I(inode)))
-		metadata = BTRFS_WQ_ENDIO_FREE_SPACE;
+		endio_type = BTRFS_WQ_ENDIO_FREE_SPACE;
 
 	if (bio_op(bio) == REQ_OP_ZONE_APPEND) {
 		struct page *page = bio_first_bvec_all(bio)->bv_page;
@@ -2532,10 +2532,7 @@ blk_status_t btrfs_submit_data_bio(struct inode *inode, struct bio *bio,
 	}
 
 	if (btrfs_op(bio) != BTRFS_MAP_WRITE) {
-		ret = btrfs_bio_wq_end_io(fs_info, bio, metadata);
-		if (ret)
-			goto out;
-
+		btrfs_bio(bio)->endio_type = endio_type;
 		if (bio_flags & EXTENT_BIO_COMPRESSED) {
 			ret = btrfs_submit_compressed_read(inode, bio,
 							   mirror_num,
@@ -8090,10 +8087,6 @@ static blk_status_t submit_dio_repair_bio(struct inode *inode, struct bio *bio,
 
 	BUG_ON(bio_op(bio) == REQ_OP_WRITE);
 
-	ret = btrfs_bio_wq_end_io(fs_info, bio, BTRFS_WQ_ENDIO_DATA);
-	if (ret)
-		return ret;
-
 	refcount_inc(&dip->refs);
 	ret = btrfs_map_bio(fs_info, bio, mirror_num);
 	if (ret)
@@ -8219,11 +8212,8 @@ static inline blk_status_t btrfs_submit_dio_bio(struct bio *bio,
 	if (async_submit)
 		async_submit = !atomic_read(&BTRFS_I(inode)->sync_writers);
 
-	if (!write) {
-		ret = btrfs_bio_wq_end_io(fs_info, bio, BTRFS_WQ_ENDIO_DATA);
-		if (ret)
-			goto err;
-	}
+	if (!write)
+		btrfs_bio(bio)->endio_type = BTRFS_WQ_ENDIO_DATA;
 
 	if (BTRFS_I(inode)->flags & BTRFS_INODE_NODATASUM)
 		goto map;
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 3dc759996f55..af01d54502ab 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -6725,10 +6725,31 @@ static void btrfs_end_bio(struct bio *bio)
 	}
 }
 
-static void submit_stripe_bio(struct btrfs_io_context *bioc, struct bio *bio,
-			      u64 physical, struct btrfs_device *dev)
+/*
+ * Endio remaps which can't handle cloned bios need to go here.
+ *
+ * Currently it's only btrfs_bio_wq_end_io().
+ */
+static int btrfs_bio_final_endio_remap(struct btrfs_fs_info *fs_info,
+				       struct bio *bio)
+{
+	blk_status_t sts;
+
+	/* For write bios, we don't need to put their endio into a wq */
+	if (btrfs_op(bio) == BTRFS_MAP_WRITE)
+		return 0;
+
+	sts = btrfs_bio_wq_end_io(fs_info, bio, btrfs_bio(bio)->endio_type);
+	if (sts != BLK_STS_OK)
+		return blk_status_to_errno(sts);
+	return 0;
+}
+
+static int submit_stripe_bio(struct btrfs_io_context *bioc, struct bio *bio,
+			     u64 physical, struct btrfs_device *dev)
 {
 	struct btrfs_fs_info *fs_info = bioc->fs_info;
+	int ret;
 
 	bio->bi_private = bioc;
 	btrfs_bio(bio)->device = dev;
@@ -6755,9 +6776,14 @@ static void submit_stripe_bio(struct btrfs_io_context *bioc, struct bio *bio,
 		dev->devid, bio->bi_iter.bi_size);
 	bio_set_dev(bio, dev->bdev);
 
-	btrfs_bio_counter_inc_noblocked(fs_info);
+	/* Do the final endio remap if needed */
+	ret = btrfs_bio_final_endio_remap(fs_info, bio);
+	if (ret < 0)
+		return ret;
 
+	btrfs_bio_counter_inc_noblocked(fs_info);
 	btrfsic_submit_bio(bio);
+	return ret;
 }
 
 static void bioc_error(struct btrfs_io_context *bioc, struct bio *bio, u64 logical)
@@ -6845,9 +6871,16 @@ static int submit_one_mapped_range(struct btrfs_fs_info *fs_info, struct bio *bi
 		else
 			bio = first_bio;
 
-		submit_stripe_bio(bioc, bio, bioc->stripes[dev_nr].physical, dev);
+		ret = submit_stripe_bio(bioc, bio,
+					bioc->stripes[dev_nr].physical, dev);
+		if (ret < 0)
+			goto error;
 	}
 	return 0;
+error:
+	for (; dev_nr < total_devs; dev_nr++)
+		bioc_error(bioc, first_bio, logical);
+	return ret;
 }
 
 blk_status_t btrfs_map_bio(struct btrfs_fs_info *fs_info, struct bio *bio,
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index f9178d2c2fd6..f88f39042558 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -328,7 +328,14 @@ struct btrfs_fs_devices {
  * Mostly for btrfs specific features like csum and mirror_num.
  */
 struct btrfs_bio {
-	unsigned int mirror_num;
+	u16 mirror_num;
+
+	/*
+	 * To tell which workqueue the bio's endio should be executed in.
+	 *
+	 * Only for read bios.
+	 */
+	u16 endio_type;
 
 	/* @device is for stripe IO submission. */
 	struct btrfs_device *device;

From patchwork Wed Dec 1 05:17:45 2021
X-Patchwork-Submitter: Qu Wenruo
X-Patchwork-Id: 12649063
From: Qu Wenruo
To: linux-btrfs@vger.kernel.org
Cc: linux-block@vger.kernel.org, dm-devel@redhat.com
Subject: [PATCH 06/17] btrfs: replace btrfs_dio_private::refs with btrfs_dio_private::pending_bytes
Date: Wed, 1 Dec 2021 13:17:45 +0800
Message-Id: <20211201051756.53742-7-wqu@suse.com>
In-Reply-To: <20211201051756.53742-1-wqu@suse.com>
References: <20211201051756.53742-1-wqu@suse.com>

This mostly follows the behavior of compressed_bio::pending_sectors.

The point here is that dip::refs is not split-bio friendly: if a bio has
bi_private = dip and the bio gets split, we can easily underflow
dip::refs.
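
The underflow, and why counting bytes avoids it, can be shown with a deliberately simplified, single-threaded userspace model in C. Everything named `model_*` is hypothetical and exists only for this sketch; the kernel code uses refcount_t for the old scheme and atomic_sub_and_test() on an atomic_t for the new one, and the real dip::refs also holds a setup reference that is left out here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for struct btrfs_dio_private. */
struct model_dip {
	int refs;		/* old scheme: one reference per submitted bio */
	uint32_t pending_bytes;	/* new scheme: bytes not yet completed */
	bool errors;
};

/* Old scheme: every endio call drops one reference; true means "finished". */
static bool model_put_ref(struct model_dip *dip)
{
	return --dip->refs == 0;
}

/* New scheme: every endio call subtracts the bytes it actually covered. */
static bool model_finish_bytes(struct model_dip *dip, bool error, uint32_t bytes)
{
	assert(bytes <= dip->pending_bytes);
	if (error)
		dip->errors = true;
	dip->pending_bytes -= bytes;
	return dip->pending_bytes == 0;
}
```

With one reference taken for a bio that is later split into two halves, the first model_put_ref() call already reports completion and the second drives refs to -1. With pending_bytes set to 8192, two model_finish_bytes() calls of 4096 bytes each report completion only on the second call.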
By using the same size-based accounting as compressed_bio (counted in
bytes here rather than sectors), direct IO can handle both unsplit and
split bios.

Signed-off-by: Qu Wenruo
---
 fs/btrfs/btrfs_inode.h | 10 +++----
 fs/btrfs/inode.c       | 67 +++++++++++++++++++++---------------------
 2 files changed, 38 insertions(+), 39 deletions(-)

diff --git a/fs/btrfs/btrfs_inode.h b/fs/btrfs/btrfs_inode.h
index b3e46aabc3d8..196f74ee102e 100644
--- a/fs/btrfs/btrfs_inode.h
+++ b/fs/btrfs/btrfs_inode.h
@@ -358,11 +358,11 @@ struct btrfs_dio_private {
 	/* Used for bio::bi_size */
 	u32 bytes;
 
-	/*
-	 * References to this structure. There is one reference per in-flight
-	 * bio plus one while we're still setting up.
-	 */
-	refcount_t refs;
+	/* Hit any error for the whole DIO bio */
+	bool errors;
+
+	/* How many bytes are still under IO or not submitted */
+	atomic_t pending_bytes;
 
 	/* dio_bio came from fs/direct-io.c */
 	struct bio *dio_bio;
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 007a20a9b076..1aa060de917c 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -8053,20 +8053,28 @@ static int btrfs_dio_iomap_end(struct inode *inode, loff_t pos, loff_t length,
 	return ret;
 }
 
-static void btrfs_dio_private_put(struct btrfs_dio_private *dip)
+static bool dec_and_test_dio_private(struct btrfs_dio_private *dip, bool error,
+				     u32 bytes)
 {
-	/*
-	 * This implies a barrier so that stores to dio_bio->bi_status before
-	 * this and loads of dio_bio->bi_status after this are fully ordered.
-	 */
-	if (!refcount_dec_and_test(&dip->refs))
+	ASSERT(bytes <= dip->bytes);
+	ASSERT(bytes <= atomic_read(&dip->pending_bytes));
+
+	if (error)
+		dip->errors = true;
+	return atomic_sub_and_test(bytes, &dip->pending_bytes);
+}
+
+static void dio_private_finish(struct btrfs_dio_private *dip, bool error,
+			       u32 bytes)
+{
+	if (!dec_and_test_dio_private(dip, error, bytes))
 		return;
 
 	if (btrfs_op(dip->dio_bio) == BTRFS_MAP_WRITE) {
 		__endio_write_update_ordered(BTRFS_I(dip->inode),
 					     dip->file_offset,
 					     dip->bytes,
-					     !dip->dio_bio->bi_status);
+					     !dip->errors);
 	} else {
 		unlock_extent(&BTRFS_I(dip->inode)->io_tree,
 			      dip->file_offset,
@@ -8087,10 +8095,10 @@ static blk_status_t submit_dio_repair_bio(struct inode *inode, struct bio *bio,
 
 	BUG_ON(bio_op(bio) == REQ_OP_WRITE);
 
-	refcount_inc(&dip->refs);
+	atomic_add(bio->bi_iter.bi_size, &dip->pending_bytes);
 	ret = btrfs_map_bio(fs_info, bio, mirror_num);
 	if (ret)
-		refcount_dec(&dip->refs);
+		atomic_sub(bio->bi_iter.bi_size, &dip->pending_bytes);
 	return ret;
 }
 
@@ -8166,20 +8174,20 @@ static blk_status_t btrfs_submit_bio_start_direct_io(struct inode *inode,
 static void btrfs_end_dio_bio(struct bio *bio)
 {
 	struct btrfs_dio_private *dip = bio->bi_private;
+	struct bvec_iter iter;
+	struct bio_vec bvec;
+	u32 bi_size = 0;
 	blk_status_t err = bio->bi_status;
 
-	if (err) {
-		struct bvec_iter_all iter_all;
-		struct bio_vec *bvec;
-		u32 bi_size = 0;
-
-		bio_for_each_segment_all(bvec, bio, iter_all)
-			bi_size += bvec->bv_len;
+	__bio_for_each_segment(bvec, bio, iter, btrfs_bio(bio)->iter)
+		bi_size += bvec.bv_len;
+	if (err) {
 		btrfs_warn(BTRFS_I(dip->inode)->root->fs_info,
 			   "direct IO failed ino %llu rw %d,%u sector %#Lx len %u err no %d",
 			   btrfs_ino(BTRFS_I(dip->inode)), bio_op(bio),
 			   bio->bi_opf, bio->bi_iter.bi_sector,
 			   bi_size, err);
+		dip->errors = true;
 	}
 
 	if (bio_op(bio) == REQ_OP_READ)
@@ -8191,7 +8199,7 @@ static void btrfs_end_dio_bio(struct bio *bio)
 	btrfs_record_physical_zoned(dip->inode, dip->file_offset, bio);
 
 	bio_put(bio);
-	btrfs_dio_private_put(dip);
+	dio_private_finish(dip, err, bi_size);
 }
 
 static inline blk_status_t btrfs_submit_dio_bio(struct bio *bio,
@@ -8250,7 +8258,8 @@ static inline blk_status_t btrfs_submit_dio_bio(struct bio *bio,
  */
 static struct btrfs_dio_private *btrfs_create_dio_private(struct bio *dio_bio,
 							  struct inode *inode,
-							  loff_t file_offset)
+							  loff_t file_offset,
+							  u32 length)
 {
 	const bool write = (btrfs_op(dio_bio) == BTRFS_MAP_WRITE);
 	const bool csum = !(BTRFS_I(inode)->flags & BTRFS_INODE_NODATASUM);
@@ -8270,12 +8279,12 @@ static struct btrfs_dio_private *btrfs_create_dio_private(struct bio *dio_bio,
 	if (!dip)
 		return NULL;
 
+	atomic_set(&dip->pending_bytes, length);
 	dip->inode = inode;
 	dip->file_offset = file_offset;
 	dip->bytes = dio_bio->bi_iter.bi_size;
 	dip->disk_bytenr = dio_bio->bi_iter.bi_sector << 9;
 	dip->dio_bio = dio_bio;
-	refcount_set(&dip->refs, 1);
 	return dip;
 }
 
@@ -8289,6 +8298,8 @@ static void btrfs_submit_direct(const struct iomap_iter *iter,
 			     BTRFS_BLOCK_GROUP_RAID56_MASK);
 	struct btrfs_dio_private *dip;
 	struct bio *bio;
+	const u32 length = dio_bio->bi_iter.bi_size;
+	u32 submitted_bytes = 0;
 	u64 start_sector;
 	int async_submit = 0;
 	u64 submit_len;
@@ -8301,7 +8312,7 @@ static void btrfs_submit_direct(const struct iomap_iter *iter,
 	struct btrfs_dio_data *dio_data = iter->iomap.private;
 	struct extent_map *em = NULL;
 
-	dip = btrfs_create_dio_private(dio_bio, inode, file_offset);
+	dip = btrfs_create_dio_private(dio_bio, inode, file_offset, length);
 	if (!dip) {
 		if (!write) {
 			unlock_extent(&BTRFS_I(inode)->io_tree, file_offset,
@@ -8311,7 +8322,6 @@ static void btrfs_submit_direct(const struct iomap_iter *iter,
 		bio_endio(dio_bio);
 		return;
 	}
-
 	if (!write) {
 		/*
 		 * Load the csums up front to reduce csum tree searches and
@@ -8365,17 +8375,7 @@ static void btrfs_submit_direct(const struct iomap_iter *iter,
 		ASSERT(submit_len >= clone_len);
 		submit_len -= clone_len;
 
-		/*
-		 * Increase the count before we submit the bio so we know
-		 * the end IO handler won't happen before we increase the
-		 * count. Otherwise, the dip might get freed before we're
-		 * done setting it up.
-		 *
-		 * We transfer the initial reference to the last bio, so we
-		 * don't need to increment the reference count for the last one.
-		 */
 		if (submit_len > 0) {
-			refcount_inc(&dip->refs);
 			/*
 			 * If we are submitting more than one bio, submit them
 			 * all asynchronously. The exception is RAID 5 or 6, as
@@ -8390,11 +8390,10 @@ static void btrfs_submit_direct(const struct iomap_iter *iter,
 						async_submit);
 		if (status) {
 			bio_put(bio);
-			if (submit_len > 0)
-				refcount_dec(&dip->refs);
 			goto out_err_em;
 		}
 
+		submitted_bytes += clone_len;
 		dio_data->submitted += clone_len;
 		clone_offset += clone_len;
 		start_sector += clone_len >> 9;
@@ -8408,7 +8407,7 @@ static void btrfs_submit_direct(const struct iomap_iter *iter,
 	free_extent_map(em);
 out_err:
 	dip->dio_bio->bi_status = status;
-	btrfs_dio_private_put(dip);
+	dio_private_finish(dip, status, length - submitted_bytes);
 }
 
 const struct iomap_ops btrfs_dio_iomap_ops = {

From patchwork Wed Dec 1 05:17:46 2021
X-Patchwork-Submitter: Qu Wenruo
X-Patchwork-Id: 12649065
From: Qu Wenruo
To: linux-btrfs@vger.kernel.org
Cc: linux-block@vger.kernel.org, dm-devel@redhat.com
Subject: [PATCH 07/17] btrfs: introduce btrfs_bio_split() helper
Date: Wed, 1 Dec 2021 13:17:46 +0800
Message-Id: <20211201051756.53742-8-wqu@suse.com>
In-Reply-To: <20211201051756.53742-1-wqu@suse.com>
References: <20211201051756.53742-1-wqu@suse.com>

This new function handles splitting a btrfs bio, in preparation for the
upcoming bio split at chunk mapping time.

This patch introduces the following new members and functions:

- btrfs_bio::offset_to_original
  Since btrfs_bio::csum still stores the checksums for the original
  logical bytenr, we need to know the offset between the current
  (advanced) bio and the original logical bytenr, hence this new
  member.

  The new member fits into the existing hole between
  btrfs_bio::mirror_num and btrfs_bio::device, so it should not
  increase the memory usage of btrfs_bio.

- btrfs_bio::parent and btrfs_bio::orig_endio
  To record the parent bio and the original endio function.

- btrfs_bio::is_split_bio
  To distinguish bios created by btrfs_bio_split() from those created
  by btrfs_bio_clone*().

  A cloned bio still has its csum pointing to the correct memory, while
  a split bio must rely on its parent bbio to get the csum pointer.

- split_bio_endio()
  Calls the original endio function, then calls bio_endio() on the
  original bio.
  This ensures the original bio is freed only after all split bios.

- btrfs_bio_split()
  Splits the original bio into two. The behavior is pretty much the
  same as bio_split(), just with extra btrfs specific setup.

There are no callers of the new members/functions yet.

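For illustration only (not part of the patch): a minimal user-space
sketch of how offset_to_original lets both halves of a split find their
checksums in the parent's csum array. The toy_* names, the 4 KiB sector
size and the 4-byte checksum size are assumptions made for the example,
and the toy keeps csum and parent as separate fields instead of
overlaying them in a union as the patch does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_SECTORSIZE_BITS	12u	/* assumed 4 KiB sectors */
#define TOY_CSUM_SIZE		4u	/* assumed 4-byte checksums */

/* Toy stand-in for btrfs_bio; every name here is hypothetical. */
struct toy_bbio {
	uint32_t size;			/* bytes covered by this bio */
	uint32_t offset_to_original;	/* bytes from the original start */
	int is_split;
	struct toy_bbio *parent;	/* only meaningful when is_split */
	const uint8_t *csum;		/* only meaningful when !is_split */
};

/* Split the first @bytes off @src into @front; @src keeps the tail. */
static void toy_split(struct toy_bbio *src, struct toy_bbio *front,
		      uint32_t bytes)
{
	front->size = bytes;
	front->offset_to_original = src->offset_to_original;
	front->is_split = 1;
	front->parent = src;
	front->csum = NULL;

	src->offset_to_original += bytes;
	src->size -= bytes;
}

/* Return the first checksum belonging to @bbio, or NULL if there is none. */
static const uint8_t *toy_csum_for(const struct toy_bbio *bbio)
{
	const uint8_t *base = bbio->is_split ? bbio->parent->csum : bbio->csum;

	if (!base)
		return NULL;
	return base + (bbio->offset_to_original >> TOY_SECTORSIZE_BITS) *
		      TOY_CSUM_SIZE;
}
```

After splitting 8 KiB off the front of a 16 KiB bio, the front half
starts at checksum 0 and the remaining parent starts at checksum 2,
both within the parent's array.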
Signed-off-by: Qu Wenruo
---
 fs/btrfs/extent_io.c | 82 +++++++++++++++++++++++++++++++++++++++++++-
 fs/btrfs/extent_io.h |  2 ++
 fs/btrfs/volumes.h   | 43 +++++++++++++++++++++--
 3 files changed, 123 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 1a67f4b3986b..083700621b9f 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -3005,7 +3005,6 @@ static void end_bio_extent_readpage(struct bio *bio)
 	int ret;
 	struct bvec_iter_all iter_all;
 
-	ASSERT(!bio_flagged(bio, BIO_CLONED));
 	bio_for_each_segment_all(bvec, bio, iter_all) {
 		bool uptodate = !bio->bi_status;
 		struct page *page = bvec->bv_page;
@@ -3184,6 +3183,87 @@ struct bio *btrfs_bio_clone_partial(struct bio *orig, u64 offset, u64 size)
 	return bio;
 }
 
+/*
+ * A very simple wrapper to call original endio function and then
+ * call bio_endio() on the parent bio to decrease its bi_remaining count.
+ */
+static void split_bio_endio(struct bio *bio)
+{
+	struct btrfs_bio *bbio = btrfs_bio(bio);
+	/* After endio bbio could be freed, thus grab the info before endio */
+	struct bio *parent = bbio->parent;
+
+	/*
+	 * BIO_CLONED can even be set for our parent bio (DIO use clones
+	 * the initial bio, then uses the cloned one for IO).
+	 * So here we don't check BIO_CLONED for parent.
+	 */
+	ASSERT(bio_flagged(bio, BIO_CLONED) && bbio->is_split_bio);
+	ASSERT(parent && !btrfs_bio(parent)->is_split_bio);
+
+	bio->bi_end_io = bbio->orig_endio;
+	bio_endio(bio);
+	bio_endio(parent);
+}
+
+/*
+ * Pretty much like bio_split(), caller needs to ensure @src is not freed
+ * before the newly allocated bio, as the new bio is relying on @src for
+ * its bvecs.
+ */
+struct bio *btrfs_bio_split(struct btrfs_fs_info *fs_info,
+			    struct bio *src, unsigned int bytes)
+{
+	struct bio *new;
+	struct btrfs_bio *src_bbio = btrfs_bio(src);
+	struct btrfs_bio *new_bbio;
+	const unsigned int old_offset = src_bbio->offset_to_original;
+
+	/* Src should not be split */
+	ASSERT(!src_bbio->is_split_bio);
+	ASSERT(IS_ALIGNED(bytes, fs_info->sectorsize));
+	ASSERT(bytes < src->bi_iter.bi_size);
+
+	/*
+	 * We're in fact chaining the new bio to the parent, but we still want
+	 * to have independent bi_private/bi_endio, thus we need to manually
+	 * increase the remaining for the source, just like bio_chain().
+	 */
+	bio_inc_remaining(src);
+
+	/* Bioset backed split should not fail */
+	new = bio_split(src, bytes >> SECTOR_SHIFT, GFP_NOFS, &btrfs_bioset);
+	new_bbio = btrfs_bio(new);
+	new_bbio->offset_to_original = old_offset;
+	new_bbio->iter = new->bi_iter;
+	new_bbio->orig_endio = src->bi_end_io;
+	new_bbio->parent = src;
+	new_bbio->endio_type = src_bbio->endio_type;
+	new_bbio->is_split_bio = 1;
+	new->bi_end_io = split_bio_endio;
+
+	/*
+	 * This is very tricky, as if any endio has extra refcount on
+	 * bi_private, we will be screwed up.
+	 *
+	 * We workaround this hacky behavior by reviewing all the involved
+	 * endio stacks. Making sure only split-safe endio remap are called.
+	 *
+	 * Split-unsafe endio remap like btrfs_bio_wq_end_io() will be called
+	 * after btrfs_bio_split().
+	 */
+	new->bi_private = src->bi_private;
+
+	src_bbio->offset_to_original += bytes;
+
+	/*
+	 * For direct IO, @src is a cloned bio thus bbio::iter still points to
+	 * the full bio. Need to update it too.
+	 */
+	src_bbio->iter = src->bi_iter;
+	return new;
+}
+
 /**
  * Attempt to add a page to bio
  *
diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h
index 0399cf8e3c32..cb727b77ecda 100644
--- a/fs/btrfs/extent_io.h
+++ b/fs/btrfs/extent_io.h
@@ -280,6 +280,8 @@ void extent_clear_unlock_delalloc(struct btrfs_inode *inode, u64 start, u64 end,
 struct bio *btrfs_bio_alloc(unsigned int nr_iovecs);
 struct bio *btrfs_bio_clone(struct bio *bio);
 struct bio *btrfs_bio_clone_partial(struct bio *orig, u64 offset, u64 size);
+struct bio *btrfs_bio_split(struct btrfs_fs_info *fs_info,
+			    struct bio *src, unsigned int bytes);
 
 void end_extent_writepage(struct page *page, int err, u64 start, u64 end);
 int btrfs_repair_eb_io_failure(const struct extent_buffer *eb, int mirror_num);
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index f88f39042558..bd789544268c 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -332,15 +332,52 @@ struct btrfs_bio {
 
 	/*
 	 * To tell which workqueue the bio's endio should be exeucted in.
+	 * This member is to make sure btrfs_bio_wq_end_io() is the last
+	 * endio remap in the stack.
 	 *
 	 * Only for read bios.
 	 */
-	u16 endio_type;
+	u8 endio_type;
+
+	/*
+	 * To tell if this btrfs bio is split or just cloned.
+	 * Both btrfs_bio_clone*() and btrfs_bio_split() will make bbio->bio
+	 * to have BIO_CLONED flag.
+	 * But cloned bio still has its bbio::csum pointed to correct memory,
+	 * unlike split bio relies on its parent bbio to grab csum.
+	 *
+	 * Thus we needs this extra flag to distinguish those cloned bio.
+	 */
+	u8 is_split_bio;
+
+	/*
+	 * Records the offset we're from the original bio.
+	 *
+	 * Since btrfs_bio can be split, but our csum is alwasy for the
+	 * original logical bytenr, we need a way to know the bytes offset
+	 * from the original logical bytenr to do proper csum verification.
+	 */
+	unsigned int offset_to_original;
 
 	/* @device is for stripe IO submission. */
 	struct btrfs_device *device;
-	u8 *csum;
-	u8 csum_inline[BTRFS_BIO_INLINE_CSUM_SIZE];
+
+	union {
+		/*
+		 * For the parent bio recording the csum for the original
+		 * logical bytenr
+		 */
+		struct {
+			u8 *csum;
+			u8 csum_inline[BTRFS_BIO_INLINE_CSUM_SIZE];
+		};
+
+		/* For child (split) bio to record where its parent is */
+		struct {
+			struct bio *parent;
+			bio_end_io_t *orig_endio;
+		};
+	};
 	/*
 	 * Saved bio::bi_iter before submission.
 	 *

From patchwork Wed Dec 1 05:17:47 2021
X-Patchwork-Submitter: Qu Wenruo
X-Patchwork-Id: 12649067
From: Qu Wenruo
To: linux-btrfs@vger.kernel.org
Cc: linux-block@vger.kernel.org, dm-devel@redhat.com
Subject: [PATCH 08/17] btrfs: make data buffered read path to handle split bio properly
Date: Wed, 1 Dec 2021 13:17:47 +0800
Message-Id: <20211201051756.53742-9-wqu@suse.com>
In-Reply-To: <20211201051756.53742-1-wqu@suse.com>
References: <20211201051756.53742-1-wqu@suse.com>

This involves the following modifications:

- Use __bio_for_each_segment() instead of bio_for_each_segment_all()
  bio_for_each_segment_all() iterates over all bvecs, even those not
  covered by the current bi_iter, so the *_all() variant can only be
  used if the bio is never split.

  Change it to a __bio_for_each_segment() call, so that endio is not
  called on the same range by both the split and the parent bio, and
  both split and unsplit bios are handled.

- Make check_data_csum() take bbio->offset_to_original into
  consideration
  Since a btrfs bio can be split now, both the split and the original
  bio can start at some offset from the original logical bytenr.
  Take btrfs_bio::offset_to_original into consideration to get the
  correct checksum offset.

- Remove the BIO_CLONED ASSERT() in submit_read_repair()

Signed-off-by: Qu Wenruo
---
 fs/btrfs/extent_io.c | 34 +++++++++++++++++++---------------
 fs/btrfs/inode.c     | 23 +++++++++++++++++++++--
 fs/btrfs/volumes.h   |  3 ++-
 3 files changed, 42 insertions(+), 18 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 083700621b9f..e0e072c0e5c8 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -2735,10 +2735,9 @@ static blk_status_t submit_read_repair(struct inode *inode,
 	ASSERT(error_bitmap);
 
 	/*
-	 * We only get called on buffered IO, thus page must be mapped and bio
-	 * must not be cloned.
-	 */
-	ASSERT(page->mapping && !bio_flagged(failed_bio, BIO_CLONED));
+	 * We only get called on buffered IO, thus page must be mapped
+	 */
+	ASSERT(page->mapping);
 
 	/* Iterate through all the sectors in the range */
 	for (i = 0; i < nr_bits; i++) {
@@ -2992,7 +2991,8 @@ static struct extent_buffer *find_extent_buffer_readpage(
  */
 static void end_bio_extent_readpage(struct bio *bio)
 {
-	struct bio_vec *bvec;
+	struct bio_vec bvec;
+	struct bvec_iter iter;
 	struct btrfs_bio *bbio = btrfs_bio(bio);
 	struct extent_io_tree *tree, *failure_tree;
 	struct processed_extent processed = { 0 };
@@ -3003,11 +3003,15 @@ static void end_bio_extent_readpage(struct bio *bio)
 	u32 bio_offset = 0;
 	int mirror;
 	int ret;
-	struct bvec_iter_all iter_all;
 
-	bio_for_each_segment_all(bvec, bio, iter_all) {
+	/*
+	 * We should have saved the orignal bi_iter, and then start iterating
+	 * using that saved iter, as at endio time bi_iter is not reliable.
+	 */
+	ASSERT(bbio->iter.bi_size);
+	__bio_for_each_segment(bvec, bio, iter, bbio->iter) {
 		bool uptodate = !bio->bi_status;
-		struct page *page = bvec->bv_page;
+		struct page *page = bvec.bv_page;
 		struct inode *inode = page->mapping->host;
 		struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
 		const u32 sectorsize = fs_info->sectorsize;
@@ -3030,19 +3034,19 @@ static void end_bio_extent_readpage(struct bio *bio)
 		 * for unaligned offsets, and an error if they don't add up to
 		 * a full sector.
 		 */
-		if (!IS_ALIGNED(bvec->bv_offset, sectorsize))
+		if (!IS_ALIGNED(bvec.bv_offset, sectorsize))
 			btrfs_err(fs_info,
 		"partial page read in btrfs with offset %u and length %u",
-				  bvec->bv_offset, bvec->bv_len);
-		else if (!IS_ALIGNED(bvec->bv_offset + bvec->bv_len,
+				  bvec.bv_offset, bvec.bv_len);
+		else if (!IS_ALIGNED(bvec.bv_offset + bvec.bv_len,
 				     sectorsize))
 			btrfs_info(fs_info,
 		"incomplete page read with offset %u and length %u",
-				   bvec->bv_offset, bvec->bv_len);
+				   bvec.bv_offset, bvec.bv_len);
 
-		start = page_offset(page) + bvec->bv_offset;
-		end = start + bvec->bv_len - 1;
-		len = bvec->bv_len;
+		start = page_offset(page) + bvec.bv_offset;
+		end = start + bvec.bv_len - 1;
+		len = bvec.bv_len;
 
 		mirror = bbio->mirror_num;
 		if (likely(uptodate)) {
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 1aa060de917c..186304c69900 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -3225,6 +3225,24 @@ void btrfs_writepage_endio_finish_ordered(struct btrfs_inode *inode,
 				       finish_ordered_fn, uptodate);
 }
 
+static u8 *bbio_get_real_csum(struct btrfs_fs_info *fs_info,
+			      struct btrfs_bio *bbio)
+{
+	u8 *ret;
+
+	/* Split bbio needs to grab csum from its parent */
+	if (bbio->is_split_bio)
+		ret = btrfs_bio(bbio->parent)->csum;
+	else
+		ret = bbio->csum;
+
+	if (ret == NULL)
+		return ret;
+
+	return ret + (bbio->offset_to_original >> fs_info->sectorsize_bits) *
+	       fs_info->csum_size;
+}
+
 /*
  * check_data_csum - verify checksum of one sector of uncompressed data
  * @inode:	inode
@@ -3252,7 +3270,8 @@ static int check_data_csum(struct inode *inode, struct btrfs_bio *bbio,
 	ASSERT(pgoff + len <= PAGE_SIZE);
 
 	offset_sectors = bio_offset >> fs_info->sectorsize_bits;
-	csum_expected = ((u8 *)bbio->csum) + offset_sectors * csum_size;
+	csum_expected = bbio_get_real_csum(fs_info, bbio) +
+			offset_sectors * csum_size;
 
 	kaddr = kmap_atomic(page);
 	shash->tfm = fs_info->csum_shash;
@@ -3310,7 +3329,7 @@ unsigned int btrfs_verify_data_csum(struct btrfs_bio *bbio,
 	 * Normally this should be covered by above check for compressed read
 	 * or the next check for NODATASUM. Just do a quicker exit here.
 	 */
-	if (bbio->csum == NULL)
+	if (bbio_get_real_csum(fs_info, bbio) == NULL)
 		return 0;
 
 	if (BTRFS_I(inode)->flags & BTRFS_INODE_NODATASUM)
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index bd789544268c..8825a17d0620 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -400,7 +400,8 @@ static inline struct btrfs_bio *btrfs_bio(struct bio *bio)
 
 static inline void btrfs_bio_free_csum(struct btrfs_bio *bbio)
 {
-	if (bbio->csum != bbio->csum_inline) {
+	/* Only free the csum if we're not a split bio */
+	if (!bbio->is_split_bio && bbio->csum != bbio->csum_inline) {
 		kfree(bbio->csum);
 		bbio->csum = NULL;
 	}

From patchwork Wed Dec 1 05:17:48 2021
X-Patchwork-Submitter: Qu Wenruo
X-Patchwork-Id: 12649069
From: Qu Wenruo
To: linux-btrfs@vger.kernel.org
Cc: linux-block@vger.kernel.org, dm-devel@redhat.com
Subject: [PATCH 09/17] btrfs: make data buffered write endio function to be split bio compatible
Date: Wed, 1 Dec 2021 13:17:48 +0800
Message-Id: <20211201051756.53742-10-wqu@suse.com>
In-Reply-To: <20211201051756.53742-1-wqu@suse.com>
References: <20211201051756.53742-1-wqu@suse.com>

We only need to change the bio_for_each_segment_all() call to a
__bio_for_each_segment() call, using btrfs_bio::iter as the initial
bi_iter.

Now the endio function can handle both split and unsplit bios.

Signed-off-by: Qu Wenruo
---
 fs/btrfs/extent_io.c | 24 ++++++++++++------------
 1 file changed, 12 insertions(+), 12 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index e0e072c0e5c8..524a00ba0ca0 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -2827,31 +2827,31 @@ void end_extent_writepage(struct page *page, int err, u64 start, u64 end)
 static void end_bio_extent_writepage(struct bio *bio)
 {
 	int error = blk_status_to_errno(bio->bi_status);
-	struct bio_vec *bvec;
+	struct bio_vec bvec;
+	struct bvec_iter iter;
 	u64 start;
 	u64 end;
-	struct bvec_iter_all iter_all;
 	bool first_bvec = true;
 
-	ASSERT(!bio_flagged(bio, BIO_CLONED));
-	bio_for_each_segment_all(bvec, bio, iter_all) {
-		struct page *page = bvec->bv_page;
+	ASSERT(btrfs_bio(bio)->iter.bi_size);
+	__bio_for_each_segment(bvec, bio, iter, btrfs_bio(bio)->iter) {
+		struct page *page = bvec.bv_page;
 		struct inode *inode = page->mapping->host;
 		struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
 		const u32 sectorsize = fs_info->sectorsize;
 
 		/* Our read/write should always be sector aligned. */
-		if (!IS_ALIGNED(bvec->bv_offset, sectorsize))
+		if (!IS_ALIGNED(bvec.bv_offset, sectorsize))
 			btrfs_err(fs_info,
 		"partial page write in btrfs with offset %u and length %u",
-				  bvec->bv_offset, bvec->bv_len);
-		else if (!IS_ALIGNED(bvec->bv_len, sectorsize))
+				  bvec.bv_offset, bvec.bv_len);
+		else if (!IS_ALIGNED(bvec.bv_len, sectorsize))
 			btrfs_info(fs_info,
 		"incomplete page write with offset %u and length %u",
-				   bvec->bv_offset, bvec->bv_len);
+				   bvec.bv_offset, bvec.bv_len);
 
-		start = page_offset(page) + bvec->bv_offset;
-		end = start + bvec->bv_len - 1;
+		start = page_offset(page) + bvec.bv_offset;
+		end = start + bvec.bv_len - 1;
 
 		if (first_bvec) {
 			btrfs_record_physical_zoned(inode, start, bio);
@@ -2860,7 +2860,7 @@ static void end_bio_extent_writepage(struct bio *bio)
 
 		end_extent_writepage(page, error, start, end);
 
-		btrfs_page_clear_writeback(fs_info, page, start, bvec->bv_len);
+		btrfs_page_clear_writeback(fs_info, page, start, bvec.bv_len);
 	}
 
 	bio_put(bio);

From patchwork Wed Dec 1 05:17:49 2021
X-Patchwork-Submitter: Qu Wenruo
X-Patchwork-Id: 12649071
From: Qu Wenruo
To: linux-btrfs@vger.kernel.org
Cc: linux-block@vger.kernel.org, dm-devel@redhat.com
Subject: [PATCH 10/17] btrfs: make metadata write endio functions to be split bio compatible
Date: Wed, 1 Dec 2021 13:17:49 +0800
Message-Id: <20211201051756.53742-11-wqu@suse.com>
In-Reply-To: <20211201051756.53742-1-wqu@suse.com>
References: <20211201051756.53742-1-wqu@suse.com>

We only need to convert the bio_for_each_segment_all() calls into
__bio_for_each_segment() calls, using btrfs_bio::iter as the initial
iterator.

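For illustration only (not part of the patch): a minimal user-space
sketch of why an endio handler that walks every vector is unsafe once a
bio is split, while a walk that starts from a saved iterator is not.
The toy_* types are hypothetical and much simpler than the kernel's
bio_vec and bvec_iter; in particular the sketch assumes the iterator
always starts on a vector boundary.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-ins; every name here is hypothetical. */
struct toy_vec {
	uint32_t len;		/* bytes in this vector */
};

struct toy_iter {
	size_t idx;		/* first vector covered */
	uint32_t size;		/* bytes covered from there on */
};

/* Like the *_all() walk: visits every vector of the shared array. */
static uint32_t toy_bytes_all(const struct toy_vec *vecs, size_t nr)
{
	uint32_t total = 0;

	for (size_t i = 0; i < nr; i++)
		total += vecs[i].len;
	return total;
}

/* Like a walk from a saved iterator: visits only the covered range. */
static uint32_t toy_bytes_from(const struct toy_vec *vecs, size_t nr,
			       struct toy_iter iter)
{
	uint32_t total = 0;

	while (iter.size && iter.idx < nr) {
		uint32_t len = vecs[iter.idx].len;

		if (len > iter.size)
			len = iter.size;
		total += len;
		iter.size -= len;
		iter.idx++;
	}
	return total;
}
```

With four 4 KiB vectors shared by two 8 KiB halves, toy_bytes_all()
reports 16 KiB for each half, so every page would be completed twice,
while toy_bytes_from() reports 8 KiB for each half.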
Signed-off-by: Qu Wenruo --- fs/btrfs/extent_io.c | 26 +++++++++++++------------- 1 file changed, 13 insertions(+), 13 deletions(-) diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c index 524a00ba0ca0..8f5a1059a296 100644 --- a/fs/btrfs/extent_io.c +++ b/fs/btrfs/extent_io.c @@ -4495,20 +4495,20 @@ static struct extent_buffer *find_extent_buffer_nolock( static void end_bio_subpage_eb_writepage(struct bio *bio) { struct btrfs_fs_info *fs_info; - struct bio_vec *bvec; - struct bvec_iter_all iter_all; + struct bvec_iter iter; + struct bio_vec bvec; fs_info = btrfs_sb(bio_first_page_all(bio)->mapping->host->i_sb); ASSERT(fs_info->sectorsize < PAGE_SIZE); - ASSERT(!bio_flagged(bio, BIO_CLONED)); - bio_for_each_segment_all(bvec, bio, iter_all) { - struct page *page = bvec->bv_page; - u64 bvec_start = page_offset(page) + bvec->bv_offset; - u64 bvec_end = bvec_start + bvec->bv_len - 1; + ASSERT(btrfs_bio(bio)->iter.bi_size); + __bio_for_each_segment(bvec, bio, iter, btrfs_bio(bio)->iter) { + struct page *page = bvec.bv_page; + u64 bvec_start = page_offset(page) + bvec.bv_offset; + u64 bvec_end = bvec_start + bvec.bv_len - 1; u64 cur_bytenr = bvec_start; - ASSERT(IS_ALIGNED(bvec->bv_len, fs_info->nodesize)); + ASSERT(IS_ALIGNED(bvec.bv_len, fs_info->nodesize)); /* Iterate through all extent buffers in the range */ while (cur_bytenr <= bvec_end) { @@ -4551,14 +4551,14 @@ static void end_bio_subpage_eb_writepage(struct bio *bio) static void end_bio_extent_buffer_writepage(struct bio *bio) { - struct bio_vec *bvec; struct extent_buffer *eb; + struct bvec_iter iter; + struct bio_vec bvec; int done; - struct bvec_iter_all iter_all; - ASSERT(!bio_flagged(bio, BIO_CLONED)); - bio_for_each_segment_all(bvec, bio, iter_all) { - struct page *page = bvec->bv_page; + ASSERT(btrfs_bio(bio)->iter.bi_size); + __bio_for_each_segment(bvec, bio, iter, btrfs_bio(bio)->iter) { + struct page *page = bvec.bv_page; eb = (struct extent_buffer *)page->private; BUG_ON(!eb); From patchwork Wed 
Dec 1 05:17:50 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Qu Wenruo X-Patchwork-Id: 12649073 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id A6D23C4332F for ; Wed, 1 Dec 2021 05:18:36 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1346737AbhLAFVx (ORCPT ); Wed, 1 Dec 2021 00:21:53 -0500 Received: from smtp-out2.suse.de ([195.135.220.29]:37866 "EHLO smtp-out2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1346697AbhLAFVv (ORCPT ); Wed, 1 Dec 2021 00:21:51 -0500 Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by smtp-out2.suse.de (Postfix) with ESMTPS id E34641FD2F; Wed, 1 Dec 2021 05:18:29 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.com; s=susede1; t=1638335909; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=dWbllxqA0p40YAuwzUYnjHSoOkwp7My1ysAjiCYQxWs=; b=kLLxiaH5YXlp47M+bx0d4mfu6E6m2uRXyr2JAKHmaW6xNCC44cuwVB1xTmT5fyNA9UEZoK ZnYG+8AVX8Uy7hQBZKRVFMZDmfLN+aoFG+vF2Efdy+bOBc4k4pmC5oBSXNpH4rERACrW0X EEe4+vYwyDaUMXEFyCM3NpDZCeZ11ig= Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by imap2.suse-dmz.suse.de (Postfix) with ESMTPS id E8D0F13425; Wed, 1 Dec 2021 05:18:28 +0000 (UTC) 
Received: from dovecot-director2.suse.de ([192.168.254.65]) by imap2.suse-dmz.suse.de with ESMTPSA id mJNvLaQFp2EGbwAAMHmgww (envelope-from ); Wed, 01 Dec 2021 05:18:28 +0000 From: Qu Wenruo To: linux-btrfs@vger.kernel.org Cc: linux-block@vger.kernel.org, dm-devel@redhat.com Subject: [PATCH 11/17] btrfs: make dec_and_test_compressed_bio() to be split bio compatible Date: Wed, 1 Dec 2021 13:17:50 +0800 Message-Id: <20211201051756.53742-12-wqu@suse.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20211201051756.53742-1-wqu@suse.com> References: <20211201051756.53742-1-wqu@suse.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org For compression read write endio functions, they all rely on dec_and_test_compressed_bio() to determine if they are the last bio. So here we only need to convert the bio_for_each_segment_all() call into __bio_for_each_segment() so that compression read/write endio functions will handle both split and unsplit bios well. Signed-off-by: Qu Wenruo --- fs/btrfs/compression.c | 14 +++++--------- 1 file changed, 5 insertions(+), 9 deletions(-) diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c index 8668c5190805..8b4b84b59b0c 100644 --- a/fs/btrfs/compression.c +++ b/fs/btrfs/compression.c @@ -205,18 +205,14 @@ static int check_compressed_csum(struct btrfs_inode *inode, struct bio *bio, static bool dec_and_test_compressed_bio(struct compressed_bio *cb, struct bio *bio) { struct btrfs_fs_info *fs_info = btrfs_sb(cb->inode->i_sb); + struct bio_vec bvec; + struct bvec_iter iter; unsigned int bi_size = 0; bool last_io = false; - struct bio_vec *bvec; - struct bvec_iter_all iter_all; - /* - * At endio time, bi_iter.bi_size doesn't represent the real bio size. - * Thus here we have to iterate through all segments to grab correct - * bio size. 
- */ - bio_for_each_segment_all(bvec, bio, iter_all) - bi_size += bvec->bv_len; + ASSERT(btrfs_bio(bio)->iter.bi_size); + __bio_for_each_segment(bvec, bio, iter, btrfs_bio(bio)->iter) + bi_size += bvec.bv_len; if (bio->bi_status) cb->errors = 1; From patchwork Wed Dec 1 05:17:51 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Qu Wenruo X-Patchwork-Id: 12649075 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 274B7C433FE for ; Wed, 1 Dec 2021 05:18:37 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1346697AbhLAFVz (ORCPT ); Wed, 1 Dec 2021 00:21:55 -0500 Received: from smtp-out2.suse.de ([195.135.220.29]:37874 "EHLO smtp-out2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1346723AbhLAFVx (ORCPT ); Wed, 1 Dec 2021 00:21:53 -0500 Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by smtp-out2.suse.de (Postfix) with ESMTPS id 452581FD58; Wed, 1 Dec 2021 05:18:31 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.com; s=susede1; t=1638335911; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=2J28J0cOd7X+UOVA+D9TobKFeLe4yoENDS7sHtQWCCI=; b=dyfnZeBtpwbtPhzpkjbYnzlfNXERm0ZBRZybt7qhFvLYbm5P0Sn4xlhTVgGdCdWYqFe1Ar tmTy8jrTmdyrHmDPmbD63nbJx/e2cLsOjJaXzRYj8PyeL0m5hMU2Zknrl/LddTN8Nr/YNr j6TrbcV55cPQjeg5GfcpTqTZpKnxoGE= Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using 
TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by imap2.suse-dmz.suse.de (Postfix) with ESMTPS id 494C113425; Wed, 1 Dec 2021 05:18:30 +0000 (UTC) Received: from dovecot-director2.suse.de ([192.168.254.65]) by imap2.suse-dmz.suse.de with ESMTPSA id gBN1BqYFp2EGbwAAMHmgww (envelope-from ); Wed, 01 Dec 2021 05:18:30 +0000 From: Qu Wenruo To: linux-btrfs@vger.kernel.org Cc: linux-block@vger.kernel.org, dm-devel@redhat.com Subject: [PATCH 12/17] btrfs: return proper mapped length for RAID56 profiles in __btrfs_map_block() Date: Wed, 1 Dec 2021 13:17:51 +0800 Message-Id: <20211201051756.53742-13-wqu@suse.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20211201051756.53742-1-wqu@suse.com> References: <20211201051756.53742-1-wqu@suse.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org For profiles other than RAID56, __btrfs_map_block() returns @map_length as min(stripe_end, logical + *length), which is the same result as from btrfs_get_io_geometry(). But for RAID56, __btrfs_map_block() returns @map_length as stripe_len. This inconsistency is going to hurt the upcoming bio split at btrfs_map_bio() time, as we will use @map_length as the bio split size.
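To make the difference concrete, here is a small userspace sketch of a mapped length that is clamped to the end of the RAID56 full stripe instead of being fixed at stripe_len. It is an illustration under assumptions, not the kernel implementation: raid56_map_length() is a hypothetical helper, chunk_start stands in for em->start, and STRIPE_LEN is assumed to equal BTRFS_STRIPE_LEN (64KiB).

```c
#include <assert.h>
#include <stdint.h>

/* Assumed to match BTRFS_STRIPE_LEN (64KiB). */
#define STRIPE_LEN (64 * 1024ULL)

/*
 * Hypothetical helper: how many bytes starting at @logical can be mapped
 * before hitting either the end of the IO or the end of the full stripe.
 */
static uint64_t raid56_map_length(uint64_t chunk_start, unsigned int data_stripes,
				  uint64_t logical, uint64_t length)
{
	const uint64_t full_stripe_len = (uint64_t)data_stripes * STRIPE_LEN;
	const uint64_t offset = logical - chunk_start;
	/* Full stripe start, relative to the chunk start */
	const uint64_t full_stripe_start = offset - offset % full_stripe_len;
	const uint64_t full_stripe_end = chunk_start + full_stripe_start +
					 full_stripe_len;
	const uint64_t io_end = logical + length;

	assert(length > 0);
	assert(logical >= chunk_start);

	return (full_stripe_end < io_end ? full_stripe_end : io_end) - logical;
}
```

For a chunk at 1MiB with two data stripes (a 128KiB full stripe), a 64KiB IO starting 96KiB into the chunk maps only 32KiB, the distance to the full stripe end, while a 4KiB IO at the chunk start maps in full.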
Fix this behavior by: - Return @map_length by the same calculation as other profiles - Save stripe_len into btrfs_io_context - Pass btrfs_io_context::stripe_len to raid56_*() functions - Update raid56_*() functions to make their stripe_len parameter more explicit - Add extra ASSERT()s to make sure the passed stripe_len is correct Signed-off-by: Qu Wenruo --- fs/btrfs/raid56.c | 12 ++++++++++-- fs/btrfs/raid56.h | 2 +- fs/btrfs/scrub.c | 4 ++-- fs/btrfs/volumes.c | 13 ++++++++++--- fs/btrfs/volumes.h | 1 + 5 files changed, 24 insertions(+), 8 deletions(-) diff --git a/fs/btrfs/raid56.c b/fs/btrfs/raid56.c index 13e726c88a81..d35cfd750b76 100644 --- a/fs/btrfs/raid56.c +++ b/fs/btrfs/raid56.c @@ -969,6 +969,8 @@ static struct btrfs_raid_bio *alloc_rbio(struct btrfs_fs_info *fs_info, int stripe_npages = DIV_ROUND_UP(stripe_len, PAGE_SIZE); void *p; + ASSERT(stripe_len == BTRFS_STRIPE_LEN); + rbio = kzalloc(sizeof(*rbio) + sizeof(*rbio->stripe_pages) * num_pages + sizeof(*rbio->bio_pages) * num_pages + @@ -1725,6 +1727,9 @@ int raid56_parity_write(struct bio *bio, struct btrfs_io_context *bioc, struct blk_plug_cb *cb; int ret; + /* Currently we only support fixed stripe len */ + ASSERT(stripe_len == BTRFS_STRIPE_LEN); + rbio = alloc_rbio(fs_info, bioc, stripe_len); if (IS_ERR(rbio)) { btrfs_put_bioc(bioc); @@ -2122,6 +2127,9 @@ int raid56_parity_recover(struct bio *bio, struct btrfs_io_context *bioc, struct btrfs_raid_bio *rbio; int ret; + /* Currently we only support fixed stripe len */ + ASSERT(stripe_len == BTRFS_STRIPE_LEN); + if (generic_io) { ASSERT(bioc->mirror_num == mirror_num); btrfs_bio(bio)->mirror_num = mirror_num; @@ -2671,12 +2679,12 @@ void raid56_parity_submit_scrub_rbio(struct btrfs_raid_bio *rbio) struct btrfs_raid_bio * raid56_alloc_missing_rbio(struct bio *bio, struct btrfs_io_context *bioc, - u64 length) + u64 stripe_len) { struct btrfs_fs_info *fs_info = bioc->fs_info; struct btrfs_raid_bio *rbio; - rbio = alloc_rbio(fs_info, bioc, length); + rbio
= alloc_rbio(fs_info, bioc, stripe_len); if (IS_ERR(rbio)) return NULL; diff --git a/fs/btrfs/raid56.h b/fs/btrfs/raid56.h index 72c00fc284b5..7322dcae4498 100644 --- a/fs/btrfs/raid56.h +++ b/fs/btrfs/raid56.h @@ -46,7 +46,7 @@ void raid56_parity_submit_scrub_rbio(struct btrfs_raid_bio *rbio); struct btrfs_raid_bio * raid56_alloc_missing_rbio(struct bio *bio, struct btrfs_io_context *bioc, - u64 length); + u64 stripe_len); void raid56_submit_missing_rbio(struct btrfs_raid_bio *rbio); int btrfs_alloc_stripe_hash_table(struct btrfs_fs_info *info); diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c index 15a123e67108..58c7e8fcdeb1 100644 --- a/fs/btrfs/scrub.c +++ b/fs/btrfs/scrub.c @@ -2230,7 +2230,7 @@ static void scrub_missing_raid56_pages(struct scrub_block *sblock) bio->bi_private = sblock; bio->bi_end_io = scrub_missing_raid56_end_io; - rbio = raid56_alloc_missing_rbio(bio, bioc, length); + rbio = raid56_alloc_missing_rbio(bio, bioc, bioc->stripe_len); if (!rbio) goto rbio_out; @@ -2846,7 +2846,7 @@ static void scrub_parity_check_and_repair(struct scrub_parity *sparity) bio->bi_private = sparity; bio->bi_end_io = scrub_parity_bio_endio; - rbio = raid56_parity_alloc_scrub_rbio(bio, bioc, length, + rbio = raid56_parity_alloc_scrub_rbio(bio, bioc, bioc->stripe_len, sparity->scrub_dev, sparity->dbitmap, sparity->nsectors); diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index af01d54502ab..365e43bbfd14 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -6051,6 +6051,7 @@ static int __btrfs_map_block_for_discard(struct btrfs_fs_info *fs_info, ret = -ENOMEM; goto out; } + bioc->stripe_len = map->stripe_len; for (i = 0; i < num_stripes; i++) { bioc->stripes[i].physical = @@ -6406,6 +6407,7 @@ static int __btrfs_map_block(struct btrfs_fs_info *fs_info, { struct extent_map *em; struct map_lookup *map; + const u64 orig_length = *length; u64 stripe_offset; u64 stripe_nr; u64 stripe_len; @@ -6427,6 +6429,7 @@ static int __btrfs_map_block(struct btrfs_fs_info 
*fs_info, ASSERT(bioc_ret); ASSERT(op != BTRFS_MAP_DISCARD); + ASSERT(orig_length); em = btrfs_get_chunk_map(fs_info, logical, *length); ASSERT(!IS_ERR(em)); @@ -6522,7 +6525,10 @@ static int __btrfs_map_block(struct btrfs_fs_info *fs_info, num_stripes = map->num_stripes; max_errors = nr_parity_stripes(map); - *length = map->stripe_len; + /* Return the length to the full stripe end */ + *length = min(raid56_full_stripe_start + em->start + + data_stripes * stripe_len, + logical + orig_length) - logical; stripe_index = 0; stripe_offset = 0; } else { @@ -6574,6 +6580,7 @@ static int __btrfs_map_block(struct btrfs_fs_info *fs_info, ret = -ENOMEM; goto out; } + bioc->stripe_len = map->stripe_len; for (i = 0; i < num_stripes; i++) { bioc->stripes[i].physical = map->stripes[stripe_index].physical + @@ -6824,9 +6831,9 @@ static int submit_one_mapped_range(struct btrfs_fs_info *fs_info, struct bio *bi /* In this case, map_length has been set to the length of a single stripe; not the whole write */ if (btrfs_op(bio) == BTRFS_MAP_WRITE) { - ret = raid56_parity_write(bio, bioc, map_length); + ret = raid56_parity_write(bio, bioc, bioc->stripe_len); } else { - ret = raid56_parity_recover(bio, bioc, map_length, + ret = raid56_parity_recover(bio, bioc, bioc->stripe_len, mirror_num, 1); } diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h index 8825a17d0620..b2dbf895b4e3 100644 --- a/fs/btrfs/volumes.h +++ b/fs/btrfs/volumes.h @@ -450,6 +450,7 @@ struct btrfs_io_context { struct bio *orig_bio; void *private; atomic_t error; + u32 stripe_len; int max_errors; int num_stripes; int mirror_num; From patchwork Wed Dec 1 05:17:52 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Qu Wenruo X-Patchwork-Id: 12649077 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org 
(Postfix) with ESMTP id 8AE6BC433EF for ; Wed, 1 Dec 2021 05:18:37 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1346718AbhLAFV4 (ORCPT ); Wed, 1 Dec 2021 00:21:56 -0500 Received: from smtp-out1.suse.de ([195.135.220.28]:55552 "EHLO smtp-out1.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1346733AbhLAFVx (ORCPT ); Wed, 1 Dec 2021 00:21:53 -0500 Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by smtp-out1.suse.de (Postfix) with ESMTPS id 9A3BC212BA; Wed, 1 Dec 2021 05:18:32 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.com; s=susede1; t=1638335912; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=flERs7nhTa/P7wUvlx6DDq+T93mnM4se6y7QL2Nnw5g=; b=ZMjexltloxaqmPG3Bx4WLK3jS2RXd1dY0KNeViIs7YjNYH2ggL4zqpFHv8j13d7JmYf5iv QsMsW5bmVYhGHV5xLjmN4xpy1IG8Q4EBFNMAI5ptdaIm6nDOxdJoVVbPADt2VYlQfIoAkE O5kHZNOetz2yT0dzDXt2tyuLq4Zr8xY= Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by imap2.suse-dmz.suse.de (Postfix) with ESMTPS id 9E0A013425; Wed, 1 Dec 2021 05:18:31 +0000 (UTC) Received: from dovecot-director2.suse.de ([192.168.254.65]) by imap2.suse-dmz.suse.de with ESMTPSA id OGsUG6cFp2EGbwAAMHmgww (envelope-from ); Wed, 01 Dec 2021 05:18:31 +0000 From: Qu Wenruo To: linux-btrfs@vger.kernel.org Cc: linux-block@vger.kernel.org, dm-devel@redhat.com Subject: [PATCH 13/17] btrfs: allow btrfs_map_bio() to split bio according to chunk stripe boundaries 
Date: Wed, 1 Dec 2021 13:17:52 +0800 Message-Id: <20211201051756.53742-14-wqu@suse.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20211201051756.53742-1-wqu@suse.com> References: <20211201051756.53742-1-wqu@suse.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org With the new btrfs_bio_split() helper, we are able to split bio according to chunk stripe boundaries at btrfs_map_bio() time. However, since bios are currently still split at buffered/compressed/direct IO time, this ability is not yet utilized. Signed-off-by: Qu Wenruo --- fs/btrfs/volumes.c | 48 ++++++++++++++++++++++++++++++---------------- 1 file changed, 31 insertions(+), 17 deletions(-) diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index 365e43bbfd14..35ba1ea95295 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -6893,29 +6893,43 @@ static int submit_one_mapped_range(struct btrfs_fs_info *fs_info, struct bio *bi blk_status_t btrfs_map_bio(struct btrfs_fs_info *fs_info, struct bio *bio, int mirror_num) { - u64 logical = bio->bi_iter.bi_sector << 9; - u64 length = 0; - u64 map_length; + const u64 orig_logical = bio->bi_iter.bi_sector << SECTOR_SHIFT; + const unsigned int orig_length = bio->bi_iter.bi_size; + const enum btrfs_map_op op = btrfs_op(bio); + u64 cur_logical = orig_logical; int ret; - struct btrfs_io_context *bioc = NULL; - length = bio->bi_iter.bi_size; - map_length = length; + while (cur_logical < orig_logical + orig_length) { + u64 map_length = orig_logical + orig_length - cur_logical; + struct btrfs_io_context *bioc = NULL; + struct bio *cur_bio; - btrfs_bio_counter_inc_blocked(fs_info); - ret = __btrfs_map_block(fs_info, btrfs_op(bio), logical, - &map_length, &bioc, mirror_num, 1); - if (ret) { - btrfs_bio_counter_dec(fs_info); - return errno_to_blk_status(ret); - } + ret = __btrfs_map_block(fs_info, op, cur_logical, &map_length, + &bioc, mirror_num, 1); + if (ret) + return errno_to_blk_status(ret); - ret =
submit_one_mapped_range(fs_info, bio, bioc, map_length, mirror_num); - if (ret < 0) { + if (cur_logical + map_length < orig_logical + orig_length) { + /* + * For now zoned write should never cross stripe + * boundary + */ + ASSERT(bio_op(bio) != REQ_OP_ZONE_APPEND); + + /* Split the bio */ + cur_bio = btrfs_bio_split(fs_info, bio, map_length); + } else { + /* Use the existing bio directly */ + cur_bio = bio; + } + btrfs_bio_counter_inc_blocked(fs_info); + ret = submit_one_mapped_range(fs_info, cur_bio, bioc, + map_length, mirror_num); btrfs_bio_counter_dec(fs_info); - return errno_to_blk_status(ret); + if (ret < 0) + return errno_to_blk_status(ret); + cur_logical += map_length; } - btrfs_bio_counter_dec(fs_info); return BLK_STS_OK; } From patchwork Wed Dec 1 05:17:53 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Qu Wenruo X-Patchwork-Id: 12649079 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 9C367C4332F for ; Wed, 1 Dec 2021 05:18:39 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1346726AbhLAFV5 (ORCPT ); Wed, 1 Dec 2021 00:21:57 -0500 Received: from smtp-out2.suse.de ([195.135.220.29]:37882 "EHLO smtp-out2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1346739AbhLAFVz (ORCPT ); Wed, 1 Dec 2021 00:21:55 -0500 Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by smtp-out2.suse.de (Postfix) with ESMTPS id EF18C1FD2F; Wed, 1 Dec 2021 05:18:33 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.com; s=susede1; t=1638335913; 
h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=iGoATI8QhpJQFUqpKLovdoPmLhCegyDIShiT0EoICN4=; b=HBk8iT1VcldRuyrVjDbwSVrmVOzKIq67TvNOSoVD8IKy2iMJIMDQ9UiZdDnS44OJn9/WmL Um9JqtEzAOi6GoMM7FYJ16WKNKm/Fv1un9IZfnoumjfxN9ZaC+ft5YW/kRp085qD3QWasY DQIDxVuJSMPLKLsf52bSIpRuFntVzIA= Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by imap2.suse-dmz.suse.de (Postfix) with ESMTPS id F2D0513425; Wed, 1 Dec 2021 05:18:32 +0000 (UTC) Received: from dovecot-director2.suse.de ([192.168.254.65]) by imap2.suse-dmz.suse.de with ESMTPSA id 4HzNL6gFp2EGbwAAMHmgww (envelope-from ); Wed, 01 Dec 2021 05:18:32 +0000 From: Qu Wenruo To: linux-btrfs@vger.kernel.org Cc: linux-block@vger.kernel.org, dm-devel@redhat.com Subject: [PATCH 14/17] btrfs: remove buffered IO stripe boundary calculation Date: Wed, 1 Dec 2021 13:17:53 +0800 Message-Id: <20211201051756.53742-15-wqu@suse.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20211201051756.53742-1-wqu@suse.com> References: <20211201051756.53742-1-wqu@suse.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org This will remove btrfs_bio_ctrl::len_to_stripe_boundary, so that buffered IO will no longer limit its bio size according to stripe length. This will move the bio split to btrfs_map_bio() for all buffered IO.
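As a rough picture of what splitting at map time means for a bio that is no longer capped at assembly time, the following userspace sketch cuts a logical range into pieces at stripe boundaries. It is only an illustration under assumptions: split_range() and next_map_length() are hypothetical names standing in for the loop in btrfs_map_bio() and the length reported by __btrfs_map_block(), the stripe length is assumed to be a fixed 64KiB, and the chunk is assumed to start at logical address 0.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed fixed stripe length (64KiB), chunk assumed to start at logical 0. */
#define STRIPE_LEN (64 * 1024ULL)

struct range {
	uint64_t start;
	uint64_t len;
};

/* Hypothetical: bytes mappable from @logical before the next stripe boundary. */
static uint64_t next_map_length(uint64_t logical, uint64_t remaining)
{
	const uint64_t to_boundary = STRIPE_LEN - logical % STRIPE_LEN;

	return remaining < to_boundary ? remaining : to_boundary;
}

/*
 * Split [logical, logical + length) at stripe boundaries.
 * Returns the number of pieces stored in @out (at most @max_out).
 */
static size_t split_range(uint64_t logical, uint64_t length,
			  struct range *out, size_t max_out)
{
	const uint64_t end = logical + length;
	uint64_t cur = logical;
	size_t n = 0;

	while (cur < end && n < max_out) {
		const uint64_t len = next_map_length(cur, end - cur);

		/* Each piece would become one split bio submitted on its own */
		out[n].start = cur;
		out[n].len = len;
		n++;
		cur += len;
	}
	return n;
}
```

A 72KiB range starting at 60KiB, for example, comes out as three pieces of 4KiB, 64KiB and 4KiB.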
Signed-off-by: Qu Wenruo --- fs/btrfs/extent_io.c | 23 ++--------------------- 1 file changed, 2 insertions(+), 21 deletions(-) diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c index 8f5a1059a296..7fabf46312dd 100644 --- a/fs/btrfs/extent_io.c +++ b/fs/btrfs/extent_io.c @@ -3301,7 +3301,7 @@ static int btrfs_bio_add_page(struct btrfs_bio_ctrl *bio_ctrl, ASSERT(bio); /* The limit should be calculated when bio_ctrl->bio is allocated */ - ASSERT(bio_ctrl->len_to_oe_boundary && bio_ctrl->len_to_stripe_boundary); + ASSERT(bio_ctrl->len_to_oe_boundary); if (bio_ctrl->bio_flags != bio_flags) return 0; @@ -3312,9 +3312,7 @@ static int btrfs_bio_add_page(struct btrfs_bio_ctrl *bio_ctrl, if (!contig) return 0; - real_size = min(bio_ctrl->len_to_oe_boundary, - bio_ctrl->len_to_stripe_boundary) - bio_size; - real_size = min(real_size, size); + real_size = min(bio_ctrl->len_to_oe_boundary - bio_size, size); /* * If real_size is 0, never call bio_add_*_page(), as even size is 0, @@ -3335,11 +3333,8 @@ static int calc_bio_boundaries(struct btrfs_bio_ctrl *bio_ctrl, struct btrfs_inode *inode, u64 file_offset) { struct btrfs_fs_info *fs_info = inode->root->fs_info; - struct btrfs_io_geometry geom; struct btrfs_ordered_extent *ordered; - struct extent_map *em; u64 logical = (bio_ctrl->bio->bi_iter.bi_sector << SECTOR_SHIFT); - int ret; /* * Pages for compressed extent are never submitted to disk directly, @@ -3350,22 +3345,8 @@ static int calc_bio_boundaries(struct btrfs_bio_ctrl *bio_ctrl, */ if (bio_ctrl->bio_flags & EXTENT_BIO_COMPRESSED) { bio_ctrl->len_to_oe_boundary = U32_MAX; - bio_ctrl->len_to_stripe_boundary = U32_MAX; return 0; } - em = btrfs_get_chunk_map(fs_info, logical, fs_info->sectorsize); - if (IS_ERR(em)) - return PTR_ERR(em); - ret = btrfs_get_io_geometry(fs_info, em, btrfs_op(bio_ctrl->bio), - logical, &geom); - free_extent_map(em); - if (ret < 0) { - return ret; - } - if (geom.len > U32_MAX) - bio_ctrl->len_to_stripe_boundary = U32_MAX; - else - 
bio_ctrl->len_to_stripe_boundary = (u32)geom.len; if (!btrfs_is_zoned(fs_info) || bio_op(bio_ctrl->bio) != REQ_OP_ZONE_APPEND) { From patchwork Wed Dec 1 05:17:54 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Qu Wenruo X-Patchwork-Id: 12649081 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 87F7BC433FE for ; Wed, 1 Dec 2021 05:18:40 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1346707AbhLAFV6 (ORCPT ); Wed, 1 Dec 2021 00:21:58 -0500 Received: from smtp-out2.suse.de ([195.135.220.29]:37890 "EHLO smtp-out2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1346714AbhLAFV4 (ORCPT ); Wed, 1 Dec 2021 00:21:56 -0500 Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by smtp-out2.suse.de (Postfix) with ESMTPS id 4F3041FD58; Wed, 1 Dec 2021 05:18:35 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.com; s=susede1; t=1638335915; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=FfzzaplRdzADfh5C67JmZThB8OL3fW3foCobHjbNNyA=; b=SJnT9Yx2k1chf+4wdGTtRjDmkBkH6eNNokiiloYILzt4olwISfkGg8nz2XqdFHzoTIne8T pEsfdI6h91lwBP9ovUVUB1W8dQBhLOBZQ1acbToKVKV6dmJWUyq2Dl2aXfRtm/dnOL1NRS T/lgG/P0BtIc10JQ9gsqPguuePcX6C8= Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest 
SHA512) (No client certificate requested) by imap2.suse-dmz.suse.de (Postfix) with ESMTPS id 537D713425; Wed, 1 Dec 2021 05:18:34 +0000 (UTC) Received: from dovecot-director2.suse.de ([192.168.254.65]) by imap2.suse-dmz.suse.de with ESMTPSA id 0FL8CKoFp2EGbwAAMHmgww (envelope-from ); Wed, 01 Dec 2021 05:18:34 +0000 From: Qu Wenruo To: linux-btrfs@vger.kernel.org Cc: linux-block@vger.kernel.org, dm-devel@redhat.com Subject: [PATCH 15/17] btrfs: remove stripe boundary calculation for compressed IO Date: Wed, 1 Dec 2021 13:17:54 +0800 Message-Id: <20211201051756.53742-16-wqu@suse.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20211201051756.53742-1-wqu@suse.com> References: <20211201051756.53742-1-wqu@suse.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org For compressed IO, we calculate the next stripe start inside alloc_compressed_bio(). Since now btrfs_map_bio() can handle bio split, we no longer need to calculate the boundary any more. Signed-off-by: Qu Wenruo --- fs/btrfs/compression.c | 49 +++++------------------------------------- 1 file changed, 5 insertions(+), 44 deletions(-) diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c index 8b4b84b59b0c..70af7d3973b7 100644 --- a/fs/btrfs/compression.c +++ b/fs/btrfs/compression.c @@ -442,21 +442,15 @@ static blk_status_t submit_compressed_bio(struct btrfs_fs_info *fs_info, * from or written to. * @endio_func: The endio function to call after the IO for compressed data * is finished. - * @next_stripe_start: Return value of logical bytenr of where next stripe starts. - * Let the caller know to only fill the bio up to the stripe - * boundary. 
*/ static struct bio *alloc_compressed_bio(struct compressed_bio *cb, u64 disk_bytenr, - unsigned int opf, bio_end_io_t endio_func, - u64 *next_stripe_start) + unsigned int opf, bio_end_io_t endio_func) { struct btrfs_fs_info *fs_info = btrfs_sb(cb->inode->i_sb); - struct btrfs_io_geometry geom; struct extent_map *em; struct bio *bio; - int ret; bio = btrfs_bio_alloc(BIO_MAX_VECS); @@ -473,14 +467,7 @@ static struct bio *alloc_compressed_bio(struct compressed_bio *cb, u64 disk_byte if (bio_op(bio) == REQ_OP_ZONE_APPEND) bio_set_dev(bio, em->map_lookup->stripes[0].dev->bdev); - - ret = btrfs_get_io_geometry(fs_info, em, btrfs_op(bio), disk_bytenr, &geom); free_extent_map(em); - if (ret < 0) { - bio_put(bio); - return ERR_PTR(ret); - } - *next_stripe_start = disk_bytenr + geom.len; return bio; } @@ -506,7 +493,6 @@ blk_status_t btrfs_submit_compressed_write(struct btrfs_inode *inode, u64 start, struct bio *bio = NULL; struct compressed_bio *cb; u64 cur_disk_bytenr = disk_start; - u64 next_stripe_start; blk_status_t ret; int skip_sum = inode->flags & BTRFS_INODE_NODATASUM; const bool use_append = btrfs_use_zone_append(inode, disk_start); @@ -539,28 +525,19 @@ blk_status_t btrfs_submit_compressed_write(struct btrfs_inode *inode, u64 start, /* Allocate new bio if submitted or not yet allocated */ if (!bio) { bio = alloc_compressed_bio(cb, cur_disk_bytenr, - bio_op | write_flags, end_compressed_bio_write, - &next_stripe_start); + bio_op | write_flags, end_compressed_bio_write); if (IS_ERR(bio)) { ret = errno_to_blk_status(PTR_ERR(bio)); bio = NULL; goto finish_cb; } } - /* - * We should never reach next_stripe_start start as we will - * submit comp_bio when reach the boundary immediately. 
- */ - ASSERT(cur_disk_bytenr != next_stripe_start); - /* * We have various limits on the real read size: - * - stripe boundary * - page boundary * - compressed length boundary */ - real_size = min_t(u64, U32_MAX, next_stripe_start - cur_disk_bytenr); - real_size = min_t(u64, real_size, PAGE_SIZE - offset_in_page(offset)); + real_size = min_t(u64, U32_MAX, PAGE_SIZE - offset_in_page(offset)); real_size = min_t(u64, real_size, compressed_len - offset); ASSERT(IS_ALIGNED(real_size, fs_info->sectorsize)); @@ -575,9 +552,6 @@ blk_status_t btrfs_submit_compressed_write(struct btrfs_inode *inode, u64 start, submit = true; cur_disk_bytenr += added; - /* Reached stripe boundary */ - if (cur_disk_bytenr == next_stripe_start) - submit = true; /* Finished the range */ if (cur_disk_bytenr == disk_start + compressed_len) @@ -797,7 +771,6 @@ blk_status_t btrfs_submit_compressed_read(struct inode *inode, struct bio *bio, struct bio *comp_bio = NULL; const u64 disk_bytenr = bio->bi_iter.bi_sector << SECTOR_SHIFT; u64 cur_disk_byte = disk_bytenr; - u64 next_stripe_start; u64 file_offset; u64 em_len; u64 em_start; @@ -878,27 +851,19 @@ blk_status_t btrfs_submit_compressed_read(struct inode *inode, struct bio *bio, /* Allocate new bio if submitted or not yet allocated */ if (!comp_bio) { comp_bio = alloc_compressed_bio(cb, cur_disk_byte, - REQ_OP_READ, end_compressed_bio_read, - &next_stripe_start); + REQ_OP_READ, end_compressed_bio_read); if (IS_ERR(comp_bio)) { ret = errno_to_blk_status(PTR_ERR(comp_bio)); comp_bio = NULL; goto finish_cb; } } - /* - * We should never reach next_stripe_start start as we will - * submit comp_bio when reach the boundary immediately. 
-		 */
-		ASSERT(cur_disk_byte != next_stripe_start);
 		/*
 		 * We have various limit on the real read size:
-		 * - stripe boundary
 		 * - page boundary
 		 * - compressed length boundary
 		 */
-		real_size = min_t(u64, U32_MAX, next_stripe_start - cur_disk_byte);
-		real_size = min_t(u64, real_size, PAGE_SIZE - offset_in_page(offset));
+		real_size = min_t(u64, U32_MAX, PAGE_SIZE - offset_in_page(offset));
 		real_size = min_t(u64, real_size, compressed_len - offset);
 		ASSERT(IS_ALIGNED(real_size, fs_info->sectorsize));
 
@@ -910,10 +875,6 @@ blk_status_t btrfs_submit_compressed_read(struct inode *inode, struct bio *bio,
 		ASSERT(added == real_size);
 		cur_disk_byte += added;
 
-		/* Reached stripe boundary, need to submit */
-		if (cur_disk_byte == next_stripe_start)
-			submit = true;
-
 		/* Has finished the range, need to submit */
 		if (cur_disk_byte == disk_bytenr + compressed_len)
 			submit = true;

From patchwork Wed Dec 1 05:17:55 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Qu Wenruo
X-Patchwork-Id: 12649083
From: Qu Wenruo
To: linux-btrfs@vger.kernel.org
Cc: linux-block@vger.kernel.org, dm-devel@redhat.com
Subject: [PATCH 16/17] btrfs: remove the stripe boundary calculation for direct IO
Date: Wed, 1 Dec 2021 13:17:55 +0800
Message-Id: <20211201051756.53742-17-wqu@suse.com>
X-Mailer: git-send-email 2.34.1
In-Reply-To: <20211201051756.53742-1-wqu@suse.com>
References: <20211201051756.53742-1-wqu@suse.com>
Precedence: bulk
X-Mailing-List: linux-block@vger.kernel.org

In btrfs_submit_direct() we have a do {} while () loop that splits the
bio at stripe boundaries.

Since btrfs_map_bio() now handles the split for us, there is no need to
do it manually anymore.

Also, since the bio is no longer split here, the special check for
RAID56 is no longer needed. Make btrfs_submit_dio_bio() follow the same
async submit rule as btrfs_submit_data_bio().
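For reference, the kind of per-stripe clamping the removed loop performed
can be sketched in plain user-space C. This is only an illustration under
stated assumptions: STRIPE_LEN, bytes_to_stripe_end() and count_splits()
are hypothetical names, a fixed 64 KiB stripe length is assumed, and none
of this is the kernel's btrfs_get_io_geometry() logic.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed stripe length, for illustration only (64 KiB). */
#define STRIPE_LEN (64ULL * 1024)

/* Hypothetical helper: bytes from @logical up to the next stripe boundary. */
static uint64_t bytes_to_stripe_end(uint64_t logical)
{
	return STRIPE_LEN - (logical % STRIPE_LEN);
}

/*
 * Walk [logical, logical + len) the way a split loop would, clamping each
 * chunk at the next stripe boundary, and return the number of chunks.
 */
static unsigned int count_splits(uint64_t logical, uint64_t len)
{
	unsigned int nr = 0;

	assert(len > 0);
	do {
		uint64_t room = bytes_to_stripe_end(logical);
		uint64_t clone_len = len < room ? len : room;

		logical += clone_len;
		len -= clone_len;
		nr++;
	} while (len > 0);

	return nr;
}
```

Under these assumptions, an 8 KiB request that starts 4 KiB before a
stripe boundary is walked as two chunks. With this series, that clamping
is expected to happen inside btrfs_map_bio() rather than in the direct IO
submission path.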
Signed-off-by: Qu Wenruo
---
 fs/btrfs/inode.c | 113 ++++++++++-------------------------------------
 1 file changed, 24 insertions(+), 89 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 186304c69900..8ffec0fe6c4e 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -8222,22 +8222,16 @@ static void btrfs_end_dio_bio(struct bio *bio)
 }
 
 static inline blk_status_t btrfs_submit_dio_bio(struct bio *bio,
-		struct inode *inode, u64 file_offset, int async_submit)
+		struct inode *inode, u64 file_offset)
 {
 	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
 	struct btrfs_dio_private *dip = bio->bi_private;
 	bool write = btrfs_op(bio) == BTRFS_MAP_WRITE;
+	bool async_submit;
 	blk_status_t ret;
 
-	/*
-	 * Check btrfs_submit_data_bio() for rules about async submit.
-	 *
-	 * The only exception is for RAID56, when there are more than one bios
-	 * to submit, async submit seems to make it harder to collect csums
-	 * for the full stripe.
-	 */
-	if (async_submit)
-		async_submit = !atomic_read(&BTRFS_I(inode)->sync_writers);
+	/* Check btrfs_submit_data_bio() for rules about async submit. */
+	async_submit = !atomic_read(&BTRFS_I(inode)->sync_writers);
 
 	if (!write)
 		btrfs_bio(bio)->endio_type = BTRFS_WQ_ENDIO_DATA;
@@ -8311,25 +8305,12 @@ static void btrfs_submit_direct(const struct iomap_iter *iter,
 		struct bio *dio_bio, loff_t file_offset)
 {
 	struct inode *inode = iter->inode;
+	struct btrfs_dio_data *dio_data = iter->iomap.private;
 	const bool write = (btrfs_op(dio_bio) == BTRFS_MAP_WRITE);
-	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
-	const bool raid56 = (btrfs_data_alloc_profile(fs_info) &
-			     BTRFS_BLOCK_GROUP_RAID56_MASK);
 	struct btrfs_dio_private *dip;
 	struct bio *bio;
 	const u32 length = dio_bio->bi_iter.bi_size;
-	u32 submitted_bytes = 0;
-	u64 start_sector;
-	int async_submit = 0;
-	u64 submit_len;
-	u64 clone_offset = 0;
-	u64 clone_len;
-	u64 logical;
-	int ret;
 	blk_status_t status;
-	struct btrfs_io_geometry geom;
-	struct btrfs_dio_data *dio_data = iter->iomap.private;
-	struct extent_map *em = NULL;
 
 	dip = btrfs_create_dio_private(dio_bio, inode, file_offset, length);
 	if (!dip) {
@@ -8353,80 +8334,34 @@ static void btrfs_submit_direct(const struct iomap_iter *iter,
 			goto out_err;
 	}
 
-	start_sector = dio_bio->bi_iter.bi_sector;
-	submit_len = dio_bio->bi_iter.bi_size;
-
-	do {
-		logical = start_sector << 9;
-		em = btrfs_get_chunk_map(fs_info, logical, submit_len);
-		if (IS_ERR(em)) {
-			status = errno_to_blk_status(PTR_ERR(em));
-			em = NULL;
-			goto out_err_em;
-		}
-		ret = btrfs_get_io_geometry(fs_info, em, btrfs_op(dio_bio),
-					    logical, &geom);
-		if (ret) {
-			status = errno_to_blk_status(ret);
-			goto out_err_em;
-		}
-
-		clone_len = min(submit_len, geom.len);
-		ASSERT(clone_len <= UINT_MAX);
-
-		/*
-		 * This will never fail as it's passing GPF_NOFS and
-		 * the allocation is backed by btrfs_bioset.
-		 */
-		bio = btrfs_bio_clone_partial(dio_bio, clone_offset, clone_len);
-		bio->bi_private = dip;
-		bio->bi_end_io = btrfs_end_dio_bio;
-
-		if (bio_op(bio) == REQ_OP_ZONE_APPEND) {
-			status = extract_ordered_extent(BTRFS_I(inode), bio,
-							file_offset);
-			if (status) {
-				bio_put(bio);
-				goto out_err;
-			}
-		}
-
-		ASSERT(submit_len >= clone_len);
-		submit_len -= clone_len;
+	/*
+	 * This will never fail as it's passing GFP_NOFS and
+	 * the allocation is backed by btrfs_bioset.
+	 */
+	bio = btrfs_bio_clone(dio_bio);
+	bio->bi_private = dip;
+	bio->bi_end_io = btrfs_end_dio_bio;
 
-		if (submit_len > 0) {
-			/*
-			 * If we are submitting more than one bio, submit them
-			 * all asynchronously. The exception is RAID 5 or 6, as
-			 * asynchronous checksums make it difficult to collect
-			 * full stripe writes.
-			 */
-			if (!raid56)
-				async_submit = 1;
-		}
 
-		status = btrfs_submit_dio_bio(bio, inode, file_offset,
-						async_submit);
+	if (bio_op(bio) == REQ_OP_ZONE_APPEND) {
+		status = extract_ordered_extent(BTRFS_I(inode), bio,
+						file_offset);
 		if (status) {
 			bio_put(bio);
-			goto out_err_em;
+			goto out_err;
 		}
-
-		submitted_bytes += clone_len;
-		dio_data->submitted += clone_len;
-		clone_offset += clone_len;
-		start_sector += clone_len >> 9;
-		file_offset += clone_len;
-
-		free_extent_map(em);
-	} while (submit_len > 0);
+	}
+	status = btrfs_submit_dio_bio(bio, inode, file_offset);
+	if (status) {
+		bio_put(bio);
+		goto out_err;
+	}
+	dio_data->submitted += length;
 	return;
 
-out_err_em:
-	free_extent_map(em);
 out_err:
 	dip->dio_bio->bi_status = status;
-	dio_private_finish(dip, status, length - submitted_bytes);
+	dio_private_finish(dip, status, length);
 }
 
 const struct iomap_ops btrfs_dio_iomap_ops = {

From patchwork Wed Dec 1 05:17:56 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Qu Wenruo
X-Patchwork-Id: 12649085
From: Qu Wenruo
To: linux-btrfs@vger.kernel.org
Cc: linux-block@vger.kernel.org, dm-devel@redhat.com
Subject: [PATCH 17/17] btrfs: unexport btrfs_get_io_geometry()
Date: Wed, 1 Dec 2021 13:17:56 +0800
Message-Id: <20211201051756.53742-18-wqu@suse.com>
X-Mailer: git-send-email 2.34.1
In-Reply-To: <20211201051756.53742-1-wqu@suse.com>
References: <20211201051756.53742-1-wqu@suse.com>
Precedence: bulk
X-Mailing-List: linux-block@vger.kernel.org

btrfs_get_io_geometry() is a lighter-weight version of
btrfs_map_block(): it provides just enough information without filling
in everything btrfs_map_block() does.

However, the function is only used for stripe boundary calculation, and
that calculation is now handled entirely inside btrfs_map_bio(), so
there is no need to export it anymore.

Signed-off-by: Qu Wenruo
---
 fs/btrfs/volumes.c | 8 ++++----
 fs/btrfs/volumes.h | 3 ---
 2 files changed, 4 insertions(+), 7 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 35ba1ea95295..89d14531c2b3 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -6320,9 +6320,9 @@ static bool need_full_stripe(enum btrfs_map_op op)
  * Returns < 0 in case a chunk for the given logical address cannot be found,
  * usually shouldn't happen unless @logical is corrupted, 0 otherwise.
  */
-int btrfs_get_io_geometry(struct btrfs_fs_info *fs_info, struct extent_map *em,
-			  enum btrfs_map_op op, u64 logical,
-			  struct btrfs_io_geometry *io_geom)
+static int get_io_geometry(struct btrfs_fs_info *fs_info, struct extent_map *em,
+			   enum btrfs_map_op op, u64 logical,
+			   struct btrfs_io_geometry *io_geom)
 {
 	struct map_lookup *map;
 	u64 len;
@@ -6434,7 +6434,7 @@ static int __btrfs_map_block(struct btrfs_fs_info *fs_info,
 	em = btrfs_get_chunk_map(fs_info, logical, *length);
 	ASSERT(!IS_ERR(em));
 
-	ret = btrfs_get_io_geometry(fs_info, em, op, logical, &geom);
+	ret = get_io_geometry(fs_info, em, op, logical, &geom);
 	if (ret < 0)
 		return ret;
 
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index b2dbf895b4e3..b37ee1e16d5b 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -562,9 +562,6 @@ int btrfs_map_block(struct btrfs_fs_info *fs_info, enum btrfs_map_op op,
 int btrfs_map_sblock(struct btrfs_fs_info *fs_info, enum btrfs_map_op op,
 		     u64 logical, u64 *length,
 		     struct btrfs_io_context **bioc_ret);
-int btrfs_get_io_geometry(struct btrfs_fs_info *fs_info, struct extent_map *map,
-			  enum btrfs_map_op op, u64 logical,
-			  struct btrfs_io_geometry *io_geom);
 int btrfs_read_sys_array(struct btrfs_fs_info *fs_info);
 int btrfs_read_chunk_tree(struct btrfs_fs_info *fs_info);
 struct btrfs_block_group *btrfs_create_chunk(struct btrfs_trans_handle *trans,