From patchwork Fri Jun 24 14:12:52 2022
X-Patchwork-Submitter: Ming Lei
X-Patchwork-Id: 12894578
X-Patchwork-Delegate: snitzer@redhat.com
From: Ming Lei
To: Jens Axboe, Mike Snitzer
Date: Fri, 24 Jun 2022 22:12:52 +0800
Message-Id: <20220624141255.2461148-2-ming.lei@redhat.com>
In-Reply-To: <20220624141255.2461148-1-ming.lei@redhat.com>
References: <20220624141255.2461148-1-ming.lei@redhat.com>
Subject: [dm-devel] [PATCH 5.20 1/4] block: add bio_rewind() API
Petersen" , Eric Biggers , Ming Lei , linux-block@vger.kernel.org, dm-devel@redhat.com, Dmitry Monakhov , Kent Overstreet Errors-To: dm-devel-bounces@redhat.com Sender: "dm-devel" X-Scanned-By: MIMEDefang 2.85 on 10.11.54.7 Authentication-Results: relay.mimecast.com; auth=pass smtp.auth=CUSA124A263 smtp.mailfrom=dm-devel-bounces@redhat.com X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Commit 7759eb23fd98 ("block: remove bio_rewind_iter()") removes the similar API because the following reasons: ``` It is pointed that bio_rewind_iter() is one very bad API[1]: 1) bio size may not be restored after rewinding 2) it causes some bogus change, such as 5151842b9d8732 (block: reset bi_iter.bi_done after splitting bio) 3) rewinding really makes things complicated wrt. bio splitting 4) unnecessary updating of .bi_done in fast path [1] https://marc.info/?t=153549924200005&r=1&w=2 So this patch takes Kent's suggestion to restore one bio into its original state via saving bio iterator(struct bvec_iter) in bio_integrity_prep(), given now bio_rewind_iter() is only used by bio integrity code. ``` However, it isn't easy to restore bio by saving 32 bytes bio->bi_iter, and saving it only can't restore crypto and integrity info. Add bio_rewind() back for some use cases which may not be same with previous generic case: 1) most of bio has fixed end sector since bio split is done from front of the bio, if driver just records how many sectors between current bio's start sector and the bio's end sector, the original position can be restored 2) if one bio's end sector won't change, usually bio_trim() isn't called, user can restore original position by storing sectors from current ->bi_iter.bi_sector to bio's end sector; together by saving bio size, 8 bytes can restore to original bio. 3) dm's requeue use case: when BLK_STS_DM_REQUEUE happens, dm core needs to restore to the original bio which represents current dm io to be requeued. By storing sectors to the bio's end sector and dm io's size, bio_rewind() can restore such original bio, then dm core code needn't to allocate one bio beforehand just for handling BLK_STS_DM_REQUEUE which is actually one unusual event. 4) Not like original rewind API, this one needn't to add .bi_done, and no any effect on fast path Cc: Eric Biggers Cc: Kent Overstreet Cc: Dmitry Monakhov Cc: Martin K. Petersen Signed-off-by: Ming Lei Signed-off-by: Kent Overstreet --- block/bio-integrity.c | 19 +++++++++++++++++++ block/bio.c | 19 +++++++++++++++++++ block/blk-crypto-internal.h | 7 +++++++ block/blk-crypto.c | 23 +++++++++++++++++++++++ include/linux/bio.h | 21 +++++++++++++++++++++ include/linux/bvec.h | 33 +++++++++++++++++++++++++++++++++ 6 files changed, 122 insertions(+) diff --git a/block/bio-integrity.c b/block/bio-integrity.c index 32929c89ba8a..06c2fe81fdf2 100644 --- a/block/bio-integrity.c +++ b/block/bio-integrity.c @@ -378,6 +378,25 @@ void bio_integrity_advance(struct bio *bio, unsigned int bytes_done) bvec_iter_advance(bip->bip_vec, &bip->bip_iter, bytes); } +/** + * bio_integrity_rewind - Rewind integrity vector + * @bio: bio whose integrity vector to update + * @bytes_done: number of data bytes to rewind + * + * Description: This function calculates how many integrity bytes the + * number of completed data bytes correspond to and rewind the + * integrity vector accordingly. 
Cc: Eric Biggers
Cc: Kent Overstreet
Cc: Dmitry Monakhov
Cc: Martin K. Petersen
Signed-off-by: Ming Lei
Signed-off-by: Kent Overstreet
---
 block/bio-integrity.c       | 19 +++++++++++++++++++
 block/bio.c                 | 19 +++++++++++++++++++
 block/blk-crypto-internal.h |  7 +++++++
 block/blk-crypto.c          | 23 +++++++++++++++++++++++
 include/linux/bio.h         | 21 +++++++++++++++++++++
 include/linux/bvec.h        | 33 +++++++++++++++++++++++++++++++++
 6 files changed, 122 insertions(+)

diff --git a/block/bio-integrity.c b/block/bio-integrity.c
index 32929c89ba8a..06c2fe81fdf2 100644
--- a/block/bio-integrity.c
+++ b/block/bio-integrity.c
@@ -378,6 +378,25 @@ void bio_integrity_advance(struct bio *bio, unsigned int bytes_done)
 	bvec_iter_advance(bip->bip_vec, &bip->bip_iter, bytes);
 }
 
+/**
+ * bio_integrity_rewind - Rewind integrity vector
+ * @bio:	bio whose integrity vector to update
+ * @bytes_done:	number of data bytes to rewind
+ *
+ * Description: This function calculates how many integrity bytes the
+ * number of completed data bytes correspond to and rewinds the
+ * integrity vector accordingly.
+ */
+void bio_integrity_rewind(struct bio *bio, unsigned int bytes_done)
+{
+	struct bio_integrity_payload *bip = bio_integrity(bio);
+	struct blk_integrity *bi = blk_get_integrity(bio->bi_bdev->bd_disk);
+	unsigned bytes = bio_integrity_bytes(bi, bytes_done >> 9);
+
+	bip->bip_iter.bi_sector -= bio_integrity_intervals(bi, bytes_done >> 9);
+	bvec_iter_rewind(bip->bip_vec, &bip->bip_iter, bytes);
+}
+
 /**
  * bio_integrity_trim - Trim integrity vector
  * @bio:	bio whose integrity vector to update
diff --git a/block/bio.c b/block/bio.c
index 51c99f2c5c90..5318944b7b18 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -1360,6 +1360,25 @@ void __bio_advance(struct bio *bio, unsigned bytes)
 }
 EXPORT_SYMBOL(__bio_advance);
 
+/**
+ * bio_rewind - rewind @bio by @bytes
+ * @bio: bio to rewind
+ * @bytes: how many bytes to rewind
+ *
+ * Update ->bi_iter of @bio by rewinding @bytes. Most bios have a fixed
+ * end sector, so it is easy to rewind from the end of the bio and restore
+ * its original position. It is the caller's responsibility to restore the
+ * bio size.
+ */
+void bio_rewind(struct bio *bio, unsigned bytes)
+{
+	if (bio_integrity(bio))
+		bio_integrity_rewind(bio, bytes);
+
+	bio_crypt_rewind(bio, bytes);
+	bio_rewind_iter(bio, &bio->bi_iter, bytes);
+}
+EXPORT_SYMBOL(bio_rewind);
+
 void bio_copy_data_iter(struct bio *dst, struct bvec_iter *dst_iter,
 			struct bio *src, struct bvec_iter *src_iter)
 {
diff --git a/block/blk-crypto-internal.h b/block/blk-crypto-internal.h
index e6818ffaddbf..b723599bbf99 100644
--- a/block/blk-crypto-internal.h
+++ b/block/blk-crypto-internal.h
@@ -114,6 +114,13 @@ static inline void bio_crypt_advance(struct bio *bio, unsigned int bytes)
 		__bio_crypt_advance(bio, bytes);
 }
 
+void __bio_crypt_rewind(struct bio *bio, unsigned int bytes);
+static inline void bio_crypt_rewind(struct bio *bio, unsigned int bytes)
+{
+	if (bio_has_crypt_ctx(bio))
+		__bio_crypt_rewind(bio, bytes);
+}
+
 void __bio_crypt_free_ctx(struct bio *bio);
 static inline void bio_crypt_free_ctx(struct bio *bio)
 {
diff --git a/block/blk-crypto.c b/block/blk-crypto.c
index a496aaef85ba..caae2f429fc7 100644
--- a/block/blk-crypto.c
+++ b/block/blk-crypto.c
@@ -134,6 +134,21 @@ void bio_crypt_dun_increment(u64 dun[BLK_CRYPTO_DUN_ARRAY_SIZE],
 	}
 }
 
+/* Decrements @dun by @dec, treating @dun as a multi-limb integer. */
+void bio_crypt_dun_decrement(u64 dun[BLK_CRYPTO_DUN_ARRAY_SIZE],
+			     unsigned int dec)
+{
+	int i;
+
+	for (i = 0; dec && i < BLK_CRYPTO_DUN_ARRAY_SIZE; i++) {
+		u64 prev = dun[i];
+
+		dun[i] -= dec;
+		/* borrow from the next limb if this one underflowed */
+		if (dun[i] > prev)
+			dec = 1;
+		else
+			dec = 0;
+	}
+}
+
 void __bio_crypt_advance(struct bio *bio, unsigned int bytes)
 {
 	struct bio_crypt_ctx *bc = bio->bi_crypt_context;
@@ -142,6 +157,14 @@ void __bio_crypt_advance(struct bio *bio, unsigned int bytes)
 			bytes >> bc->bc_key->data_unit_size_bits);
 }
 
+void __bio_crypt_rewind(struct bio *bio, unsigned int bytes)
+{
+	struct bio_crypt_ctx *bc = bio->bi_crypt_context;
+
+	bio_crypt_dun_decrement(bc->bc_dun,
+				bytes >> bc->bc_key->data_unit_size_bits);
+}
+
 /*
  * Returns true if @bc->bc_dun plus @bytes converted to data units is equal to
  * @next_dun, treating the DUNs as multi-limb integers.
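As a quick check of the borrow logic in bio_crypt_dun_decrement() above
(an editor's illustration, not part of the patch), consider a two-limb DUN:

```c
/*
 * Illustrative trace only: dun = { 0x0, 0x5 }, dec = 1.
 *
 *   i = 0: prev = 0x0, dun[0] -= 1  ->  0xffffffffffffffff
 *          dun[0] > prev, so a borrow propagates: dec = 1
 *   i = 1: prev = 0x5, dun[1] -= 1  ->  0x4
 *          dun[1] < prev, so the loop stops: dec = 0
 *
 * Result: { 0xffffffffffffffff, 0x4 }, i.e. (0x5 << 64) - 1, the exact
 * inverse of bio_crypt_dun_increment() on the same value.
 */
```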
diff --git a/include/linux/bio.h b/include/linux/bio.h
index 992ee987f273..4e6674f232b4 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -105,6 +105,19 @@ static inline void bio_advance_iter(const struct bio *bio,
 	/* TODO: It is reasonable to complete bio with error here. */
 }
 
+static inline void bio_rewind_iter(const struct bio *bio,
+				   struct bvec_iter *iter, unsigned int bytes)
+{
+	iter->bi_sector -= bytes >> 9;
+
+	/* No advance means no rewind */
+	if (bio_no_advance_iter(bio))
+		iter->bi_size += bytes;
+	else
+		bvec_iter_rewind(bio->bi_io_vec, iter, bytes);
+	/* TODO: It is reasonable to complete bio with error here. */
+}
+
 /* @bytes should be less or equal to bvec[i->bi_idx].bv_len */
 static inline void bio_advance_iter_single(const struct bio *bio,
 					   struct bvec_iter *iter,
@@ -119,6 +132,7 @@ static inline void bio_advance_iter_single(const struct bio *bio,
 }
 
 void __bio_advance(struct bio *, unsigned bytes);
+void bio_rewind(struct bio *, unsigned bytes);
 
 /**
  * bio_advance - increment/complete a bio by some number of bytes
@@ -699,6 +713,7 @@ extern struct bio_integrity_payload *bio_integrity_alloc(struct bio *, gfp_t, unsigned int);
 extern int bio_integrity_add_page(struct bio *, struct page *, unsigned int, unsigned int);
 extern bool bio_integrity_prep(struct bio *);
 extern void bio_integrity_advance(struct bio *, unsigned int);
+extern void bio_integrity_rewind(struct bio *, unsigned int);
 extern void bio_integrity_trim(struct bio *);
 extern int bio_integrity_clone(struct bio *, struct bio *, gfp_t);
 extern int bioset_integrity_create(struct bio_set *, int);
@@ -739,6 +754,12 @@ static inline void bio_integrity_advance(struct bio *bio,
 	return;
 }
 
+static inline void bio_integrity_rewind(struct bio *bio,
+					unsigned int bytes_done)
+{
+	return;
+}
+
 static inline void bio_integrity_trim(struct bio *bio)
 {
 	return;
diff --git a/include/linux/bvec.h b/include/linux/bvec.h
index 35c25dff651a..b56d92e939c1 100644
--- a/include/linux/bvec.h
+++ b/include/linux/bvec.h
@@ -122,6 +122,39 @@ static inline bool bvec_iter_advance(const struct bio_vec *bv,
 	return true;
 }
 
+static inline bool bvec_iter_rewind(const struct bio_vec *bv,
+				    struct bvec_iter *iter,
+				    unsigned int bytes)
+{
+	int idx;
+
+	iter->bi_size += bytes;
+	if (bytes <= iter->bi_bvec_done) {
+		iter->bi_bvec_done -= bytes;
+		return true;
+	}
+
+	bytes -= iter->bi_bvec_done;
+	idx = iter->bi_idx - 1;
+
+	while (idx >= 0 && bytes && bytes > bv[idx].bv_len) {
+		bytes -= bv[idx].bv_len;
+		idx--;
+	}
+
+	if (WARN_ONCE(idx < 0 && bytes,
+		      "Attempted to rewind iter beyond bvec's boundaries\n")) {
+		iter->bi_size -= bytes;
+		iter->bi_bvec_done = 0;
+		iter->bi_idx = 0;
+		return false;
+	}
+
+	iter->bi_idx = idx;
+	iter->bi_bvec_done = bv[idx].bv_len - bytes;
+	return true;
+}
+
 /*
  * A simpler version of bvec_iter_advance(), @bytes should not span
  * across multiple bvec entries, i.e. bytes <= bv[i->bi_idx].bv_len
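For intuition, here is a round trip through the iterator helpers above with
concrete numbers (an editor's illustration; the values and the two-segment
layout are arbitrary, and kernel context is assumed):

```c
/* Two 4 KiB segments, iterator positioned at the start of the bio. */
struct bio_vec bv[2] = {
	{ .bv_len = 4096 },
	{ .bv_len = 4096 },
};
struct bvec_iter iter = { .bi_size = 8192, .bi_idx = 0, .bi_bvec_done = 0 };

bvec_iter_advance(bv, &iter, 6144);
/* now: bi_idx == 1, bi_bvec_done == 2048, bi_size == 2048 */

bvec_iter_rewind(bv, &iter, 6144);
/* restored: bi_idx == 0, bi_bvec_done == 0, bi_size == 8192 */
```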
From patchwork Fri Jun 24 14:12:53 2022
X-Patchwork-Submitter: Ming Lei
X-Patchwork-Id: 12894580
X-Patchwork-Delegate: snitzer@redhat.com
From: Ming Lei
To: Jens Axboe, Mike Snitzer
Cc: linux-block@vger.kernel.org, dm-devel@redhat.com, Ming Lei
Date: Fri, 24 Jun 2022 22:12:53 +0800
Message-Id: <20220624141255.2461148-3-ming.lei@redhat.com>
In-Reply-To: <20220624141255.2461148-1-ming.lei@redhat.com>
References: <20220624141255.2461148-1-ming.lei@redhat.com>
Subject: [dm-devel] [PATCH 5.20 2/4] dm: add new helper for handling dm_io requeue
"dm-devel" X-Scanned-By: MIMEDefang 2.84 on 10.11.54.2 Authentication-Results: relay.mimecast.com; auth=pass smtp.auth=CUSA124A263 smtp.mailfrom=dm-devel-bounces@redhat.com X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Add helper of dm_handle_requeue() for handling dm_io requeue. Signed-off-by: Ming Lei --- drivers/md/dm.c | 17 ++++++++++++----- 1 file changed, 12 insertions(+), 5 deletions(-) diff --git a/drivers/md/dm.c b/drivers/md/dm.c index 2b75f1ef7386..a9e5e429c150 100644 --- a/drivers/md/dm.c +++ b/drivers/md/dm.c @@ -884,13 +884,11 @@ static int __noflush_suspending(struct mapped_device *md) return test_bit(DMF_NOFLUSH_SUSPENDING, &md->flags); } -static void dm_io_complete(struct dm_io *io) +static void dm_handle_requeue(struct dm_io *io) { - blk_status_t io_error; - struct mapped_device *md = io->md; - struct bio *bio = io->split_bio ? io->split_bio : io->orig_bio; - if (io->status == BLK_STS_DM_REQUEUE) { + struct bio *bio = io->split_bio ? io->split_bio : io->orig_bio; + struct mapped_device *md = io->md; unsigned long flags; /* * Target requested pushing back the I/O. @@ -909,6 +907,15 @@ static void dm_io_complete(struct dm_io *io) } spin_unlock_irqrestore(&md->deferred_lock, flags); } +} + +static void dm_io_complete(struct dm_io *io) +{ + struct bio *bio = io->split_bio ? io->split_bio : io->orig_bio; + struct mapped_device *md = io->md; + blk_status_t io_error; + + dm_handle_requeue(io); io_error = io->status; if (dm_io_flagged(io, DM_IO_ACCOUNTED)) From patchwork Fri Jun 24 14:12:54 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ming Lei X-Patchwork-Id: 12894581 X-Patchwork-Delegate: snitzer@redhat.com Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 27D57C43334 for ; Fri, 24 Jun 2022 14:18:12 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1656080291; h=from:from:sender:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:list-id:list-help: list-unsubscribe:list-subscribe:list-post; bh=xlAuXivqrMl66udl0Q1X4FxHYiXyFFQgsSI8lfUZjdM=; b=RznFHhL251t2HvOlslKaKZLnlUofu5ynwL3cgIZ1m4gL/uWsOWaBA3Fr0o+fX7jRfDFylJ dAN0W6GVzLof61AYART2AWw3Pt5AT/I6RD2qWg5zCRNzplOJvKByD5+rONW0BqJ+Rt+P79 2XZsPrUv0pyjZO0AKJqiCMw2BHE+Xjc= Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-563-xZ3Q6tIeM_OQQzYo_eq_qw-1; Fri, 24 Jun 2022 10:17:08 -0400 X-MC-Unique: xZ3Q6tIeM_OQQzYo_eq_qw-1 Received: from smtp.corp.redhat.com (int-mx08.intmail.prod.int.rdu2.redhat.com [10.11.54.8]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id ECC881010361; Fri, 24 Jun 2022 14:14:45 +0000 (UTC) Received: from mm-prod-listman-01.mail-001.prod.us-east-1.aws.redhat.com (unknown [10.30.29.100]) by smtp.corp.redhat.com (Postfix) with ESMTP id 
From patchwork Fri Jun 24 14:12:54 2022
X-Patchwork-Submitter: Ming Lei
X-Patchwork-Id: 12894581
X-Patchwork-Delegate: snitzer@redhat.com
From: Ming Lei
To: Jens Axboe, Mike Snitzer
Cc: linux-block@vger.kernel.org, dm-devel@redhat.com, Ming Lei
Date: Fri, 24 Jun 2022 22:12:54 +0800
Message-Id: <20220624141255.2461148-4-ming.lei@redhat.com>
In-Reply-To: <20220624141255.2461148-1-ming.lei@redhat.com>
References: <20220624141255.2461148-1-ming.lei@redhat.com>
Subject: [dm-devel] [PATCH 5.20 3/4] dm: improve handling for DM_REQUEUE and AGAIN

When BLK_STS_DM_REQUEUE is returned, or BLK_STS_AGAIN is returned for a
POLLED io, we requeue the original bio into the deferred list and ask
md->wq to re-submit it to the block layer. Improve the handling in the
following ways (a condensed sketch of the unified rule follows the list):

1) unify the handling of BLK_STS_DM_REQUEUE and BLK_STS_AGAIN, and clear
REQ_POLLED for BLK_STS_DM_REQUEUE too, for the sake of simplicity, given
BLK_STS_DM_REQUEUE is very unusual

2) queue md->wq explicitly in __dm_io_complete(), so requeue handling
becomes more robust
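Condensed, the unified rule looks like this (an editor's sketch distilled
from the diff below, not separate new code):

```c
/* Requeue decision after this patch (names as in the diff below). */
bool need_requeue  = io->status == BLK_STS_DM_REQUEUE;
bool handle_eagain = io->status == BLK_STS_AGAIN &&
		     (bio->bi_opf & REQ_POLLED);

if (need_requeue || handle_eagain) {
	/*
	 * The bio will be re-submitted from the md->wq worker, where
	 * nobody polls for its completion, so it must not stay
	 * REQ_POLLED.
	 */
	if (bio->bi_opf & REQ_POLLED)
		bio_clear_polled(bio);
	/* push the bio back onto md->deferred and kick md->wq */
}
```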
Signed-off-by: Ming Lei
---
 drivers/md/dm.c | 58 +++++++++++++++++++++++++++++--------------------
 1 file changed, 34 insertions(+), 24 deletions(-)

diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index a9e5e429c150..ee22c763873f 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -884,20 +884,39 @@ static int __noflush_suspending(struct mapped_device *md)
 	return test_bit(DMF_NOFLUSH_SUSPENDING, &md->flags);
 }
 
-static void dm_handle_requeue(struct dm_io *io)
+/* Return true if the original bio is requeued */
+static bool dm_handle_requeue(struct dm_io *io)
 {
-	if (io->status == BLK_STS_DM_REQUEUE) {
-		struct bio *bio = io->split_bio ? io->split_bio : io->orig_bio;
-		struct mapped_device *md = io->md;
+	struct bio *bio = io->split_bio ? io->split_bio : io->orig_bio;
+	bool need_requeue = (io->status == BLK_STS_DM_REQUEUE);
+	bool handle_eagain = (io->status == BLK_STS_AGAIN) &&
+			     (bio->bi_opf & REQ_POLLED);
+	struct mapped_device *md = io->md;
+	bool requeued = false;
+
+	if (need_requeue || handle_eagain) {
 		unsigned long flags;
+
+		if (bio->bi_opf & REQ_POLLED) {
+			/*
+			 * Upper layer won't help us poll split bio
+			 * (io->orig_bio may only reflect a subset of the
+			 * pre-split original) so clear REQ_POLLED in case
+			 * of requeue.
+			 */
+			bio_clear_polled(bio);
+		}
+
 		/*
 		 * Target requested pushing back the I/O.
 		 */
 		spin_lock_irqsave(&md->deferred_lock, flags);
-		if (__noflush_suspending(md) &&
-		    !WARN_ON_ONCE(dm_is_zone_write(md, bio))) {
+		if ((__noflush_suspending(md) &&
+		     !WARN_ON_ONCE(dm_is_zone_write(md, bio))) ||
+		    handle_eagain) {
 			/* NOTE early return due to BLK_STS_DM_REQUEUE below */
 			bio_list_add_head(&md->deferred, bio);
+			requeued = true;
 		} else {
 			/*
 			 * noflush suspend was interrupted or this is
@@ -907,6 +926,10 @@ static void dm_handle_requeue(struct dm_io *io)
 		}
 		spin_unlock_irqrestore(&md->deferred_lock, flags);
 	}
+
+	if (requeued)
+		queue_work(md->wq, &md->work);
+	return requeued;
 }
 
 static void dm_io_complete(struct dm_io *io)
@@ -914,8 +937,9 @@
 	struct bio *bio = io->split_bio ? io->split_bio : io->orig_bio;
 	struct mapped_device *md = io->md;
 	blk_status_t io_error;
+	bool requeued;
 
-	dm_handle_requeue(io);
+	requeued = dm_handle_requeue(io);
 
 	io_error = io->status;
 	if (dm_io_flagged(io, DM_IO_ACCOUNTED))
@@ -936,23 +960,9 @@ static void dm_io_complete(struct dm_io *io)
 	if (unlikely(wq_has_sleeper(&md->wait)))
 		wake_up(&md->wait);
 
-	if (io_error == BLK_STS_DM_REQUEUE || io_error == BLK_STS_AGAIN) {
-		if (bio->bi_opf & REQ_POLLED) {
-			/*
-			 * Upper layer won't help us poll split bio (io->orig_bio
-			 * may only reflect a subset of the pre-split original)
-			 * so clear REQ_POLLED in case of requeue.
-			 */
-			bio_clear_polled(bio);
-			if (io_error == BLK_STS_AGAIN) {
-				/* io_uring doesn't handle BLK_STS_AGAIN (yet) */
-				queue_io(md, bio);
-				return;
-			}
-		}
-		if (io_error == BLK_STS_DM_REQUEUE)
-			return;
-	}
+	/* We have requeued, so return now */
+	if (requeued)
+		return;
 
 	if (bio_is_flush_with_data(bio)) {
 		/*
From patchwork Fri Jun 24 14:12:55 2022
X-Patchwork-Submitter: Ming Lei
X-Patchwork-Id: 12894594
X-Patchwork-Delegate: snitzer@redhat.com
From: Ming Lei
To: Jens Axboe, Mike Snitzer
Cc: linux-block@vger.kernel.org, dm-devel@redhat.com, Ming Lei
Date: Fri, 24 Jun 2022 22:12:55 +0800
Message-Id: <20220624141255.2461148-5-ming.lei@redhat.com>
In-Reply-To: <20220624141255.2461148-1-ming.lei@redhat.com>
References: <20220624141255.2461148-1-ming.lei@redhat.com>
Subject: [dm-devel] [PATCH 5.20 4/4] dm: add two stage requeue

Commit 7dd76d1feec7 ("dm: improve bio splitting and associated IO
accounting") makes each dm io point at the same original bio from the
upper layer, so more than one dm io can share one original bio when
splitting occurs. That is fine as long as every dm io completes
successfully. But if BLK_STS_DM_REQUEUE is returned from a clone bio, the
current code requeues the shared original bio, which causes the following
issues:

1) the shared original bio has been trimmed and mapped to the last dm io,
so it no longer matches the dm io being requeued

2) more than one dm io completion may touch the single shared original
bio; for example, the bio may have been submitted in one code path while
another code path is ending it, so this scheme is very fragile

The patch 'dm: fix BLK_STS_DM_REQUEUE handling when dm_io' can fix the
issue, but it still has to clone one backing bio for every split. With the
help of the newly added bio_rewind(), we can instead solve the issue with
a two-stage requeue (a worked example of the rewind arithmetic follows the
list), so that a bio clone is only needed after BLK_STS_DM_REQUEUE
actually happens:

1) requeue the dm io onto the newly added requeue list and schedule it via
the new requeue work; this stage only clones/allocates a mapped original
bio for the requeue, and the original bio is recovered via bio_rewind()

2) the second stage is the same as the original requeue, except that
io->orig_bio now points to the newly cloned bio, which matches the
requeued dm io
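Worked example of the stage-1 rewind arithmetic (an editor's illustration
with made-up numbers, matching dm_wq_requeue_work() in the diff below):

```c
/*
 * Original bio: 128 sectors starting at sector 0.  dm maps the first
 * 8 sectors as one dm_io, so setup_split_accounting() records:
 *
 *     io->sectors       = 8      (length mapped by this dm_io)
 *     io->sector_offset = 128    (bio_sectors() at the io's start)
 *
 * dm_split_and_process_bio() then trims the original bio to the
 * remainder: bi_sector = 8, bi_size = 120 << 9.
 *
 * On BLK_STS_DM_REQUEUE, dm_wq_requeue_work() clones that trimmed bio
 * and recovers exactly the mapped part:
 */
bio_rewind(new_orig, (128 << 9) - (120 << 9));	/* sector 8 -> sector 0 */
bio_trim(new_orig, 0, 8);			/* 128 sectors -> 8 sectors */
```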
Signed-off-by: Ming Lei
---
 drivers/md/dm-core.h |  11 +++-
 drivers/md/dm.c      | 131 ++++++++++++++++++++++++++++++++++---------
 2 files changed, 116 insertions(+), 26 deletions(-)

diff --git a/drivers/md/dm-core.h b/drivers/md/dm-core.h
index c954ff91870e..0545ce441427 100644
--- a/drivers/md/dm-core.h
+++ b/drivers/md/dm-core.h
@@ -22,6 +22,8 @@
 
 #define DM_RESERVED_MAX_IOS		1024
 
+struct dm_io;
+
 struct dm_kobject_holder {
 	struct kobject kobj;
 	struct completion completion;
@@ -91,6 +93,14 @@ struct mapped_device {
 	spinlock_t deferred_lock;
 	struct bio_list deferred;
 
+	/*
+	 * A requeue work context is needed for cloning one new bio
+	 * to represent the dm_io to be requeued, since each dm_io may
+	 * point to the original bio from the FS.
+	 */
+	struct work_struct requeue_work;
+	struct dm_io *requeue_list;
+
 	void *interface_ptr;
 
 	/*
@@ -272,7 +282,6 @@ struct dm_io {
 	atomic_t io_count;
 	struct mapped_device *md;
 
-	struct bio *split_bio;
 	/* The three fields represent mapped part of original bio */
 	struct bio *orig_bio;
 	unsigned int sector_offset; /* offset to end of orig_bio */
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index ee22c763873f..c2b95b931c31 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -594,7 +594,6 @@ static struct dm_io *alloc_io(struct mapped_device *md, struct bio *bio)
 	atomic_set(&io->io_count, 2);
 	this_cpu_inc(*md->pending_io);
 	io->orig_bio = bio;
-	io->split_bio = NULL;
 	io->md = md;
 	spin_lock_init(&io->lock);
 	io->start_time = jiffies;
@@ -884,10 +883,32 @@ static int __noflush_suspending(struct mapped_device *md)
 	return test_bit(DMF_NOFLUSH_SUSPENDING, &md->flags);
 }
 
+static void dm_requeue_add_io(struct dm_io *io, bool first_stage)
+{
+	struct mapped_device *md = io->md;
+
+	if (first_stage) {
+		struct dm_io *next = md->requeue_list;
+
+		md->requeue_list = io;
+		io->next = next;
+	} else {
+		bio_list_add_head(&md->deferred, io->orig_bio);
+	}
+}
+
+static void dm_requeue_schedule(struct mapped_device *md, bool first_stage)
+{
+	if (first_stage)
+		queue_work(md->wq, &md->requeue_work);
+	else
+		queue_work(md->wq, &md->work);
+}
+
 /* Return true if the original bio is requeued */
-static bool dm_handle_requeue(struct dm_io *io)
+static bool dm_handle_requeue(struct dm_io *io, bool first_stage)
 {
-	struct bio *bio = io->split_bio ? io->split_bio : io->orig_bio;
+	struct bio *bio = io->orig_bio;
 	bool need_requeue = (io->status == BLK_STS_DM_REQUEUE);
 	bool handle_eagain = (io->status == BLK_STS_AGAIN) &&
 			     (bio->bi_opf & REQ_POLLED);
@@ -913,9 +934,9 @@
 		spin_lock_irqsave(&md->deferred_lock, flags);
 		if ((__noflush_suspending(md) &&
 		     !WARN_ON_ONCE(dm_is_zone_write(md, bio))) ||
-		    handle_eagain) {
+		    handle_eagain || first_stage) {
 			/* NOTE early return due to BLK_STS_DM_REQUEUE below */
-			bio_list_add_head(&md->deferred, bio);
+			dm_requeue_add_io(io, first_stage);
 			requeued = true;
 		} else {
 			/*
@@ -928,18 +949,20 @@ static bool dm_handle_requeue(struct dm_io *io)
 	}
 
 	if (requeued)
-		queue_work(md->wq, &md->work);
+		dm_requeue_schedule(md, first_stage);
 	return requeued;
 }
 
-static void dm_io_complete(struct dm_io *io)
+static void __dm_io_complete(struct dm_io *io, bool first_stage)
 {
-	struct bio *bio = io->split_bio ? io->split_bio : io->orig_bio;
+	struct bio *bio = io->orig_bio;
 	struct mapped_device *md = io->md;
 	blk_status_t io_error;
 	bool requeued;
 
-	requeued = dm_handle_requeue(io);
+	requeued = dm_handle_requeue(io, first_stage);
+	if (requeued && first_stage)
+		return;
 
 	io_error = io->status;
 	if (dm_io_flagged(io, DM_IO_ACCOUNTED))
@@ -979,6 +1002,74 @@ static void dm_io_complete(struct dm_io *io)
 	}
 }
 
+static void dm_wq_requeue_work(struct work_struct *work)
+{
+	struct mapped_device *md = container_of(work, struct mapped_device,
+						requeue_work);
+	unsigned long flags;
+	struct dm_io *io;
+
+	/* reuse deferred lock to simplify dm_handle_requeue */
+	spin_lock_irqsave(&md->deferred_lock, flags);
+	io = md->requeue_list;
+	md->requeue_list = NULL;
+	spin_unlock_irqrestore(&md->deferred_lock, flags);
+
+	while (io) {
+		struct dm_io *next = io->next;
+		struct bio *orig = io->orig_bio;
+		struct bio *new_orig = bio_alloc_clone(orig->bi_bdev,
+				orig, GFP_NOIO, &md->queue->bio_split);
+
+		/*
+		 * bio_rewind can restore to the previous position since
+		 * the end sector is fixed for the original bio, but we
+		 * still need to restore the size manually
+		 */
+		bio_rewind(new_orig, (io->sector_offset << 9) -
+			   orig->bi_iter.bi_size);
+		bio_trim(new_orig, 0, io->sectors);
+
+		bio_chain(new_orig, orig);
+		/*
+		 * __bi_remaining has been increased during the split, so
+		 * we have to drop the one added in bio_chain
+		 */
+		atomic_dec(&orig->__bi_remaining);
+		io->orig_bio = new_orig;
+
+		io->next = NULL;
+		__dm_io_complete(io, false);
+		io = next;
+	}
+}
+
+/*
+ * Two-stage requeue:
+ *
+ * 1) io->orig_bio points to the real original bio, and we should requeue
+ * the part mapped to this io, instead of other parts of the original bio
+ */
+static void dm_io_complete(struct dm_io *io)
+{
+	bool first_requeue;
+
+	/*
+	 * Only split io needs the two-stage requeue; otherwise we may run
+	 * into a bio clone chain during a long suspend, and OOM could be
+	 * triggered, as pointed out by Mike.
+	 *
+	 * Also flush data io won't be marked as DM_IO_WAS_SPLIT, so it
+	 * needn't be handled via the first-stage requeue.
+	 */
+	if (dm_io_flagged(io, DM_IO_WAS_SPLIT))
+		first_requeue = true;
+	else
+		first_requeue = false;
+
+	__dm_io_complete(io, first_requeue);
+}
+
 /*
  * Decrements the number of outstanding ios that a bio has been
  * cloned into, completing the original io if necc.
@@ -1401,17 +1492,7 @@ static void setup_split_accounting(struct clone_info *ci, unsigned len)
 		 */
 		dm_io_set_flag(io, DM_IO_WAS_SPLIT);
 		io->sectors = len;
-	}
-
-	if (static_branch_unlikely(&stats_enabled) &&
-	    unlikely(dm_stats_used(&io->md->stats))) {
-		/*
-		 * Save bi_sector in terms of its offset from end of
-		 * original bio, only needed for DM-stats' benefit.
-		 * - saved regardless of whether split needed so that
-		 *   dm_accept_partial_bio() doesn't need to.
-		 */
-		io->sector_offset = bio_end_sector(ci->bio) - ci->sector;
+		io->sector_offset = bio_sectors(ci->bio);
 	}
 }
 
@@ -1711,11 +1792,9 @@ static void dm_split_and_process_bio(struct mapped_device *md,
 	 * Remainder must be passed to submit_bio_noacct() so it gets handled
 	 * *after* bios already submitted have been completely processed.
 	 */
-	WARN_ON_ONCE(!dm_io_flagged(io, DM_IO_WAS_SPLIT));
-	io->split_bio = bio_split(bio, io->sectors, GFP_NOIO,
-				  &md->queue->bio_split);
-	bio_chain(io->split_bio, bio);
-	trace_block_split(io->split_bio, bio->bi_iter.bi_sector);
+	bio_trim(bio, io->sectors, ci.sector_count);
+	trace_block_split(bio, bio->bi_iter.bi_sector);
+	bio_inc_remaining(bio);
 	submit_bio_noacct(bio);
 out:
 	/*
@@ -1991,9 +2070,11 @@ static struct mapped_device *alloc_dev(int minor)
 
 	init_waitqueue_head(&md->wait);
 	INIT_WORK(&md->work, dm_wq_work);
+	INIT_WORK(&md->requeue_work, dm_wq_requeue_work);
 	init_waitqueue_head(&md->eventq);
 	init_completion(&md->kobj_holder.completion);
 
+	md->requeue_list = NULL;
 	md->swap_bios = get_swap_bios();
 	sema_init(&md->swap_bios_semaphore, md->swap_bios);
 	mutex_init(&md->swap_bios_lock);