diff mbox series

[PATCHv3,4/4] blk-lib: check for kill signal

Message ID 20240222191922.2130580-5-kbusch@meta.com (mailing list archive)
State New, archived
Headers show
Series block: make long running operations killable | expand

Commit Message

Keith Busch Feb. 22, 2024, 7:19 p.m. UTC
From: Keith Busch <kbusch@kernel.org>

Some of these block operations can access a significant capacity and
take longer than the user expected. A user may change their mind about
wanting to run that command and attempt to kill the process and do
something else with their device. But since the task is uninterruptible,
they have to wait for it to finish, which could be many hours.

Check for a fatal signal at each iteration so the user doesn't have to
wait for their regretted operation to complete naturally.

Reported-by: Conrad Meyer <conradmeyer@meta.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
---
 block/blk-lib.c | 35 +++++++++++++++++++++++++++++++++++
 1 file changed, 35 insertions(+)

Comments

Ming Lei Feb. 23, 2024, 3:38 a.m. UTC | #1
On Fri, Feb 23, 2024 at 3:20 AM Keith Busch <kbusch@meta.com> wrote:
>
> From: Keith Busch <kbusch@kernel.org>
>
> Some of these block operations can access a significant capacity and
> take longer than the user expected. A user may change their mind about
> wanting to run that command and attempt to kill the process and do
> something else with their device. But since the task is uninterruptible,
> they have to wait for it to finish, which could be many hours.
>
> Check for a fatal signal at each iteration so the user doesn't have to
> wait for their regretted operation to complete naturally.
>
> Reported-by: Conrad Meyer <conradmeyer@meta.com>
> Signed-off-by: Keith Busch <kbusch@kernel.org>

Looks fine,

Reviewed-by: Ming Lei <ming.lei@redhat.com>
Christoph Hellwig Feb. 23, 2024, 6:46 a.m. UTC | #2
> +static void abort_bio(struct bio *bio)
> +{
> +	DECLARE_COMPLETION_ONSTACK_MAP(done,
> +			bio->bi_bdev->bd_disk->lockdep_map);
> +
> +	bio->bi_private = &done;
> +	bio->bi_end_io = abort_bio_endio;
> +	bio_endio(bio);
> +	blk_wait_io(&done);
> +}

Without seeing the context below this looks pretty weird, but once
seeing that we're waiting for the previously submitted bios it all
makes sense.  Please write a good comment explaining this, and
maybe also use a name that is a bit more descriptive.
Nilay Shroff Feb. 23, 2024, 11 a.m. UTC | #3
On 2/23/24 00:49, Keith Busch wrote:
> From: Keith Busch <kbusch@kernel.org>
> 
> Some of these block operations can access a significant capacity and
> take longer than the user expected. A user may change their mind about
> wanting to run that command and attempt to kill the process and do
> something else with their device. But since the task is uninterruptible,
> they have to wait for it to finish, which could be many hours.
> 
> Check for a fatal signal at each iteration so the user doesn't have to
> wait for their regretted operation to complete naturally.
> 
> Reported-by: Conrad Meyer <conradmeyer@meta.com>
> Signed-off-by: Keith Busch <kbusch@kernel.org>
> ---
>  block/blk-lib.c | 35 +++++++++++++++++++++++++++++++++++
>  1 file changed, 35 insertions(+)
> 
> diff --git a/block/blk-lib.c b/block/blk-lib.c
> index d4c476cf3784a..9e594f641ce72 100644
> --- a/block/blk-lib.c
> +++ b/block/blk-lib.c
> @@ -35,6 +35,23 @@ static sector_t bio_discard_limit(struct block_device *bdev, sector_t sector)
>  	return round_down(UINT_MAX, discard_granularity) >> SECTOR_SHIFT;
>  }
>  
> +static void abort_bio_endio(struct bio *bio)
> +{
> +	complete(bio->bi_private);
> +	bio_put(bio);
> +}
> +
> +static void abort_bio(struct bio *bio)
> +{
> +	DECLARE_COMPLETION_ONSTACK_MAP(done,
> +			bio->bi_bdev->bd_disk->lockdep_map);
> +
> +	bio->bi_private = &done;
> +	bio->bi_end_io = abort_bio_endio;
> +	bio_endio(bio);
> +	blk_wait_io(&done);
> +}
> +

> @@ -143,6 +164,10 @@ static int __blkdev_issue_write_zeroes(struct block_device *bdev,
>  		nr_sects -= len;
>  		sector += len;
>  		cond_resched();
> +		if (fatal_signal_pending(current)) {
> +			abort_bio(bio);
> +			return -EINTR;
> +		}
>  	}
>  
>  	*biop = bio;
> @@ -187,6 +212,10 @@ static int __blkdev_issue_zero_pages(struct block_device *bdev,
>  				break;
>  		}
>  		cond_resched();
> +		if (fatal_signal_pending(current)) {
> +			abort_bio(bio);
> +			return -EINTR;
> +		}
>  	}
>  

If a device with large capacity supports write zero offload and user kills that
long outstanding write zero operation then it appears we run through the fatal_signal_pending()
and abort_bio() twice: once under __blkdev_issue_write_zeroes() and then later under 
__blkdev_issue_zero_pages(). The entry to __blkdev_issue_zero_pages() happens if 
__blkdev_issue_write_zeroes() returns the error code and BLKDEV_ZERO_NOFALLBACK is NOT 
specified in flags.

I think if fatal signal is intercepted while running __blkdev_issue_write_zeroes() then we 
shouldn't need to re-enter the __blkdev_issue_zero_pages(). We may want to add following code:

@@ -280,7 +306,7 @@ int blkdev_issue_zeroout(struct block_device *bdev, sector_t sector,
                bio_put(bio);
        }
        blk_finish_plug(&plug);
-       if (ret && try_write_zeroes) {
+       if (ret && ret != -EINTR && try_write_zeroes) {
                if (!(flags & BLKDEV_ZERO_NOFALLBACK)) {
                        try_write_zeroes = false;
                        goto retry;

Thanks,
--Nilay
Keith Busch Feb. 23, 2024, 3:58 p.m. UTC | #4
On Fri, Feb 23, 2024 at 04:30:14PM +0530, Nilay Shroff wrote:
> I think if fatal signal is intercepted while running __blkdev_issue_write_zeroes() then we 
> shouldn't need to re-enter the __blkdev_issue_zero_pages(). We may want to add following code:
> 
> @@ -280,7 +306,7 @@ int blkdev_issue_zeroout(struct block_device *bdev, sector_t sector,
>                 bio_put(bio);
>         }
>         blk_finish_plug(&plug);
> -       if (ret && try_write_zeroes) {
> +       if (ret && ret != -EINTR && try_write_zeroes) {
>                 if (!(flags & BLKDEV_ZERO_NOFALLBACK)) {
>                         try_write_zeroes = false;
>                         goto retry;

Good point, I'll fold this in.
diff mbox series

Patch

diff --git a/block/blk-lib.c b/block/blk-lib.c
index d4c476cf3784a..9e594f641ce72 100644
--- a/block/blk-lib.c
+++ b/block/blk-lib.c
@@ -35,6 +35,23 @@  static sector_t bio_discard_limit(struct block_device *bdev, sector_t sector)
 	return round_down(UINT_MAX, discard_granularity) >> SECTOR_SHIFT;
 }
 
+static void abort_bio_endio(struct bio *bio)
+{
+	complete(bio->bi_private);
+	bio_put(bio);
+}
+
+static void abort_bio(struct bio *bio)
+{
+	DECLARE_COMPLETION_ONSTACK_MAP(done,
+			bio->bi_bdev->bd_disk->lockdep_map);
+
+	bio->bi_private = &done;
+	bio->bi_end_io = abort_bio_endio;
+	bio_endio(bio);
+	blk_wait_io(&done);
+}
+
 int __blkdev_issue_discard(struct block_device *bdev, sector_t sector,
 		sector_t nr_sects, gfp_t gfp_mask, struct bio **biop)
 {
@@ -77,6 +94,10 @@  int __blkdev_issue_discard(struct block_device *bdev, sector_t sector,
 		 * is disabled.
 		 */
 		cond_resched();
+		if (fatal_signal_pending(current)) {
+			abort_bio(bio);
+			return -EINTR;
+		}
 	}
 
 	*biop = bio;
@@ -143,6 +164,10 @@  static int __blkdev_issue_write_zeroes(struct block_device *bdev,
 		nr_sects -= len;
 		sector += len;
 		cond_resched();
+		if (fatal_signal_pending(current)) {
+			abort_bio(bio);
+			return -EINTR;
+		}
 	}
 
 	*biop = bio;
@@ -187,6 +212,10 @@  static int __blkdev_issue_zero_pages(struct block_device *bdev,
 				break;
 		}
 		cond_resched();
+		if (fatal_signal_pending(current)) {
+			abort_bio(bio);
+			return -EINTR;
+		}
 	}
 
 	*biop = bio;
@@ -329,6 +358,12 @@  int blkdev_issue_secure_erase(struct block_device *bdev, sector_t sector,
 		sector += len;
 		nr_sects -= len;
 		cond_resched();
+		if (fatal_signal_pending(current)) {
+			abort_bio(bio);
+			ret = -EINTR;
+			bio = NULL;
+			break;
+		}
 	}
 	if (bio) {
 		ret = submit_bio_wait(bio);