
null_blk: fix spurious IO errors after failed past-wp access

Message ID 20200212202320.GA2704@avx2
State New, archived
Series null_blk: fix spurious IO errors after failed past-wp access

Commit Message

Alexey Dobriyan Feb. 12, 2020, 8:23 p.m. UTC
Steps to reproduce:

	BLKRESETZONE zone 0

	// force EIO
	pwrite(fd, buf, 4096, 4096);

	[issue more IO including zone ioctls]

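After that, IO starts failing seemingly at random, including IO to unrelated
zones, because the stale ->error left behind by the failed command is "reused"
by the next request that gets the same command. The follow-up IO can also be
partition detection if the test is not run immediately, which is even more
entertaining.

The fix is to clear ->error (set it to BLK_STS_OK) whenever a command is set
up: in __alloc_cmd() for the bio-based path and in null_queue_rq() for the
blk-mq path.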

Signed-off-by: Alexey Dobriyan (SK hynix) <adobriyan@gmail.com>
---

 drivers/block/null_blk_main.c |    2 ++
 1 file changed, 2 insertions(+)
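The reproduction steps in the commit message can be turned into a small C program. This is a sketch, not the author's original test: it assumes `path` names a zoned null_blk device (for example /dev/nullb0 after loading null_blk with zoned=1) whose zone 0 is sequential-write-required, and it relies on the BLKGETZONESZ and BLKRESETZONE ioctls from <linux/blkzoned.h>. The function name repro_past_wp() and its return convention are hypothetical.

```c
#define _GNU_SOURCE		/* for O_DIRECT */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/blkzoned.h>

/*
 * Hypothetical reproducer sketch. Assumes `path` is a zoned null_blk device
 * whose zone 0 is sequential-write-required.
 * Returns -1 on setup failure, 1 if the in-order write issued after the
 * failed past-wp write also failed (the spurious error), 0 otherwise.
 */
static int repro_past_wp(const char *path)
{
	unsigned int zone_sectors = 0;	/* zone size in 512-byte sectors */
	struct blk_zone_range range;
	void *buf = NULL;
	int fd, ret = -1;

	fd = open(path, O_RDWR | O_DIRECT);
	if (fd < 0)
		return -1;
	if (ioctl(fd, BLKGETZONESZ, &zone_sectors) < 0 || zone_sectors == 0)
		goto out;

	/* BLKRESETZONE zone 0: its write pointer goes back to sector 0. */
	range.sector = 0;
	range.nr_sectors = zone_sectors;
	if (ioctl(fd, BLKRESETZONE, &range) < 0)
		goto out;

	/* O_DIRECT needs an aligned buffer. */
	if (posix_memalign(&buf, 4096, 4096) != 0)
		goto out;
	memset(buf, 0, 4096);

	/* Force EIO: write at byte offset 4096 while the write pointer is at 0. */
	if (pwrite(fd, buf, 4096, 4096) >= 0 || errno != EIO)
		goto out;

	/*
	 * A write at the write pointer should succeed; with the bug it can
	 * fail if it picks up the failed command's stale ->error.
	 */
	ret = pwrite(fd, buf, 4096, 0) == 4096 ? 0 : 1;
out:
	free(buf);
	close(fd);
	return ret;
}
```

Whether the stale status is hit depends on which command the next request reuses, so a single run returning 0 does not prove the bug is absent; issuing more IO afterwards, as the commit message suggests, gives it more chances to show up.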

Comments

Chaitanya Kulkarni Feb. 15, 2020, 3:20 a.m. UTC | #1
Alexey, thanks for the patch; however, the description is
not easy to understand.

I just sent a patch with a description and the test result.

On 02/12/2020 12:23 PM, Alexey Dobriyan wrote:
> [...]

Christoph Hellwig Feb. 19, 2020, 4:35 p.m. UTC | #2
On Wed, Feb 12, 2020 at 11:23:20PM +0300, Alexey Dobriyan wrote:
> [...]
>
> --- a/drivers/block/null_blk_main.c
> +++ b/drivers/block/null_blk_main.c
> @@ -605,6 +605,7 @@ static struct nullb_cmd *__alloc_cmd(struct nullb_queue *nq)
>  	if (tag != -1U) {
>  		cmd = &nq->cmds[tag];
>  		cmd->tag = tag;
> +		cmd->error = BLK_STS_OK;

I'd place this line in null_queue_bio to match the blk-mq patch
more closely.

Otherwise this looks fine:

Reviewed-by: Christoph Hellwig <hch@lst.de>

Can you add your testcase to blktests?

Jens Axboe March 12, 2020, 3:10 p.m. UTC | #3
On 2/12/20 1:23 PM, Alexey Dobriyan wrote:
> [...]

Applied, thanks.

Patch

--- a/drivers/block/null_blk_main.c
+++ b/drivers/block/null_blk_main.c
@@ -605,6 +605,7 @@ static struct nullb_cmd *__alloc_cmd(struct nullb_queue *nq)
 	if (tag != -1U) {
 		cmd = &nq->cmds[tag];
 		cmd->tag = tag;
+		cmd->error = BLK_STS_OK;
 		cmd->nq = nq;
 		if (nq->dev->irqmode == NULL_IRQ_TIMER) {
 			hrtimer_init(&cmd->timer, CLOCK_MONOTONIC,
@@ -1385,6 +1386,7 @@ static blk_status_t null_queue_rq(struct blk_mq_hw_ctx *hctx,
 		cmd->timer.function = null_cmd_timer_expired;
 	}
 	cmd->rq = bd->rq;
+	cmd->error = BLK_STS_OK;
 	cmd->nq = nq;
 
 	blk_mq_start_request(bd->rq);
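The failure mode this patch addresses can be modeled in a few lines of userspace C. This is a toy model, not kernel code: toy_cmd, toy_alloc_cmd(), toy_submit() and the STS_* values are hypothetical stand-ins for struct nullb_cmd and blk_status_t, and it assumes for simplicity that the request path writes ->error only when something fails, so a reused command keeps reporting the previous request's status unless the allocator clears it.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-ins for blk_status_t values; not the kernel's. */
typedef enum { STS_OK = 0, STS_IOERR = 1 } sts_t;

/* Simplified command slot, loosely modeled on struct nullb_cmd. */
struct toy_cmd {
	bool busy;
	sts_t error;
};

#define TOY_NR_CMDS 4
static struct toy_cmd toy_cmds[TOY_NR_CMDS];

/* Mark every slot free with error == STS_OK (starts a fresh scenario). */
static void toy_reset_pool(void)
{
	memset(toy_cmds, 0, sizeof(toy_cmds));
}

/* Take the lowest free slot, so a just-freed slot is handed out again. */
static struct toy_cmd *toy_alloc_cmd(bool clear_error)
{
	for (int tag = 0; tag < TOY_NR_CMDS; tag++) {
		struct toy_cmd *cmd = &toy_cmds[tag];

		if (!cmd->busy) {
			cmd->busy = true;
			if (clear_error)
				cmd->error = STS_OK;	/* what the patch adds */
			return cmd;
		}
	}
	return NULL;
}

/*
 * Submit and complete one request. Only a past-wp write sets ->error;
 * completion reports whatever ->error holds at that point.
 */
static sts_t toy_submit(bool past_wp, bool clear_error)
{
	struct toy_cmd *cmd = toy_alloc_cmd(clear_error);
	sts_t status;

	if (!cmd)
		return STS_IOERR;	/* pool exhausted; cannot happen here */
	if (past_wp)
		cmd->error = STS_IOERR;
	status = cmd->error;
	cmd->busy = false;	/* slot goes back to the pool */
	return status;
}
```

Calling toy_submit(true, false) and then toy_submit(false, false) on a fresh pool returns STS_IOERR both times, because the second, valid request reuses slot 0 and inherits its stale error. With clear_error set, the second call returns STS_OK, which is the effect of the two cmd->error = BLK_STS_OK lines above.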