
INFO: task hung in blk_queue_enter

Message ID b9c33b04-08d7-87a1-3d93-d81a84e6af12@I-love.SAKURA.ne.jp (mailing list archive)
State New, archived

Commit Message

Tetsuo Handa May 16, 2018, 1:05 p.m. UTC
Tetsuo Handa wrote:
> I couldn't check whether freeze_depth in blk_freeze_queue_start() was 1,
> but presumably q->mq_freeze_depth > 0 because syz-executor7(PID=5010) is
> stuck at wait_event() in blk_queue_enter().
> 
> Since flags == 0, preempt == false. Since stuck at wait_event(), success == false.
> Thus, atomic_read(&q->mq_freeze_depth) > 0 if blk_queue_dying(q) == false. And I
> guess blk_queue_dying(q) == false because we are just trying to freeze/unfreeze.
> 

I was able to reproduce the hang using a modified reproducer, and captured the
values below with the following debug printk() patch.

  --- a/block/blk-core.c
  +++ b/block/blk-core.c
  @@ -950,10 +950,12 @@ int blk_queue_enter(struct request_queue *q, blk_mq_req_flags_t flags)
   		 */
   		smp_rmb();
   
  -		wait_event(q->mq_freeze_wq,
  -			   (atomic_read(&q->mq_freeze_depth) == 0 &&
  -			    (preempt || !blk_queue_preempt_only(q))) ||
  -			   blk_queue_dying(q));
  +		while (wait_event_timeout(q->mq_freeze_wq,
  +					  (atomic_read(&q->mq_freeze_depth) == 0 &&
  +					   (preempt || !blk_queue_preempt_only(q))) ||
  +					  blk_queue_dying(q), 10 * HZ) == 0)
  +			printk("%s(%u): q->mq_freeze_depth=%d preempt=%d blk_queue_preempt_only(q)=%d blk_queue_dying(q)=%d\n",
  +			       current->comm, current->pid, atomic_read(&q->mq_freeze_depth), preempt, blk_queue_preempt_only(q), blk_queue_dying(q));
   		if (blk_queue_dying(q))
   			return -ENODEV;
   	}

[   75.869126] print_req_error: I/O error, dev loop0, sector 0
[   85.983146] a.out(8838): q->mq_freeze_depth=1 preempt=0 blk_queue_preempt_only(q)=0 blk_queue_dying(q)=0
[   96.222884] a.out(8838): q->mq_freeze_depth=1 preempt=0 blk_queue_preempt_only(q)=0 blk_queue_dying(q)=0
[  106.463322] a.out(8838): q->mq_freeze_depth=1 preempt=0 blk_queue_preempt_only(q)=0 blk_queue_dying(q)=0
[  116.702912] a.out(8838): q->mq_freeze_depth=1 preempt=0 blk_queue_preempt_only(q)=0 blk_queue_dying(q)=0

One or more threads are waiting for q->mq_freeze_depth to become 0. But the
thread who incremented q->mq_freeze_depth at blk_freeze_queue_start(q) from
blk_freeze_queue() is waiting at blk_mq_freeze_queue_wait(). Therefore,
atomic_read(&q->mq_freeze_depth) == 0 condition for wait_event() in
blk_queue_enter() will never be satisfied. But what does that wait_event()
want to do? Isn't "start freezing" a sort of blk_queue_dying(q) == true?
Since percpu_ref_tryget_live(&q->q_usage_counter) failed and the queue is
about to be frozen, shouldn't we treat atomic_read(&q->mq_freeze_depth) != 0
as if blk_queue_dying(q) == true? That is, something like below:

Comments

Bart Van Assche May 16, 2018, 2:56 p.m. UTC | #1
On Wed, 2018-05-16 at 22:05 +0900, Tetsuo Handa wrote:
> One or more threads are waiting for q->mq_freeze_depth to become 0. But the
> thread who incremented q->mq_freeze_depth at blk_freeze_queue_start(q) from
> blk_freeze_queue() is waiting at blk_mq_freeze_queue_wait(). Therefore,
> atomic_read(&q->mq_freeze_depth) == 0 condition for wait_event() in
> blk_queue_enter() will never be satisfied. But what does that wait_event()
> want to do? Isn't "start freezing" a sort of blk_queue_dying(q) == true?
> Since percpu_ref_tryget_live(&q->q_usage_counter) failed and the queue is
> about to be frozen, shouldn't we treat atomic_read(&q->mq_freeze_depth) != 0
> as if blk_queue_dying(q) == true? That is, something like below:
> 
> diff --git a/block/blk-core.c b/block/blk-core.c
> index 85909b4..59e2496 100644
> --- a/block/blk-core.c
> +++ b/block/blk-core.c
> @@ -951,10 +951,10 @@ int blk_queue_enter(struct request_queue *q, blk_mq_req_flags_t flags)
>  		smp_rmb();
>  
>  		wait_event(q->mq_freeze_wq,
> -			   (atomic_read(&q->mq_freeze_depth) == 0 &&
> -			    (preempt || !blk_queue_preempt_only(q))) ||
> +			   atomic_read(&q->mq_freeze_depth) ||
> +			   (preempt || !blk_queue_preempt_only(q)) ||
>  			   blk_queue_dying(q));
> -		if (blk_queue_dying(q))
> +		if (atomic_read(&q->mq_freeze_depth) || blk_queue_dying(q))
>  			return -ENODEV;
>  	}
>  }

That change looks wrong to me. Additionally, I think that you are looking in
the wrong direction. Since blk_mq_freeze_queue_wait() and blk_queue_enter()
work fine for all block drivers except the loop driver, I think that you should
have a closer look at how the loop driver uses this block layer functionality.

Thanks,

Bart.
Dmitry Vyukov May 16, 2018, 3:16 p.m. UTC | #2
On Wed, May 16, 2018 at 4:56 PM, Bart Van Assche <Bart.VanAssche@wdc.com> wrote:
> On Wed, 2018-05-16 at 22:05 +0900, Tetsuo Handa wrote:
>> One or more threads are waiting for q->mq_freeze_depth to become 0. But the
>> thread who incremented q->mq_freeze_depth at blk_freeze_queue_start(q) from
>> blk_freeze_queue() is waiting at blk_mq_freeze_queue_wait(). Therefore,
>> atomic_read(&q->mq_freeze_depth) == 0 condition for wait_event() in
>> blk_queue_enter() will never be satisfied. But what does that wait_event()
>> want to do? Isn't "start freezing" a sort of blk_queue_dying(q) == true?
>> Since percpu_ref_tryget_live(&q->q_usage_counter) failed and the queue is
>> about to be frozen, shouldn't we treat atomic_read(&q->mq_freeze_depth) != 0
>> as if blk_queue_dying(q) == true? That is, something like below:
>>
>> diff --git a/block/blk-core.c b/block/blk-core.c
>> index 85909b4..59e2496 100644
>> --- a/block/blk-core.c
>> +++ b/block/blk-core.c
>> @@ -951,10 +951,10 @@ int blk_queue_enter(struct request_queue *q, blk_mq_req_flags_t flags)
>>               smp_rmb();
>>
>>               wait_event(q->mq_freeze_wq,
>> -                        (atomic_read(&q->mq_freeze_depth) == 0 &&
>> -                         (preempt || !blk_queue_preempt_only(q))) ||
>> +                        atomic_read(&q->mq_freeze_depth) ||
>> +                        (preempt || !blk_queue_preempt_only(q)) ||
>>                          blk_queue_dying(q));
>> -             if (blk_queue_dying(q))
>> +             if (atomic_read(&q->mq_freeze_depth) || blk_queue_dying(q))
>>                       return -ENODEV;
>>       }
>>  }
>
> That change looks wrong to me.

Hi Bart,

Why does it look wrong to you?

> Additionally, I think that you are looking in
> the wrong direction. Since blk_mq_freeze_queue_wait() and blk_queue_enter()
> work fine for all block drivers except the loop driver I think that you should
> have a closer look at how the loop driver uses this block layer functionality.
>
> Thanks,
>
> Bart.
Bart Van Assche May 16, 2018, 3:37 p.m. UTC | #3
On Wed, 2018-05-16 at 17:16 +0200, Dmitry Vyukov wrote:
> On Wed, May 16, 2018 at 4:56 PM, Bart Van Assche <Bart.VanAssche@wdc.com> wrote:

> > On Wed, 2018-05-16 at 22:05 +0900, Tetsuo Handa wrote:
> > > diff --git a/block/blk-core.c b/block/blk-core.c
> > > index 85909b4..59e2496 100644
> > > --- a/block/blk-core.c
> > > +++ b/block/blk-core.c
> > > @@ -951,10 +951,10 @@ int blk_queue_enter(struct request_queue *q, blk_mq_req_flags_t flags)
> > >               smp_rmb();
> > >
> > >               wait_event(q->mq_freeze_wq,
> > > -                        (atomic_read(&q->mq_freeze_depth) == 0 &&
> > > -                         (preempt || !blk_queue_preempt_only(q))) ||
> > > +                        atomic_read(&q->mq_freeze_depth) ||
> > > +                        (preempt || !blk_queue_preempt_only(q)) ||
> > >                          blk_queue_dying(q));
> > > -             if (blk_queue_dying(q))
> > > +             if (atomic_read(&q->mq_freeze_depth) || blk_queue_dying(q))
> > >                       return -ENODEV;
> > >       }
> > >  }
> >
> > That change looks wrong to me.
>
> Hi Bart,
>
> Why does it look wrong to you?

Because that change conflicts with the purpose of queue freezing and also because
that change would inject I/O errors in code paths that shouldn't inject I/O errors.
Please have a look at e.g. generic_make_request(). From the start of that function:

	if (blk_queue_enter(q, flags) < 0) {
		if (!blk_queue_dying(q) && (bio->bi_opf & REQ_NOWAIT))
			bio_wouldblock_error(bio);
		else
			bio_io_error(bio);
		return ret;
	}

The above patch changes the behavior of blk_queue_enter() from waiting while
q->mq_freeze_depth != 0 into returning -ENODEV while the request queue is frozen.
That will cause generic_make_request() to call bio_io_error(bio) while a request
queue is frozen if REQ_NOWAIT has not been set, which is the default behavior. So
any operation that temporarily freezes the queue, e.g. changing the queue depth,
can cause concurrent I/O to fail with -ENODEV. As you probably know, failure of
write requests has very annoying consequences: it e.g. causes filesystems to go
into read-only mode. That's why I think that the above change is completely
wrong.

Bart.
Alan Jenkins May 16, 2018, 5:33 p.m. UTC | #4
> jbd2/sda1-8(PID=2271) is stuck waiting for journal commit operation.
> I don't know how this thread is involved in this problem.

It feels like it should be a necessary link in the chain. This is the 
filesystem underneath the loop device. If that hangs, we would expect 
the loop device to hang, but not vice versa.

AIUI, you couldn't reproduce this hang on your own machine.

Don't you think, your fuzzer has just found a nice exploit for reiserfs?

Alan


(It's interesting that you found this hang just after we fixed 
blk_queue_enter() to wait in D state instead of S state.[1]  I don't 
know syzkaller, so I assume from this it only flagged this case up 
because of the hung task warning. I.e. syzkaller didn't have its own 
watchdog that would have failed the test anyway.

[1] the commit that caused you to CC me. "block: do not use 
interruptible wait anywhere"

https://github.com/torvalds/linux/commit/1dc3039bc87ae7d19a990c3ee71cfd8a9068f428

)
Tetsuo Handa May 21, 2018, 9:52 p.m. UTC | #5
Bart Van Assche wrote:
> On Wed, 2018-05-16 at 17:16 +0200, Dmitry Vyukov wrote:
> > On Wed, May 16, 2018 at 4:56 PM, Bart Van Assche <Bart.VanAssche@wdc.com> wrote:
> > > On Wed, 2018-05-16 at 22:05 +0900, Tetsuo Handa wrote:
> > > > diff --git a/block/blk-core.c b/block/blk-core.c
> > > > index 85909b4..59e2496 100644
> > > > --- a/block/blk-core.c
> > > > +++ b/block/blk-core.c
> > > > @@ -951,10 +951,10 @@ int blk_queue_enter(struct request_queue *q, blk_mq_req_flags_t flags)
> > > >               smp_rmb();
> > > > 
> > > >               wait_event(q->mq_freeze_wq,
> > > > -                        (atomic_read(&q->mq_freeze_depth) == 0 &&
> > > > -                         (preempt || !blk_queue_preempt_only(q))) ||
> > > > +                        atomic_read(&q->mq_freeze_depth) ||
> > > > +                        (preempt || !blk_queue_preempt_only(q)) ||
> > > >                          blk_queue_dying(q));
> > > > -             if (blk_queue_dying(q))
> > > > +             if (atomic_read(&q->mq_freeze_depth) || blk_queue_dying(q))
> > > >                       return -ENODEV;
> > > >       }
> > > >  }
> > > 
> > > That change looks wrong to me.
> > 
> > Hi Bart,
> > 
> > Why does it look wrong to you?
> 
> Because that change conflicts with the purpose of queue freezing and also because
> that change would inject I/O errors in code paths that shouldn't inject I/O errors.

But waiting there until atomic_read(&q->mq_freeze_depth) becomes 0 is causing a
deadlock. A wait_event() that never returns is a bug. I think we should not wait
for q->mq_freeze_depth.

Patch

diff --git a/block/blk-core.c b/block/blk-core.c
index 85909b4..59e2496 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -951,10 +951,10 @@  int blk_queue_enter(struct request_queue *q, blk_mq_req_flags_t flags)
 		smp_rmb();
 
 		wait_event(q->mq_freeze_wq,
-			   (atomic_read(&q->mq_freeze_depth) == 0 &&
-			    (preempt || !blk_queue_preempt_only(q))) ||
+			   atomic_read(&q->mq_freeze_depth) ||
+			   (preempt || !blk_queue_preempt_only(q)) ||
 			   blk_queue_dying(q));
-		if (blk_queue_dying(q))
+		if (atomic_read(&q->mq_freeze_depth) || blk_queue_dying(q))
 			return -ENODEV;
 	}
 }