locking/mutexes: Revert "locking/mutexes: Add extra reschedule point"

Message ID 20140731131331.GT19379@twins.programming.kicks-ass.net (mailing list archive)
State New, archived

Commit Message

Peter Zijlstra July 31, 2014, 1:13 p.m. UTC
On Thu, Jul 31, 2014 at 04:37:29PM +0400, Ilya Dryomov wrote:

> This didn't make sense to me at first either, and I'll be happy to be
> proven wrong, but we can reproduce this with rbd very reliably under
> higher than usual load, and the revert makes it go away.  What we are
> seeing in the rbd scenario is the following.

This is drivers/block/rbd.c ? I can find but a single mutex_lock() in
there.

> Suppose foo needs mutexes A and B, bar needs mutex B.  foo acquires
> A and then wants to acquire B, but B is held by bar.  foo spins
> a little and ends up calling schedule_preempt_disabled() on line 484
> above, but that call never returns, even though a hundred usecs later
> bar releases B.  foo ends up stuck in mutex_lock() indefinitely, but
> still holds A and everybody else who needs A gets behind A.  Given that
> this A happens to be a central libceph mutex all rbd activity halts.
> Deadlock may not be the best term for this, but never returning from
> mutex_lock(&B) even though B has been unlocked is *a* problem.
> 
> This obviously doesn't happen every time schedule_preempt_disabled() on
> line 484 is called, so there must be some sort of race here.  I'll send
> along the actual rbd stack traces shortly.

Smells like maybe current->state != TASK_RUNNING; does the below
trigger?

If so, you've wrecked something in whatever...

---
 kernel/locking/mutex.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

Comments

Ingo Molnar July 31, 2014, 1:44 p.m. UTC | #1
* Peter Zijlstra <peterz@infradead.org> wrote:

> On Thu, Jul 31, 2014 at 04:37:29PM +0400, Ilya Dryomov wrote:
> 
> > This didn't make sense to me at first either, and I'll be happy to be
> > proven wrong, but we can reproduce this with rbd very reliably under
> > higher than usual load, and the revert makes it go away.  What we are
> > seeing in the rbd scenario is the following.
> 
> This is drivers/block/rbd.c ? I can find but a single mutex_lock() in
> there.
> 
> > Suppose foo needs mutexes A and B, bar needs mutex B.  foo acquires
> > A and then wants to acquire B, but B is held by bar.  foo spins
> > a little and ends up calling schedule_preempt_disabled() on line 484
> > above, but that call never returns, even though a hundred usecs later
> > bar releases B.  foo ends up stuck in mutex_lock() indefinitely, but
> > still holds A and everybody else who needs A gets behind A.  Given that
> > this A happens to be a central libceph mutex all rbd activity halts.
> > Deadlock may not be the best term for this, but never returning from
> > mutex_lock(&B) even though B has been unlocked is *a* problem.
> > 
> > This obviously doesn't happen every time schedule_preempt_disabled() on
> > line 484 is called, so there must be some sort of race here.  I'll send
> > along the actual rbd stack traces shortly.
> 
> Smells like maybe current->state != TASK_RUNNING; does the below
> trigger?
> 
> If so, you've wrecked something in whatever...
> 
> ---
>  kernel/locking/mutex.c | 6 +++++-
>  1 file changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
> index ae712b25e492..3d726fdaa764 100644
> --- a/kernel/locking/mutex.c
> +++ b/kernel/locking/mutex.c
> @@ -473,8 +473,12 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
>  	 * reschedule now, before we try-lock the mutex. This avoids getting
>  	 * scheduled out right after we obtained the mutex.
>  	 */
> -	if (need_resched())
> +	if (need_resched()) {
> +		if (WARN_ON_ONCE(current->state != TASK_RUNNING))
> +			__set_current_state(TASK_RUNNING);
> +
>  		schedule_preempt_disabled();
> +	}

Might make sense to add that debug check under mutex debugging or so, 
with a sensible kernel message printed.
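Something like this, perhaps -- a sketch only, the message text and
exact form are illustrative, not a tested patch:

#ifdef CONFIG_DEBUG_MUTEXES
	/*
	 * Sketch: complain, with an actionable message, when a task
	 * reaches the mutex slowpath in a non-running state, then
	 * repair the state so we don't make things worse.
	 */
	if (unlikely(current->state != TASK_RUNNING)) {
		printk(KERN_WARNING
		       "mutex: %s/%d called mutex_lock() with state %ld; "
		       "don't call blocking primitives on the way to schedule()\n",
		       current->comm, task_pid_nr(current), current->state);
		__set_current_state(TASK_RUNNING);
	}
#endif
	if (need_resched())
		schedule_preempt_disabled();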

Thanks,

	Ingo
Mike Galbraith July 31, 2014, 2:30 p.m. UTC | #2
On Thu, 2014-07-31 at 15:13 +0200, Peter Zijlstra wrote:

> Smells like maybe current->state != TASK_RUNNING

Bingo

[ 1200.851004] kjournald       D 0000000000000002     0  4398      2 0x00000000
[ 1200.858283]  ffff8803537bb788 0000000000000046 ffff8803537bb7a8 0000000000000000
[ 1200.865914]  ffff880423324b60 0000000000012f80 ffff8803537bbfd8 0000000000012f80
[ 1200.873590]  ffff88042937cb60 ffff880423324b60 ffff880428ceb240 ffff8804231e59b8
[ 1200.881256] Call Trace:
[ 1200.883724]  [<ffffffff816981c9>] schedule+0x29/0x70
[ 1200.888798]  [<ffffffff8169849e>] schedule_preempt_disabled+0xe/0x10
[ 1200.895239]  [<ffffffff81699fe5>] __mutex_lock_slowpath+0x1b5/0x1c0
[ 1200.901673]  [<ffffffffa0479826>] ? ceph_str_hash+0x26/0x80 [libceph]
[ 1200.908198]  [<ffffffff8169a013>] mutex_lock+0x23/0x37
[ 1200.913430]  [<ffffffffa046751d>] ceph_con_send+0x4d/0x130 [libceph]
[ 1200.919912]  [<ffffffffa046c540>] __send_queued+0x120/0x150 [libceph]
[ 1200.926444]  [<ffffffffa046ec5b>] __ceph_osdc_start_request+0x5b/0xd0 [libceph]
[ 1200.933855]  [<ffffffffa046ed21>] ceph_osdc_start_request+0x51/0x80 [libceph]
[ 1200.941126]  [<ffffffffa042bf60>] rbd_obj_request_submit.isra.25+0x10/0x20 [rbd]
[ 1200.948622]  [<ffffffffa042e8ee>] rbd_img_obj_request_submit+0x1ce/0x460 [rbd]
[ 1200.956040]  [<ffffffffa042ebcc>] rbd_img_request_submit+0x4c/0x60 [rbd]
[ 1200.962845]  [<ffffffffa042f2a8>] rbd_request_fn+0x238/0x290 [rbd]
[ 1200.969108]  [<ffffffff8133a397>] __blk_run_queue+0x37/0x50
[ 1200.974764]  [<ffffffff8133affd>] queue_unplugged+0x3d/0xc0
[ 1200.980424]  [<ffffffff8133fddb>] blk_flush_plug_list+0x1db/0x210
[ 1200.986635]  [<ffffffff81698288>] io_schedule+0x78/0xd0
[ 1200.991954]  [<ffffffff8133b864>] get_request+0x414/0x800
[ 1200.997440]  [<ffffffff8133f477>] ? bio_attempt_back_merge+0x37/0x100
[ 1201.004013]  [<ffffffff8109b9e0>] ? __wake_up_sync+0x20/0x20
[ 1201.009782]  [<ffffffff8133ff2c>] blk_queue_bio+0xcc/0x360
[ 1201.015353]  [<ffffffff8133c2d0>] generic_make_request+0xc0/0x100
[ 1201.021605]  [<ffffffff8133c385>] submit_bio+0x75/0x140
[ 1201.026921]  [<ffffffff811de4e6>] _submit_bh+0x136/0x1f0
[ 1201.032390]  [<ffffffff81290081>] journal_do_submit_data+0x41/0x50
[ 1201.038662]  [<ffffffff81291380>] journal_commit_transaction+0x1150/0x1350
[ 1201.045683]  [<ffffffff81063aff>] ? try_to_del_timer_sync+0x4f/0x70
[ 1201.052043]  [<ffffffff81293e01>] kjournald+0xe1/0x260
[ 1201.057324]  [<ffffffff8109b9e0>] ? __wake_up_sync+0x20/0x20
[ 1201.063072]  [<ffffffff81293d20>] ? commit_timeout+0x10/0x10
[ 1201.068855]  [<ffffffff81078829>] kthread+0xc9/0xe0
[ 1201.073819]  [<ffffffff81078760>] ? flush_kthread_worker+0xb0/0xb0
[ 1201.080084]  [<ffffffff8169bb6c>] ret_from_fork+0x7c/0xb0
[ 1201.085573]  [<ffffffff81078760>] ? flush_kthread_worker+0xb0/0xb0



Ilya Dryomov July 31, 2014, 2:37 p.m. UTC | #3
On Thu, Jul 31, 2014 at 6:30 PM, Mike Galbraith
<umgwanakikbuti@gmail.com> wrote:
> On Thu, 2014-07-31 at 15:13 +0200, Peter Zijlstra wrote:
>
>> Smells like maybe current->state != TASK_RUNNING

It just triggered for me too, took longer than usual.  Sorry for the
churn, Peter, this was really confusing.  Onto finding the real bug..

Thanks,

                Ilya
Peter Zijlstra July 31, 2014, 2:39 p.m. UTC | #4
On Thu, Jul 31, 2014 at 04:30:52PM +0200, Mike Galbraith wrote:
> On Thu, 2014-07-31 at 15:13 +0200, Peter Zijlstra wrote:
> 
> > Smells like maybe current->state != TASK_RUNNING
> 
> Bingo
> 
> [ 1200.851004] kjournald       D 0000000000000002     0  4398      2 0x00000000
> [ 1200.858283]  ffff8803537bb788 0000000000000046 ffff8803537bb7a8 0000000000000000
> [ 1200.865914]  ffff880423324b60 0000000000012f80 ffff8803537bbfd8 0000000000012f80
> [ 1200.873590]  ffff88042937cb60 ffff880423324b60 ffff880428ceb240 ffff8804231e59b8
> [ 1200.881256] Call Trace:
> [ 1200.883724]  [<ffffffff816981c9>] schedule+0x29/0x70
> [ 1200.888798]  [<ffffffff8169849e>] schedule_preempt_disabled+0xe/0x10
> [ 1200.895239]  [<ffffffff81699fe5>] __mutex_lock_slowpath+0x1b5/0x1c0
> [ 1200.901673]  [<ffffffffa0479826>] ? ceph_str_hash+0x26/0x80 [libceph]
> [ 1200.908198]  [<ffffffff8169a013>] mutex_lock+0x23/0x37
> [ 1200.913430]  [<ffffffffa046751d>] ceph_con_send+0x4d/0x130 [libceph]
> [ 1200.919912]  [<ffffffffa046c540>] __send_queued+0x120/0x150 [libceph]
> [ 1200.926444]  [<ffffffffa046ec5b>] __ceph_osdc_start_request+0x5b/0xd0 [libceph]
> [ 1200.933855]  [<ffffffffa046ed21>] ceph_osdc_start_request+0x51/0x80 [libceph]
> [ 1200.941126]  [<ffffffffa042bf60>] rbd_obj_request_submit.isra.25+0x10/0x20 [rbd]
> [ 1200.948622]  [<ffffffffa042e8ee>] rbd_img_obj_request_submit+0x1ce/0x460 [rbd]
> [ 1200.956040]  [<ffffffffa042ebcc>] rbd_img_request_submit+0x4c/0x60 [rbd]
> [ 1200.962845]  [<ffffffffa042f2a8>] rbd_request_fn+0x238/0x290 [rbd]
> [ 1200.969108]  [<ffffffff8133a397>] __blk_run_queue+0x37/0x50
> [ 1200.974764]  [<ffffffff8133affd>] queue_unplugged+0x3d/0xc0
> [ 1200.980424]  [<ffffffff8133fddb>] blk_flush_plug_list+0x1db/0x210
> [ 1200.986635]  [<ffffffff81698288>] io_schedule+0x78/0xd0
> [ 1200.991954]  [<ffffffff8133b864>] get_request+0x414/0x800
> [ 1200.997440]  [<ffffffff8133f477>] ? bio_attempt_back_merge+0x37/0x100
> [ 1201.004013]  [<ffffffff8109b9e0>] ? __wake_up_sync+0x20/0x20
> [ 1201.009782]  [<ffffffff8133ff2c>] blk_queue_bio+0xcc/0x360
> [ 1201.015353]  [<ffffffff8133c2d0>] generic_make_request+0xc0/0x100
> [ 1201.021605]  [<ffffffff8133c385>] submit_bio+0x75/0x140
> [ 1201.026921]  [<ffffffff811de4e6>] _submit_bh+0x136/0x1f0
> [ 1201.032390]  [<ffffffff81290081>] journal_do_submit_data+0x41/0x50
> [ 1201.038662]  [<ffffffff81291380>] journal_commit_transaction+0x1150/0x1350
> [ 1201.045683]  [<ffffffff81063aff>] ? try_to_del_timer_sync+0x4f/0x70
> [ 1201.052043]  [<ffffffff81293e01>] kjournald+0xe1/0x260
> [ 1201.057324]  [<ffffffff8109b9e0>] ? __wake_up_sync+0x20/0x20
> [ 1201.063072]  [<ffffffff81293d20>] ? commit_timeout+0x10/0x10
> [ 1201.068855]  [<ffffffff81078829>] kthread+0xc9/0xe0
> [ 1201.073819]  [<ffffffff81078760>] ? flush_kthread_worker+0xb0/0xb0
> [ 1201.080084]  [<ffffffff8169bb6c>] ret_from_fork+0x7c/0xb0
> [ 1201.085573]  [<ffffffff81078760>] ? flush_kthread_worker+0xb0/0xb0

Ohh, that's properly broken indeed.

You can't just call blocking primitives on the way to schedule(), that's
fail.

Also, if I look at blk_flush_plug_list(), it calls queue_unplugged()
with IRQs disabled, so _who_ is enabling them again and calling blocking
stuff?

/me stares more..

rbd_request_fn() does.. *argh*

Someone needs to go fix this; it cannot work right.
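
IOW the chain, roughly -- simplified from memory, not verbatim source:

	/* block/blk-core.c:get_request(), simplified: the task marks
	 * itself TASK_UNINTERRUPTIBLE *before* sleeping for a free
	 * request */
	prepare_to_wait_exclusive(&rl->wait[is_sync], &wait,
				  TASK_UNINTERRUPTIBLE);
	spin_unlock_irq(q->queue_lock);
	io_schedule();

	/* kernel/sched/core.c:io_schedule(), simplified: the plug list
	 * gets flushed *before* schedule() runs */
	blk_flush_plug(current);	/* -> blk_flush_plug_list()
					 *     -> queue_unplugged()
					 *       -> __blk_run_queue()
					 *         -> rbd_request_fn()
					 *           -> ... -> mutex_lock() */
	schedule();

So mutex_lock() sees need_resched() and calls
schedule_preempt_disabled() while the task is still TASK_UNINTERRUPTIBLE
and not yet on the mutex wait list: the unlock wakes nobody, and the
task sleeps in D state -- exactly the trace above.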
Ilya Dryomov Aug. 1, 2014, 12:56 p.m. UTC | #5
On Thu, Jul 31, 2014 at 6:39 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Thu, Jul 31, 2014 at 04:30:52PM +0200, Mike Galbraith wrote:
>> On Thu, 2014-07-31 at 15:13 +0200, Peter Zijlstra wrote:
>>
>> > Smells like maybe current->state != TASK_RUNNING
>>
>> Bingo
>>
>> [ 1200.851004] kjournald       D 0000000000000002     0  4398      2 0x00000000
>> [ 1200.858283]  ffff8803537bb788 0000000000000046 ffff8803537bb7a8 0000000000000000
>> [ 1200.865914]  ffff880423324b60 0000000000012f80 ffff8803537bbfd8 0000000000012f80
>> [ 1200.873590]  ffff88042937cb60 ffff880423324b60 ffff880428ceb240 ffff8804231e59b8
>> [ 1200.881256] Call Trace:
>> [ 1200.883724]  [<ffffffff816981c9>] schedule+0x29/0x70
>> [ 1200.888798]  [<ffffffff8169849e>] schedule_preempt_disabled+0xe/0x10
>> [ 1200.895239]  [<ffffffff81699fe5>] __mutex_lock_slowpath+0x1b5/0x1c0
>> [ 1200.901673]  [<ffffffffa0479826>] ? ceph_str_hash+0x26/0x80 [libceph]
>> [ 1200.908198]  [<ffffffff8169a013>] mutex_lock+0x23/0x37
>> [ 1200.913430]  [<ffffffffa046751d>] ceph_con_send+0x4d/0x130 [libceph]
>> [ 1200.919912]  [<ffffffffa046c540>] __send_queued+0x120/0x150 [libceph]
>> [ 1200.926444]  [<ffffffffa046ec5b>] __ceph_osdc_start_request+0x5b/0xd0 [libceph]
>> [ 1200.933855]  [<ffffffffa046ed21>] ceph_osdc_start_request+0x51/0x80 [libceph]
>> [ 1200.941126]  [<ffffffffa042bf60>] rbd_obj_request_submit.isra.25+0x10/0x20 [rbd]
>> [ 1200.948622]  [<ffffffffa042e8ee>] rbd_img_obj_request_submit+0x1ce/0x460 [rbd]
>> [ 1200.956040]  [<ffffffffa042ebcc>] rbd_img_request_submit+0x4c/0x60 [rbd]
>> [ 1200.962845]  [<ffffffffa042f2a8>] rbd_request_fn+0x238/0x290 [rbd]
>> [ 1200.969108]  [<ffffffff8133a397>] __blk_run_queue+0x37/0x50
>> [ 1200.974764]  [<ffffffff8133affd>] queue_unplugged+0x3d/0xc0
>> [ 1200.980424]  [<ffffffff8133fddb>] blk_flush_plug_list+0x1db/0x210
>> [ 1200.986635]  [<ffffffff81698288>] io_schedule+0x78/0xd0
>> [ 1200.991954]  [<ffffffff8133b864>] get_request+0x414/0x800
>> [ 1200.997440]  [<ffffffff8133f477>] ? bio_attempt_back_merge+0x37/0x100
>> [ 1201.004013]  [<ffffffff8109b9e0>] ? __wake_up_sync+0x20/0x20
>> [ 1201.009782]  [<ffffffff8133ff2c>] blk_queue_bio+0xcc/0x360
>> [ 1201.015353]  [<ffffffff8133c2d0>] generic_make_request+0xc0/0x100
>> [ 1201.021605]  [<ffffffff8133c385>] submit_bio+0x75/0x140
>> [ 1201.026921]  [<ffffffff811de4e6>] _submit_bh+0x136/0x1f0
>> [ 1201.032390]  [<ffffffff81290081>] journal_do_submit_data+0x41/0x50
>> [ 1201.038662]  [<ffffffff81291380>] journal_commit_transaction+0x1150/0x1350
>> [ 1201.045683]  [<ffffffff81063aff>] ? try_to_del_timer_sync+0x4f/0x70
>> [ 1201.052043]  [<ffffffff81293e01>] kjournald+0xe1/0x260
>> [ 1201.057324]  [<ffffffff8109b9e0>] ? __wake_up_sync+0x20/0x20
>> [ 1201.063072]  [<ffffffff81293d20>] ? commit_timeout+0x10/0x10
>> [ 1201.068855]  [<ffffffff81078829>] kthread+0xc9/0xe0
>> [ 1201.073819]  [<ffffffff81078760>] ? flush_kthread_worker+0xb0/0xb0
>> [ 1201.080084]  [<ffffffff8169bb6c>] ret_from_fork+0x7c/0xb0
>> [ 1201.085573]  [<ffffffff81078760>] ? flush_kthread_worker+0xb0/0xb0
>
> Ohh, that's properly broken indeed.
>
> You can't just call blocking primitives on the way to schedule(), that's
> fail.
>
> Also, if I look at blk_flush_plug_list(), it calls queue_unplugged()
> with IRQs disabled, so _who_ is enabling them again and calling blocking
> stuff?
>
> /me stares more..
>
> rbd_request_fn() does.. *argh*
>
> Someone needs to go fix this; it cannot work right.

I'm going to fix up rbd_request_fn(), but I want to make sure
I understand this in full.

- Previously the danger of calling blocking primitives on the way to
  schedule(), i.e. with task->state != TASK_RUNNING, was that if the
  blocking primitive was indeed to block, the task state would be set
  back to TASK_RUNNING and the schedule() that that task was on the way
  to wouldn't have any effect.  Your "Add extra reschedule point" patch
  essentially made calling mutex_lock() and probably others much more
  wrong than it used to be, because mutex_lock() may now reschedule
  when the task is not on the mutex wait queue.

- There is nothing wrong with releasing queue_lock and reenabling IRQs
  in rbd_request_fn() as long as it doesn't block and I remember to
  disable IRQs and take queue_lock back on return.
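
For the first point, the pattern I have in mind is roughly this
(illustrative, not from any particular file):

	for (;;) {
		set_current_state(TASK_UNINTERRUPTIBLE);
		if (condition)
			break;
		/*
		 * Any blocking call here -- e.g. a mutex_lock() that now
		 * hits the extra reschedule point -- runs with
		 * ->state == TASK_UNINTERRUPTIBLE.  Before the patch it
		 * would block, get woken, and reset us to TASK_RUNNING,
		 * so the schedule() below degraded to a no-op.  After the
		 * patch it may call schedule_preempt_disabled() directly
		 * and put us to sleep before we are on the mutex wait
		 * list, with no wakeup due.
		 */
		schedule();
	}
	__set_current_state(TASK_RUNNING);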

I'm asking because rbd_request_fn() is probably not the only code path
broken in this way.  I poked around and found read_events() in aio.c;
it seems to have been written with the "danger" assumption that
I outlined above, and there is even a comment to that effect.

Does the above make sense or am I missing something?

Thanks,

                Ilya
Peter Zijlstra Aug. 1, 2014, 1:27 p.m. UTC | #6
On Fri, Aug 01, 2014 at 04:56:27PM +0400, Ilya Dryomov wrote:
> I'm going to fix up rbd_request_fn(), but I want to make sure
> I understand this in full.
> 
> - Previously the danger of calling blocking primitives on the way to
>   schedule(), i.e. with task->state != TASK_RUNNING, was that if the
>   blocking primitive was indeed to block, the task state would be set
>   back to TASK_RUNNING and the schedule() that that task was on the way
>   to wouldn't have any effect.  Your "Add extra reschedule point" patch
>   essentially made calling mutex_lock() and probably others much more
>   wrong than it used to be, because mutex_lock() may now reschedule
>   when the task is not on the mutex wait queue.

Right, and in general we cannot allow spurious wakeups (although we try
very hard to deal with them in generic code, which is why things more or
less worked).

But if you do a patch that 'randomly' ignores ->state on schedule (I did
one), stuff comes apart _real_ quick.

Therefore you should very much not destroy ->state on the way to
schedule.

> - There is nothing wrong with releasing queue_lock and reenabling IRQs
>   in rbd_request_fn() as long as it doesn't block and I remember to
>   disable IRQs and take queue_lock back on return.

Releasing queue_lock might be ok, dunno about the blk locking, however
reenabling IRQs is actually wrong as per blk_flush_plug_list(), since
that uses local_irq_save()/restore() which means it can be called from
contexts which cannot deal with enabling IRQs, and then your
->request_fn() goes and does that.

Now maybe blk_flush_plug_list() is overly paranoid and it could use
local_irq_disable()/enable() instead, I don't know. But until it does, a
request_fn() should never reenable IRQs.
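
Something like this, say -- hypothetical code, just the shape of the
problem, not your actual rbd_request_fn():

	static void broken_request_fn(struct request_queue *q)
	{
		struct request *rq;

		while ((rq = blk_fetch_request(q)) != NULL) {
			/*
			 * Drops queue_lock and unconditionally enables
			 * IRQs -- wrong when we were reached under
			 * local_irq_save(), as from blk_flush_plug_list().
			 */
			spin_unlock_irq(q->queue_lock);

			/* hypothetical helper; if it can block, it is also
			 * wrong on the way to schedule() */
			submit_rq_somewhere(rq);

			spin_lock_irq(q->queue_lock);
		}
	}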

> I'm asking because rbd_request_fn() is probably not the only code path
> broken in this way.  I poked around and found read_events() in aio.c;
> it seems to have been written with the "danger" assumption that
> I outlined above, and there is even a comment to that effect.

I'm fairly sure there's more broken stuff; I didn't dare look.

> Does the above make sense or am I missing something?

I think that's about it.
Ilya Dryomov Aug. 1, 2014, 1:50 p.m. UTC | #7
On Fri, Aug 1, 2014 at 5:27 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Fri, Aug 01, 2014 at 04:56:27PM +0400, Ilya Dryomov wrote:
>> I'm going to fix up rbd_request_fn(), but I want to make sure
>> I understand this in full.
>>
>> - Previously the danger of calling blocking primitives on the way to
>>   schedule(), i.e. with task->state != TASK_RUNNING, was that if the
>>   blocking primitive was indeed to block, the task state would be set
>>   back to TASK_RUNNING and the schedule() that that task was on the way
>>   to wouldn't have any effect.  Your "Add extra reschedule point" patch
>>   essentially made calling mutex_lock() and probably others much more
>>   wrong than it used to be, because mutex_lock() may now reschedule
>>   when the task is not on the mutex wait queue.
>
> Right and in general we cannot allow spurious wakeups (although we try
> very hard to deal with them in generic code, which is why things more or
> less worked).
>
> But if you do a patch that 'randomly' ignores ->state on schedule (I did
> one) stuff comes apart _real_ quick.
>
> Therefore you should very much not destroy ->state on the way to
> schedule.
>
>> - There is nothing wrong with releasing queue_lock and reenabling IRQs
>>   in rbd_request_fn() as long as it doesn't block and I remember to
>>   disable IRQs and take queue_lock back on return.
>
> Releasing queue_lock might be ok, dunno about the blk locking, however
> reenabling IRQs is actually wrong as per blk_flush_plug_list(), since
> that uses local_irq_save()/restore() which means it can be called from
> contexts which cannot deal with enabling IRQs, and then your
> ->request_fn() goes and does that.
>
> Now maybe blk_flush_plug_list() is overly paranoid and it could use
> local_irq_disable()/enable() instead, I don't know. But until it does, a
> request_fn() should never reenable IRQs.
>
>> I'm asking because rbd_request_fn() is probably not the only code path
>> broken in this way.  I poked around and found read_events() in aio.c;
>> it seems to have been written with the "danger" assumption that
>> I outlined above, and there is even a comment to that effect.
>
> I'm fairly sure there's more broken stuff; I didn't dare look.
>
>> Does the above make sense or am I missing something?
>
> I think that's about it.

Thanks for clarifying things.  CC'ing Kent to draw attention to
read_events().

                Ilya

Patch

diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
index ae712b25e492..3d726fdaa764 100644
--- a/kernel/locking/mutex.c
+++ b/kernel/locking/mutex.c
@@ -473,8 +473,12 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
 	 * reschedule now, before we try-lock the mutex. This avoids getting
 	 * scheduled out right after we obtained the mutex.
 	 */
-	if (need_resched())
+	if (need_resched()) {
+		if (WARN_ON_ONCE(current->state != TASK_RUNNING))
+			__set_current_state(TASK_RUNNING);
+
 		schedule_preempt_disabled();
+	}
 #endif
 	spin_lock_mutex(&lock->wait_lock, flags);