diff mbox

[05/10] block: remove per-queue plugging

Message ID BANLkTikBEJa7bJJoLFU7NoiEgOjVHVG08A@mail.gmail.com (mailing list archive)
State Not Applicable, archived
Headers show

Commit Message

Linus Torvalds April 13, 2011, 2:23 a.m. UTC
On Tue, Apr 12, 2011 at 2:08 PM, NeilBrown <neilb@suse.de> wrote:
> On Wed, 13 Apr 2011 00:34:52 +1000 Dave Chinner <david@fromorbit.com> wrote:
>>
>> Well, not really - now taking any sleeping lock or waiting on
>> anything can trigger a plug flush where previously you had to
>> explicitly issue them. I'm not saying what we had is better, just
>> that there are implicit flushes with your changes that are
>> inherently uncontrollable...
>
> It's not just sleeping locks - if preempt is enabled a schedule can happen at
> any time - at any depth.  I've seen a spin_unlock do it.

Hmm. I don't think we should flush IO in the preemption path. That
smells wrong on many levels, just one of them being the "any time, any
depth".

It also sounds really wrong from an IO pattern standpoint. The process
is actually still running, and the IO flushing _already_ does the
"only if it's going to sleep" test, but it actually does it _wrong_.
The "current->state" check doesn't make sense for a preemption event,
because it's not actually going to sleep there.

So a patch like the attached (UNTESTED!) sounds like the right thing to do.

Whether it makes any difference for any MD issues, who knows.. But
considering that the unplugging already used to test for "prev->state
!= TASK_RUNNING", this is absolutely the right thing to do - that old
test was just broken.

                                   Linus
--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel

Comments

Peter Zijlstra April 13, 2011, 11:12 a.m. UTC | #1
On Tue, 2011-04-12 at 19:23 -0700, Linus Torvalds wrote:
>  kernel/sched.c |   20 ++++++++++----------
>  1 files changed, 10 insertions(+), 10 deletions(-)
> 
> diff --git a/kernel/sched.c b/kernel/sched.c
> index 48013633d792..a187c3fe027b 100644
> --- a/kernel/sched.c
> +++ b/kernel/sched.c
> @@ -4111,20 +4111,20 @@ need_resched:
>                                         try_to_wake_up_local(to_wakeup);
>                         }
>                         deactivate_task(rq, prev, DEQUEUE_SLEEP);
> +
> +                       /*
> +                        * If we are going to sleep and we have plugged IO queued, make
> +                        * sure to submit it to avoid deadlocks.
> +                        */
> +                       if (blk_needs_flush_plug(prev)) {
> +                               raw_spin_unlock(&rq->lock);
> +                               blk_flush_plug(prev);
> +                               raw_spin_lock(&rq->lock);
> +                       }
>                 }
>                 switch_count = &prev->nvcsw;
>         }
>  
> -       /*
> -        * If we are going to sleep and we have plugged IO queued, make
> -        * sure to submit it to avoid deadlocks.
> -        */
> -       if (prev->state != TASK_RUNNING && blk_needs_flush_plug(prev)) {
> -               raw_spin_unlock(&rq->lock);
> -               blk_flush_plug(prev);
> -               raw_spin_lock(&rq->lock);
> -       }
> -
>         pre_schedule(rq, prev);
>  
>         if (unlikely(!rq->nr_running)) 

Right, that cures the preemption problem. The reason I suggested placing
it where it was is that I'd like to keep all things that release
rq->lock in the middle of schedule() in one place, but I guess we can
cure that with some extra comments.



--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
Jens Axboe April 13, 2011, 11:23 a.m. UTC | #2
On 2011-04-13 13:12, Peter Zijlstra wrote:
> On Tue, 2011-04-12 at 19:23 -0700, Linus Torvalds wrote:
>>  kernel/sched.c |   20 ++++++++++----------
>>  1 files changed, 10 insertions(+), 10 deletions(-)
>>
>> diff --git a/kernel/sched.c b/kernel/sched.c
>> index 48013633d792..a187c3fe027b 100644
>> --- a/kernel/sched.c
>> +++ b/kernel/sched.c
>> @@ -4111,20 +4111,20 @@ need_resched:
>>                                         try_to_wake_up_local(to_wakeup);
>>                         }
>>                         deactivate_task(rq, prev, DEQUEUE_SLEEP);
>> +
>> +                       /*
>> +                        * If we are going to sleep and we have plugged IO queued, make
>> +                        * sure to submit it to avoid deadlocks.
>> +                        */
>> +                       if (blk_needs_flush_plug(prev)) {
>> +                               raw_spin_unlock(&rq->lock);
>> +                               blk_flush_plug(prev);
>> +                               raw_spin_lock(&rq->lock);
>> +                       }
>>                 }
>>                 switch_count = &prev->nvcsw;
>>         }
>>  
>> -       /*
>> -        * If we are going to sleep and we have plugged IO queued, make
>> -        * sure to submit it to avoid deadlocks.
>> -        */
>> -       if (prev->state != TASK_RUNNING && blk_needs_flush_plug(prev)) {
>> -               raw_spin_unlock(&rq->lock);
>> -               blk_flush_plug(prev);
>> -               raw_spin_lock(&rq->lock);
>> -       }
>> -
>>         pre_schedule(rq, prev);
>>  
>>         if (unlikely(!rq->nr_running)) 
> 
> Right, that cures the preemption problem. The reason I suggested placing
> it where it was is that I'd like to keep all things that release
> rq->lock in the middle of schedule() in one place, but I guess we can
> cure that with some extra comments.

We definitely only want to do it on going to sleep, not preempt events.
So if you are fine with this change, then lets please do that.

Linus, I've got a few other things queued up in the area, I'll add this
and send them off soon. Or feel free to add this one yourself, since you
already did it.
Peter Zijlstra April 13, 2011, 11:41 a.m. UTC | #3
On Wed, 2011-04-13 at 13:23 +0200, Jens Axboe wrote:
> We definitely only want to do it on going to sleep, not preempt events.
> So if you are fine with this change, then lets please do that.

Here's the Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>, that goes
with it ;-)

> Linus, I've got a few other things queued up in the area, I'll add this
> and send them off soon. Or feel free to add this one yourself, since you
> already did it. 

Right, please send it onwards or have Linus commit it himself and I'll
cook up a patch clarifying the rq->lock'ing mess around there.

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
Linus Torvalds April 13, 2011, 3:13 p.m. UTC | #4
On Wed, Apr 13, 2011 at 4:23 AM, Jens Axboe <jaxboe@fusionio.com> wrote:
>
> Linus, I've got a few other things queued up in the area, I'll add this
> and send them off soon. Or feel free to add this one yourself, since you
> already did it.

Ok, I committed it with Peter's and your acks.

And if you already put it in your git tree too, git will merge it.

                    Linus

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
Jens Axboe April 13, 2011, 5:35 p.m. UTC | #5
On 2011-04-13 17:13, Linus Torvalds wrote:
> On Wed, Apr 13, 2011 at 4:23 AM, Jens Axboe <jaxboe@fusionio.com> wrote:
>>
>> Linus, I've got a few other things queued up in the area, I'll add this
>> and send them off soon. Or feel free to add this one yourself, since you
>> already did it.
> 
> Ok, I committed it with Peter's and your acks.

Great, thanks.

> And if you already put it in your git tree too, git will merge it.

I did not, I had a feeling you'd merge this one.
diff mbox

Patch

 kernel/sched.c |   20 ++++++++++----------
 1 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/kernel/sched.c b/kernel/sched.c
index 48013633d792..a187c3fe027b 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -4111,20 +4111,20 @@  need_resched:
 					try_to_wake_up_local(to_wakeup);
 			}
 			deactivate_task(rq, prev, DEQUEUE_SLEEP);
+
+			/*
+			 * If we are going to sleep and we have plugged IO queued, make
+			 * sure to submit it to avoid deadlocks.
+			 */
+			if (blk_needs_flush_plug(prev)) {
+				raw_spin_unlock(&rq->lock);
+				blk_flush_plug(prev);
+				raw_spin_lock(&rq->lock);
+			}
 		}
 		switch_count = &prev->nvcsw;
 	}
 
-	/*
-	 * If we are going to sleep and we have plugged IO queued, make
-	 * sure to submit it to avoid deadlocks.
-	 */
-	if (prev->state != TASK_RUNNING && blk_needs_flush_plug(prev)) {
-		raw_spin_unlock(&rq->lock);
-		blk_flush_plug(prev);
-		raw_spin_lock(&rq->lock);
-	}
-
 	pre_schedule(rq, prev);
 
 	if (unlikely(!rq->nr_running))