
dm-mpath: requeue I/O during pg_init

Message ID 1380620996-110162-1-git-send-email-hare@suse.de (mailing list archive)
State Awaiting Upstream, archived
Delegated to: Mike Snitzer

Commit Message

Hannes Reinecke Oct. 1, 2013, 9:49 a.m. UTC
When pg_init is running no I/O can be submitted to the underlying
devices, as the path priority etc. might change.
When using queue_io for this, requests will pile up within
multipath, as the block I/O scheduler just sees a _very fast_
device.
All of this queued I/O has to be resubmitted from within
multipathing once pg_init is done.

This approach has the problem that it's virtually impossible to
abort I/O while pg_init is running, and we're adding heavy load
to the devices after pg_init, as all of this queued I/O has to
be resubmitted _before_ any requests can be pulled off the
request queue and normal operation can continue.

This patch instead requeues the I/O that triggered the
pg_init call, and returns 'busy' while pg_init is in progress.
With these changes the block I/O scheduler will stop submitting
I/O during pg_init, resulting in a quicker path switch and
less I/O pressure (and memory consumption) after pg_init.

Signed-off-by: Hannes Reinecke <hare@suse.de>
---
 drivers/md/dm-mpath.c | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)
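
For context, here is how DM_MAPIO_REQUEUE and the target's 'busy' hook
cooperate with the block layer. This is a hedged sketch of the
request-based dm dispatch path, not the literal code; the real loop is
dm_request_fn() (with map_request()) in drivers/md/dm.c:

/*
 * Sketch only: the ti/clone/info setup is elided; compare
 * dm_request_fn() and map_request() in drivers/md/dm.c.
 */
static void dm_request_fn_sketch(struct request_queue *q)
{
	struct request *rq;

	while ((rq = blk_peek_request(q))) {
		/*
		 * With this patch multipath_busy() returns 1 while
		 * pg_init runs, so requests stay on the queue instead
		 * of being pulled off.
		 */
		if (ti->type->busy && ti->type->busy(ti))
			return;	/* retried when the queue is run again */

		blk_start_request(rq);
		switch (ti->type->map_rq(ti, clone, &info)) {
		case DM_MAPIO_REQUEUE:
			/* map_io() saw pg_init_required: push the
			 * request back onto q for a later retry */
			blk_requeue_request(q, rq);
			return;
		case DM_MAPIO_REMAPPED:
			dm_dispatch_request(clone);
			break;
		}
	}
}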

Comments

Alasdair G Kergon Nov. 5, 2013, 1:02 p.m. UTC | #1
On Tue, Oct 01, 2013 at 11:49:56AM +0200, Hannes Reinecke wrote:
> --- a/drivers/md/dm-mpath.c
> +++ b/drivers/md/dm-mpath.c
> @@ -390,7 +390,11 @@ static int map_io(struct multipath *m, struct request *clone,
>  	if (was_queued)
>  		m->queue_size--;
>  
> -	if ((pgpath && m->queue_io) ||
> +	if (m->pg_init_required) {
> +		if (!m->pg_init_in_progress)
> +			queue_work(kmultipathd, &m->process_queued_ios);
> +		r = DM_MAPIO_REQUEUE;
> +	} else if ((pgpath && m->queue_io) ||
>  	    (!pgpath && m->queue_if_no_path)) {
>  		/* Queue for the daemon to resubmit */
>  		list_add_tail(&clone->queuelist, &m->queued_ios);

This sequence if...else if... is becoming more and more unreadable!
- Two cases now return REQUEUE; two cases now queue_work().

If it really can't be simplified, could we at least add some explanatory
comments inline?

Alasdair
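
For illustration, the combined chain with the kind of inline comments
being asked for might read like this (a sketch over the posted diff,
not code from the tree):

	if (m->pg_init_required) {
		/* pg_init pending: don't queue internally; hand the
		 * request back to the block layer, but make sure
		 * pg_init itself still gets kicked off. */
		if (!m->pg_init_in_progress)
			queue_work(kmultipathd, &m->process_queued_ios);
		r = DM_MAPIO_REQUEUE;
	} else if ((pgpath && m->queue_io) ||
		   (!pgpath && m->queue_if_no_path)) {
		/* Path not ready (or no path at all and
		 * queue_if_no_path is set): hold the request on
		 * multipath's internal list for the daemon to
		 * resubmit later. */
		list_add_tail(&clone->queuelist, &m->queued_ios);
		...
	}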

Hannes Reinecke Nov. 5, 2013, 1:10 p.m. UTC | #2
On 11/05/2013 02:02 PM, Alasdair G Kergon wrote:
> On Tue, Oct 01, 2013 at 11:49:56AM +0200, Hannes Reinecke wrote:
>> --- a/drivers/md/dm-mpath.c
>> +++ b/drivers/md/dm-mpath.c
>> @@ -390,7 +390,11 @@ static int map_io(struct multipath *m, struct request *clone,
>>  	if (was_queued)
>>  		m->queue_size--;
>>  
>> -	if ((pgpath && m->queue_io) ||
>> +	if (m->pg_init_required) {
>> +		if (!m->pg_init_in_progress)
>> +			queue_work(kmultipathd, &m->process_queued_ios);
>> +		r = DM_MAPIO_REQUEUE;
>> +	} else if ((pgpath && m->queue_io) ||
>>  	    (!pgpath && m->queue_if_no_path)) {
>>  		/* Queue for the daemon to resubmit */
>>  		list_add_tail(&clone->queuelist, &m->queued_ios);
> 
> This sequence if...else if... is becoming more and more unreadable!
> - Two cases now return REQUEUE; two cases now queue_work().
> 
> If it really can't be simplified, could we at least add some explanatory
> comments inline?
> 
Well, _actually_ this was more of an RFC on whether there is any point
in having multipath queue I/O internally.
This patch removes the internal queueing during pg_init.
I do have a second patch which removes the internal queueing
altogether and drops the entire queue_io stuff.

However, prior to doing so I would like to inquire about the original
design goals. There must have been a reason for implementing the
internal queueing.
If this is just a left-over from the original port to request-based
(for bio-based we _have_ to queue internally as there's no request
queue to be had), fine, we should be removing it.
But there _might_ be some corner cases which require us to do internal
queueing. Junichi should know.
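
The bio-based constraint is worth spelling out: a bio-based target has
no request queue underneath to push work back onto, so DM_MAPIO_REQUEUE
is not an option there. A hedged sketch of the contrast (the bio-based
fragment follows the old, pre-request-based dm-mpath from memory; treat
the field names as assumptions):

/* bio-based: nothing below can hold the bio, so the target queues
 * it itself and resubmits from a worker later. */
static int bio_map_sketch(struct multipath *m, struct bio *bio)
{
	bio_list_add(&m->queued_ios, bio);	/* assumes a bio_list field */
	return DM_MAPIO_SUBMITTED;		/* we own the bio now */
}

/* request-based: the request queue itself can hold the I/O, so the
 * target can simply hand it back. */
static int rq_map_sketch(void)
{
	return DM_MAPIO_REQUEUE;		/* dm core requeues the clone */
}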

Junichi, could you shed some light here?

Cheers,

Hannes
---
Dr. Hannes Reinecke		      zSeries & Storage
hare@suse.de			      +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)

Alasdair G Kergon Nov. 5, 2013, 1:31 p.m. UTC | #3
On Tue, Nov 05, 2013 at 02:10:55PM +0100, Hannes Reinecke wrote:
> If this is just a left-over from the original port to request-based
> (for bio-based we _have_ to queue internally as there's no request
> queue to be had), fine, we should be removing it.

I think that is the case.

> But there _might_ be some corner cases which require us to do internal
> queueing.
 
We *only* add I/O to the internal queue in map_io() - which can always
be replaced with REQUEUE. As long as we still 'wake up' the queue
immediately when we are ready to receive the I/O, I can't think of any
other reason.  And it would let us remove quite a bit of tricky code!
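
The 'wake up' side would presumably look something like this once
pg_init completes (a sketch; dm_table_run_md_queue_async() matches what
later appeared upstream and is an assumption in the context of this
thread):

static void pg_init_done_sketch(struct multipath *m)
{
	unsigned long flags;

	spin_lock_irqsave(&m->lock, flags);
	if (!--m->pg_init_in_progress)
		m->queue_io = 0;
	spin_unlock_irqrestore(&m->lock, flags);

	/* multipath_busy() returns 0 again; rerun the mapped device's
	 * queue so the requeued requests are pulled off and mapped */
	dm_table_run_md_queue_async(m->ti->table);
}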

> Junichi, could you shed some light here?
 
Alasdair

Hannes Reinecke Nov. 5, 2013, 1:45 p.m. UTC | #4
On 11/05/2013 02:31 PM, Alasdair G Kergon wrote:
> On Tue, Nov 05, 2013 at 02:10:55PM +0100, Hannes Reinecke wrote:
>> If this is just a left-over from the original port to request-based
>> (for bio-based we _have_ to queue internally as there's no request
>> queue to be had), fine, we should be removing it.
> 
> I think that is the case.
> 
>> But there _might_ be some corner cases which require us to do internal
>> queueing.
>  
> We *only* add I/O to the internal queue in map_io() - which can always
> be replaced with REQUEUE. As long as we still 'wake up' the queue
> immediately when we are ready to receive the I/O, I can't think of any
> other reason.  And it would let us remove quite a bit of tricky code!
> 
Precisely what I was thinking.

I'll be cobbling together a patch.

Cheers,

Hannes
Alasdair G Kergon Nov. 5, 2013, 3:35 p.m. UTC | #5
On Tue, Nov 05, 2013 at 02:10:55PM +0100, Hannes Reinecke wrote:
> On 11/05/2013 02:02 PM, Alasdair G Kergon wrote:
> > On Tue, Oct 01, 2013 at 11:49:56AM +0200, Hannes Reinecke wrote:
> >> --- a/drivers/md/dm-mpath.c
> >> +++ b/drivers/md/dm-mpath.c
> >> @@ -390,7 +390,11 @@ static int map_io(struct multipath *m, struct request *clone,
> >>  	if (was_queued)
> >>  		m->queue_size--;
> >>  
> >> -	if ((pgpath && m->queue_io) ||
> >> +	if (m->pg_init_required) {
> >> +		if (!m->pg_init_in_progress)
> >> +			queue_work(kmultipathd, &m->process_queued_ios);
> >> +		r = DM_MAPIO_REQUEUE;
> >> +	} else if ((pgpath && m->queue_io) ||
> >>  	    (!pgpath && m->queue_if_no_path)) {
> >>  		/* Queue for the daemon to resubmit */
> >>  		list_add_tail(&clone->queuelist, &m->queued_ios);
> > 
> > This sequence if...else if... is becoming more and more unreadable!
> > - Two cases now return REQUEUE; two cases now queue_work().
> > 
> > If it really can't be simplified, could we at least add some explanatory
> > comments inline?

For starters, this case is now handled by the new if test so can no longer trigger:

@@ -400,8 +400,7 @@ static int map_io(struct multipath *m, struct request *clone,
                /* Queue for the daemon to resubmit */
                list_add_tail(&clone->queuelist, &m->queued_ios);
                m->queue_size++;
-               if ((m->pg_init_required && !m->pg_init_in_progress) ||
-                   !m->queue_io)
+               if (!m->queue_io)
                        queue_work(kmultipathd, &m->process_queued_ios);
                pgpath = NULL;
                r = DM_MAPIO_SUBMITTED;

Alasdair

Junichi Nomura Nov. 6, 2013, 1:28 a.m. UTC | #6
On 11/05/13 22:45, Hannes Reinecke wrote:
> On 11/05/2013 02:31 PM, Alasdair G Kergon wrote:
>> On Tue, Nov 05, 2013 at 02:10:55PM +0100, Hannes Reinecke wrote:
>>> If this is just a left-over from the original port to request-based
>>> (for bio-based we _have_ to queue internally as there's no request
>>> queue to be had), fine, we should be removing it.
>>
>> I think that is the case.

Yes. That's the case.
Kiyoshi and I were removing it, but couldn't find the time to audit the
isolation of the pg_init state machine from process_queued_ios.

>>> But there _might_ be some corner cases which require us to do internal
>>> queueing.
>>  
>> We *only* add I/O to the internal queue in map_io() - which can always
>> be replaced with REQUEUE. As long as we still 'wake up' the queue
>> immediately when we are ready to receive the I/O, I can't think of any
>> other reason.  And it would let us remove quite a bit of tricky code!
>>
> Precisely what I was thinking.
> 
> I'll be cobbling together a patch.
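
The isolation Junichi mentions is the crux: process_queued_ios
currently both resubmits the queued I/O and drives pg_init, so dropping
the internal queue means triggering pg_init straight from the map path.
A hedged sketch of that direction (__pg_init_all_paths() is the
existing helper in dm-mpath.c that schedules activate_path work for the
current PG; calling it from map_io() under m->lock is the assumed
change):

	if (m->pg_init_required) {
		if (!m->pg_init_in_progress)
			__pg_init_all_paths(m);	/* called under m->lock */
		return DM_MAPIO_REQUEUE;	/* block layer holds the I/O */
	}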

Patch

diff --git a/drivers/md/dm-mpath.c b/drivers/md/dm-mpath.c
index 5adede1..a1aaac9 100644
--- a/drivers/md/dm-mpath.c
+++ b/drivers/md/dm-mpath.c
@@ -390,7 +390,11 @@ static int map_io(struct multipath *m, struct request *clone,
 	if (was_queued)
 		m->queue_size--;
 
-	if ((pgpath && m->queue_io) ||
+	if (m->pg_init_required) {
+		if (!m->pg_init_in_progress)
+			queue_work(kmultipathd, &m->process_queued_ios);
+		r = DM_MAPIO_REQUEUE;
+	} else if ((pgpath && m->queue_io) ||
 	    (!pgpath && m->queue_if_no_path)) {
 		/* Queue for the daemon to resubmit */
 		list_add_tail(&clone->queuelist, &m->queued_ios);
@@ -1641,6 +1645,11 @@ static int multipath_busy(struct dm_target *ti)
 
 	spin_lock_irqsave(&m->lock, flags);
 
+	/* pg_init in progress, requeue until done */
+	if (m->pg_init_in_progress) {
+		busy = 1;
+		goto out;
+	}
 	/* Guess which priority_group will be used at next mapping time */
 	if (unlikely(!m->current_pgpath && m->next_pg))
 		pg = m->next_pg;