
dm-mq and end_clone_request()

Message ID: e2749805-00fe-a5f0-157c-66d6d0910dd4@suse.de
State: Not Applicable, archived
Delegated to: Mike Snitzer

Commit Message

Hannes Reinecke Aug. 4, 2016, 9:53 a.m. UTC
On 08/03/2016 06:55 PM, Bart Van Assche wrote:
> On 08/02/2016 05:40 PM, Mike Snitzer wrote:
>> But I asked you to run the v4.7 kernel patches I
>> pointed to _without_ any of your debug patches.
> 
> I need several patches to fix bugs that are not related to the device
> mapper, e.g. "sched: Avoid that __wait_on_bit_lock() hangs"
> (https://lkml.org/lkml/2016/8/3/289).
> 
Hmm. Can you test with this patch?


Reasoning:
The original check for dm_noflush_suspending() was for bio-based
drivers, which needed to queue I/O within the device-mapper core.
So during suspend this I/O would keep a reference to the device-mapper
core and the table couldn't be swapped.
For request-based multipathing, however, the I/O is _never_ held within
the device-mapper core but rather pushed back to the request queue.
I.e., even for pushback the I/O will never hold a reference to the
device-mapper core, and the tables can be swapped irrespective of the
dm_noflush_suspending() setting.

Or that's the idea, at least :-)

Yes Mike, I know, it's not going to work with bio-based multipathing.
But this is just for figuring out where the real issue is.
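To make the effect of the proposed change concrete, here is a user-space model of the must_push_back() predicate before and after dropping the dm_noflush_suspending() term. It is a sketch, not kernel code: the two flag bits and dm_noflush_suspending() are modelled as plain bool parameters, and must_push_back_old(), must_push_back_new() and count_differences() are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical user-space model of must_push_back() from dm-mpath.c.
 * Each bool stands in for a kernel query (modelling assumption):
 *   queue   - test_bit(MPATHF_QUEUE_IF_NO_PATH, &m->flags)
 *   saved   - test_bit(MPATHF_SAVED_QUEUE_IF_NO_PATH, &m->flags)
 *   noflush - dm_noflush_suspending(m->ti)
 */

/* Before the patch: a queue/saved mismatch only causes a push back
 * while a noflush suspend is in progress. */
bool must_push_back_old(bool queue, bool saved, bool noflush)
{
	return queue || ((queue != saved) && noflush);
}

/* After the patch: the noflush state is no longer consulted. */
bool must_push_back_new(bool queue, bool saved, bool noflush)
{
	(void)noflush;
	return queue || (queue != saved);
}

/* Try all 8 input combinations and count where the versions disagree.
 * The assert documents the only combination expected to differ. */
int count_differences(void)
{
	int diffs = 0;

	for (int i = 0; i < 8; i++) {
		bool queue = i & 1, saved = i & 2, noflush = i & 4;

		if (must_push_back_old(queue, saved, noflush) !=
		    must_push_back_new(queue, saved, noflush)) {
			assert(!queue && saved && !noflush);
			diffs++;
		}
	}
	return diffs;
}
```

The two versions disagree in exactly one case: MPATHF_QUEUE_IF_NO_PATH clear, MPATHF_SAVED_QUEUE_IF_NO_PATH set, and a suspend without noflush. There the old predicate errors the request while the patched one pushes it back, which is the behaviour the reasoning above argues is safe for request-based multipath.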

Cheers,

Hannes

Comments

Hannes Reinecke Aug. 4, 2016, 10:09 a.m. UTC | #1
On 08/04/2016 11:53 AM, Hannes Reinecke wrote:
> On 08/03/2016 06:55 PM, Bart Van Assche wrote:
>> On 08/02/2016 05:40 PM, Mike Snitzer wrote:
>>> But I asked you to run the v4.7 kernel patches I
>>> pointed to _without_ any of your debug patches.
>>
>> I need several patches to fix bugs that are not related to the device
>> mapper, e.g. "sched: Avoid that __wait_on_bit_lock() hangs"
>> (https://lkml.org/lkml/2016/8/3/289).
>>
> Hmm. Can you test with this patch?
> 
> diff --git a/drivers/md/dm-mpath.c b/drivers/md/dm-mpath.c
> index 7790a70..9daed03 100644
> --- a/drivers/md/dm-mpath.c
> +++ b/drivers/md/dm-mpath.c
> @@ -439,8 +439,7 @@ static int must_push_back(struct multipath *m)
>  {
>         return (test_bit(MPATHF_QUEUE_IF_NO_PATH, &m->flags) ||
>                 ((test_bit(MPATHF_QUEUE_IF_NO_PATH, &m->flags) !=
> -                 test_bit(MPATHF_SAVED_QUEUE_IF_NO_PATH, &m->flags)) &&
> -                dm_noflush_suspending(m->ti)));
> +                 test_bit(MPATHF_SAVED_QUEUE_IF_NO_PATH, &m->flags)));
>  }
> 
>  /*
> 
> Reasoning:
> The original check for dm_noflush_suspending() was for bio-based
> drivers, which needed to queue I/O within the device-mapper core.
> So during suspend this I/O would keep a reference to the device-mapper
> core and the table couldn't be swapped.
> For request-based multipathing, however, the I/O is _never_ held within
> the device-mapper core but rather pushed back to the request queue.
> I.e., even for pushback the I/O will never hold a reference to the
> device-mapper core, and the tables can be swapped irrespective of the
> dm_noflush_suspending() setting.
> 
> Or that's the idea, at least :-)
> 
> Yes Mike, I know, it's not going to work with bio-based multipathing.
> But this is just for figuring out where the real issue is.
> 
And indeed.

multipathd is calling DM_SUSPEND _without_ the noflush_suspending flag
(on the grounds that originally it needed to flush all I/O from the
device-mapper core).
This will cause I/O errors if any I/O is issued after
->presuspend has been called.
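The failure described here can be traced through the pre-patch predicate with a small user-space model. It assumes, beyond what this message states, that ->presuspend saves the current queue_if_no_path setting into the saved flag and then clears the live flag. The struct, the enum and the model_* and suspend_then_io() functions are hypothetical stand-ins, not dm-mpath code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the two multipath flags; not the kernel struct. */
struct mpath_model {
	bool queue_if_no_path;       /* MPATHF_QUEUE_IF_NO_PATH */
	bool saved_queue_if_no_path; /* MPATHF_SAVED_QUEUE_IF_NO_PATH */
};

enum rq_outcome { RQ_PUSH_BACK, RQ_ERROR };

/* ASSUMPTION: presuspend saves the current setting, then clears it. */
void model_presuspend(struct mpath_model *m)
{
	m->saved_queue_if_no_path = m->queue_if_no_path;
	m->queue_if_no_path = false;
}

/* Pre-patch must_push_back() logic, restated on the model. */
bool model_must_push_back(const struct mpath_model *m, bool noflush)
{
	return m->queue_if_no_path ||
	       ((m->queue_if_no_path != m->saved_queue_if_no_path) && noflush);
}

/* Outcome for a request that arrives when no path is usable. */
enum rq_outcome model_no_path_request(const struct mpath_model *m,
				      bool noflush)
{
	return model_must_push_back(m, noflush) ? RQ_PUSH_BACK : RQ_ERROR;
}

/* One suspend cycle of a map configured with queue_if_no_path. */
enum rq_outcome suspend_then_io(bool noflush)
{
	struct mpath_model m = { .queue_if_no_path = true,
				 .saved_queue_if_no_path = false };

	/* Before suspend, a request with no usable path is pushed back. */
	assert(model_no_path_request(&m, noflush) == RQ_PUSH_BACK);
	model_presuspend(&m);
	return model_no_path_request(&m, noflush);
}
```

Under these assumptions, a noflush suspend keeps pushing requests back because the live and saved flags differ, whereas a suspend without noflush errors any request that finds no usable path, which would be consistent with the I/O errors described above.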

Cheers,

Hannes
Mike Snitzer Aug. 4, 2016, 3:10 p.m. UTC | #2
On Thu, Aug 04 2016 at  6:09am -0400,
Hannes Reinecke <hare@suse.de> wrote:

> On 08/04/2016 11:53 AM, Hannes Reinecke wrote:
> > On 08/03/2016 06:55 PM, Bart Van Assche wrote:
> >> On 08/02/2016 05:40 PM, Mike Snitzer wrote:
> >>> But I asked you to run the v4.7 kernel patches I
> >>> pointed to _without_ any of your debug patches.
> >>
> >> I need several patches to fix bugs that are not related to the device
> >> mapper, e.g. "sched: Avoid that __wait_on_bit_lock() hangs"
> >> (https://lkml.org/lkml/2016/8/3/289).
> >>
> > Hmm. Can you test with this patch?
> > 
> > diff --git a/drivers/md/dm-mpath.c b/drivers/md/dm-mpath.c
> > index 7790a70..9daed03 100644
> > --- a/drivers/md/dm-mpath.c
> > +++ b/drivers/md/dm-mpath.c
> > @@ -439,8 +439,7 @@ static int must_push_back(struct multipath *m)
> >  {
> >         return (test_bit(MPATHF_QUEUE_IF_NO_PATH, &m->flags) ||
> >                 ((test_bit(MPATHF_QUEUE_IF_NO_PATH, &m->flags) !=
> > -                 test_bit(MPATHF_SAVED_QUEUE_IF_NO_PATH, &m->flags)) &&
> > -                dm_noflush_suspending(m->ti)));
> > +                 test_bit(MPATHF_SAVED_QUEUE_IF_NO_PATH, &m->flags)));
> >  }
> > 
> >  /*
> > 
> > Reasoning:
> > The original check for dm_noflush_suspending() was for bio-based
> > drivers, which needed to queue I/O within the device-mapper core.
> > So during suspend this I/O would keep a reference to the device-mapper
> > core and the table couldn't be swapped.
> > For request-based multipathing, however, the I/O is _never_ held within
> > the device-mapper core but rather pushed back to the request queue.
> > I.e., even for pushback the I/O will never hold a reference to the
> > device-mapper core, and the tables can be swapped irrespective of the
> > dm_noflush_suspending() setting.
> > 
> > Or that's the idea, at least :-)
> > 
> > Yes Mike, I know, it's not going to work with bio-based multipathing.
> > But this is just for figuring out where the real issue is.
> > 
> And indeed.
> 
> multipathd is calling DM_SUSPEND _without_ the noflush_suspending flag
> (on the grounds that originally it needed to flush all I/O from the
> device-mapper core).
> This will cause I/O errors if any I/O is issued after
> ->presuspend has been called.

The only time multipathd doesn't use noflush is on resize.  Otherwise
I'm pretty sure it _does_ use noflush all the time.

But the point is that the map method shouldn't be called while the
multipath device is suspended.

I already provided fixes for this, staged here:
https://git.kernel.org/cgit/linux/kernel/git/device-mapper/linux-dm.git/log/?h=dm-4.8

and relative to 4.7:
https://git.kernel.org/cgit/linux/kernel/git/device-mapper/linux-dm.git/log/?h=dm-4.7-mpath-fixes

With these patches, our testing on a real SRP hardware testbed (fast DDN
backend) doesn't see any I/O errors.

But I'll revisit must_push_back() relative to dm_noflush_suspending();
specifically, the new must_push_back_rq() could be made not to check
dm_noflush_suspending().

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel

Patch

diff --git a/drivers/md/dm-mpath.c b/drivers/md/dm-mpath.c
index 7790a70..9daed03 100644
--- a/drivers/md/dm-mpath.c
+++ b/drivers/md/dm-mpath.c
@@ -439,8 +439,7 @@  static int must_push_back(struct multipath *m)
 {
        return (test_bit(MPATHF_QUEUE_IF_NO_PATH, &m->flags) ||
                ((test_bit(MPATHF_QUEUE_IF_NO_PATH, &m->flags) !=
-                 test_bit(MPATHF_SAVED_QUEUE_IF_NO_PATH, &m->flags)) &&
-                dm_noflush_suspending(m->ti)));
+                 test_bit(MPATHF_SAVED_QUEUE_IF_NO_PATH, &m->flags)));
 }

 /*
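With the dm_noflush_suspending() term removed, the remaining expression can be simplified: when the first bit is clear, the inequality reduces to the value of the saved bit, so the predicate is equivalent to "either bit is set". The sketch below checks this exhaustively, modelling both bits as bool truth values (an assumption; the kernel reads them with test_bit()). patched_expr(), simplified_expr() and check_simplification() are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

/* Post-patch predicate; both flag bits modelled as bool (assumption). */
bool patched_expr(bool queue, bool saved)
{
	return queue || (queue != saved);
}

/* Simplified form: if queue is false, (queue != saved) equals saved. */
bool simplified_expr(bool queue, bool saved)
{
	return queue || saved;
}

/* Exhaustively check that both forms agree on all four inputs. */
void check_simplification(void)
{
	for (int i = 0; i < 4; i++) {
		bool queue = i & 1, saved = i & 2;

		assert(patched_expr(queue, saved) ==
		       simplified_expr(queue, saved));
	}
}
```

In words: after the patch, a request is pushed back whenever either MPATHF_QUEUE_IF_NO_PATH or MPATHF_SAVED_QUEUE_IF_NO_PATH is set, regardless of the suspend mode.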