
[8/8] dm-mpath: do not activate failed paths

Message ID 1393486229-72034-9-git-send-email-hare@suse.de (mailing list archive)
State Superseded, archived
Delegated to: Mike Snitzer

Commit Message

Hannes Reinecke Feb. 27, 2014, 7:30 a.m. UTC
activate_path() is run without a lock, so the path might be
set to failed before activate_path() had a chance to run.

Signed-off-by: Hannes Reinecke <hare@suse.de>
---
 drivers/md/dm-mpath.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

Comments

Junichi Nomura Feb. 28, 2014, 8:49 a.m. UTC | #1
On 02/27/14 16:30, Hannes Reinecke wrote:
> activate_path() is run without a lock, so the path might be
> set to failed before activate_path() had a chance to run.
> 
> Signed-off-by: Hannes Reinecke <hare@suse.de>
> ---
>  drivers/md/dm-mpath.c | 7 +++++--
>  1 file changed, 5 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/md/dm-mpath.c b/drivers/md/dm-mpath.c
> index 0a40fa9..22e7365 100644
> --- a/drivers/md/dm-mpath.c
> +++ b/drivers/md/dm-mpath.c
> @@ -1196,8 +1196,11 @@ static void activate_path(struct work_struct *work)
>  	struct pgpath *pgpath =
>  		container_of(work, struct pgpath, activate_path.work);
>  
> -	scsi_dh_activate(bdev_get_queue(pgpath->path.dev->bdev),
> -				pg_init_done, pgpath);
> +	if (pgpath->is_active)
> +		scsi_dh_activate(bdev_get_queue(pgpath->path.dev->bdev),
> +				 pg_init_done, pgpath);
> +	else
> +		pg_init_done(pgpath, SCSI_DH_DEV_OFFLINED);

Hi Hannes,

What problem are you going to fix with this?
It is still possible that the path is set to failed just after
"if (pgpath->is_active)" and before "scsi_dh_activate".
Hannes Reinecke Feb. 28, 2014, 9:32 a.m. UTC | #2
On 02/28/2014 09:49 AM, Junichi Nomura wrote:
> On 02/27/14 16:30, Hannes Reinecke wrote:
>> activate_path() is run without a lock, so the path might be
>> set to failed before activate_path() had a chance to run.
>>
>> Signed-off-by: Hannes Reinecke <hare@suse.de>
>> ---
>>  drivers/md/dm-mpath.c | 7 +++++--
>>  1 file changed, 5 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/md/dm-mpath.c b/drivers/md/dm-mpath.c
>> index 0a40fa9..22e7365 100644
>> --- a/drivers/md/dm-mpath.c
>> +++ b/drivers/md/dm-mpath.c
>> @@ -1196,8 +1196,11 @@ static void activate_path(struct work_struct *work)
>>  	struct pgpath *pgpath =
>>  		container_of(work, struct pgpath, activate_path.work);
>>  
>> -	scsi_dh_activate(bdev_get_queue(pgpath->path.dev->bdev),
>> -				pg_init_done, pgpath);
>> +	if (pgpath->is_active)
>> +		scsi_dh_activate(bdev_get_queue(pgpath->path.dev->bdev),
>> +				 pg_init_done, pgpath);
>> +	else
>> +		pg_init_done(pgpath, SCSI_DH_DEV_OFFLINED);
> 
> Hi Hannes,
> 
> What problem are you going to fix with this?
> It is still possible that the path is set to failed just after
> "if (pgpath->is_active)" and before "scsi_dh_activate".
> 
activate_path() is run from a workqueue, so there is a race window
between the check of ->is_active in __pg_init_all_paths() and the
time scsi_dh_activate() is executed. And as activate_path() runs
without any locks, we don't have any synchronization with
fail_path(). So it's entirely possible that a path gets failed in
between __pg_init_all_paths() and activate_path().
Granted, we could remove the check for ->is_active in
__pg_init_all_paths(), but I wanted to avoid scheduling a workqueue
item that is known to fail.
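
Roughly, the pieces involved look like this (a much-simplified sketch,
not the literal code; bookkeeping and error handling elided):

/* Much-simplified sketch of the current dm-mpath code paths */

/* fail_path() clears ->is_active under the multipath lock ... */
static int fail_path(struct pgpath *pgpath)
{
	struct multipath *m = pgpath->pg->m;
	unsigned long flags;

	spin_lock_irqsave(&m->lock, flags);
	pgpath->is_active = 0;
	/* ... path-selector and counter updates elided ... */
	spin_unlock_irqrestore(&m->lock, flags);
	return 0;
}

/* ... __pg_init_all_paths() checks ->is_active under that lock ... */
static void __pg_init_all_paths(struct multipath *m)
{
	struct pgpath *pgpath;

	list_for_each_entry(pgpath, &m->current_pg->pgpaths, list) {
		if (!pgpath->is_active)		/* skip failed paths */
			continue;
		queue_delayed_work(kmpath_handlerd,
				   &pgpath->activate_path, 0);
	}
}

/* ... but the queued work runs later, with no lock held at all. */
static void activate_path(struct work_struct *work)
{
	struct pgpath *pgpath =
		container_of(work, struct pgpath, activate_path.work);

	/* fail_path() may have cleared ->is_active by now */
	scsi_dh_activate(bdev_get_queue(pgpath->path.dev->bdev),
			 pg_init_done, pgpath);
}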

Cheers,

Hannes
Junichi Nomura Feb. 28, 2014, 9:37 a.m. UTC | #3
On 02/28/14 18:32, Hannes Reinecke wrote:
> On 02/28/2014 09:49 AM, Junichi Nomura wrote:
>> On 02/27/14 16:30, Hannes Reinecke wrote:
>>> activate_path() is run without a lock, so the path might be
>>> set to failed before activate_path() had a chance to run.
>>>
>>> Signed-off-by: Hannes Reinecke <hare@suse.de>
>>> ---
>>>  drivers/md/dm-mpath.c | 7 +++++--
>>>  1 file changed, 5 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/drivers/md/dm-mpath.c b/drivers/md/dm-mpath.c
>>> index 0a40fa9..22e7365 100644
>>> --- a/drivers/md/dm-mpath.c
>>> +++ b/drivers/md/dm-mpath.c
>>> @@ -1196,8 +1196,11 @@ static void activate_path(struct work_struct *work)
>>>  	struct pgpath *pgpath =
>>>  		container_of(work, struct pgpath, activate_path.work);
>>>  
>>> -	scsi_dh_activate(bdev_get_queue(pgpath->path.dev->bdev),
>>> -				pg_init_done, pgpath);
>>> +	if (pgpath->is_active)
>>> +		scsi_dh_activate(bdev_get_queue(pgpath->path.dev->bdev),
>>> +				 pg_init_done, pgpath);
>>> +	else
>>> +		pg_init_done(pgpath, SCSI_DH_DEV_OFFLINED);
>>
>> Hi Hannes,
>>
>> What problem are you going to fix with this?
>> It is still possible that the path is set to failed just after
>> "if (pgpath->is_active)" and before "scsi_dh_activate".
>>
> activate_path() is run from a workqueue, so there is a race window
> between the check of ->is_active in __pg_init_all_paths() and the
> time scsi_dh_activate() is executed. And as activate_path() runs
> without any locks, we don't have any synchronization with
> fail_path(). So it's entirely possible that a path gets failed in
> between __pg_init_all_paths() and activate_path().

What I pointed out is that your patch makes the window very small
but doesn't close it.
If the race is a problem, this patch doesn't fix the problem.
Hannes Reinecke Feb. 28, 2014, 9:52 a.m. UTC | #4
On 02/28/2014 10:37 AM, Junichi Nomura wrote:
> On 02/28/14 18:32, Hannes Reinecke wrote:
>> On 02/28/2014 09:49 AM, Junichi Nomura wrote:
>>> On 02/27/14 16:30, Hannes Reinecke wrote:
>>>> activate_path() is run without a lock, so the path might be
>>>> set to failed before activate_path() had a chance to run.
>>>>
>>>> Signed-off-by: Hannes Reinecke <hare@suse.de>
>>>> ---
>>>>  drivers/md/dm-mpath.c | 7 +++++--
>>>>  1 file changed, 5 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/drivers/md/dm-mpath.c b/drivers/md/dm-mpath.c
>>>> index 0a40fa9..22e7365 100644
>>>> --- a/drivers/md/dm-mpath.c
>>>> +++ b/drivers/md/dm-mpath.c
>>>> @@ -1196,8 +1196,11 @@ static void activate_path(struct work_struct *work)
>>>>  	struct pgpath *pgpath =
>>>>  		container_of(work, struct pgpath, activate_path.work);
>>>>  
>>>> -	scsi_dh_activate(bdev_get_queue(pgpath->path.dev->bdev),
>>>> -				pg_init_done, pgpath);
>>>> +	if (pgpath->is_active)
>>>> +		scsi_dh_activate(bdev_get_queue(pgpath->path.dev->bdev),
>>>> +				 pg_init_done, pgpath);
>>>> +	else
>>>> +		pg_init_done(pgpath, SCSI_DH_DEV_OFFLINED);
>>>
>>> Hi Hannes,
>>>
>>> What problem are you going to fix with this?
>>> It is still possible that the path is set to failed just after
>>> "if (pgpath->is_active)" and before "scsi_dh_activate".
>>>
>> activate_path() is run from a workqueue, so there is a race window
>> between the check of ->is_active in __pg_init_all_paths() and the
>> time scsi_dh_activate() is executed. And as activate_path() runs
>> without any locks, we don't have any synchronization with
>> fail_path(). So it's entirely possible that a path gets failed in
>> between __pg_init_all_paths() and activate_path().
> 
> What I pointed out is that your patch makes the window very small
> but doesn't close it.
> If the race is a problem, this patch doesn't fix the problem.
> 
True. As said, I just want to avoid the unnecessary overhead of
calling functions that are known to fail.
This patch actually makes more sense in combination with the 'accept
failed paths' patch, where it's much more likely to trigger.
So if you insist, I could move it over there.

Cheers,

Hannes
Junichi Nomura Feb. 28, 2014, 10:08 a.m. UTC | #5
On 02/28/14 18:52, Hannes Reinecke wrote:
> On 02/28/2014 10:37 AM, Junichi Nomura wrote:
>> On 02/28/14 18:32, Hannes Reinecke wrote:
>>> On 02/28/2014 09:49 AM, Junichi Nomura wrote:
>>>> On 02/27/14 16:30, Hannes Reinecke wrote:
>>>>> activate_path() is run without a lock, so the path might be
>>>>> set to failed before activate_path() had a chance to run.
>>>>>
>>>>> Signed-off-by: Hannes Reinecke <hare@suse.de>
>>>>> ---
>>>>>  drivers/md/dm-mpath.c | 7 +++++--
>>>>>  1 file changed, 5 insertions(+), 2 deletions(-)
>>>>>
>>>>> diff --git a/drivers/md/dm-mpath.c b/drivers/md/dm-mpath.c
>>>>> index 0a40fa9..22e7365 100644
>>>>> --- a/drivers/md/dm-mpath.c
>>>>> +++ b/drivers/md/dm-mpath.c
>>>>> @@ -1196,8 +1196,11 @@ static void activate_path(struct work_struct *work)
>>>>>  	struct pgpath *pgpath =
>>>>>  		container_of(work, struct pgpath, activate_path.work);
>>>>>  
>>>>> -	scsi_dh_activate(bdev_get_queue(pgpath->path.dev->bdev),
>>>>> -				pg_init_done, pgpath);
>>>>> +	if (pgpath->is_active)
>>>>> +		scsi_dh_activate(bdev_get_queue(pgpath->path.dev->bdev),
>>>>> +				 pg_init_done, pgpath);
>>>>> +	else
>>>>> +		pg_init_done(pgpath, SCSI_DH_DEV_OFFLINED);
>>>>
>>>> Hi Hannes,
>>>>
>>>> What problem are you going to fix with this?
>>>> It is still possible that the path is set to failed just after
>>>> "if (pgpath->is_active)" and before "scsi_dh_activate".
>>>>
>>> activate_path() is run from a workqueue, so there is a race window
>>> between the check of ->is_active in __pg_init_all_paths() and the
>>> time scsi_dh_activate() is executed. And as activate_path() runs
>>> without any locks, we don't have any synchronization with
>>> fail_path(). So it's entirely possible that a path gets failed in
>>> between __pg_init_all_paths() and activate_path().
>>
>> What I pointed out is that your patch makes the window very small
>> but doesn't close it.
>> If the race is a problem, this patch doesn't fix the problem.
>>
> True. As said, I just want to avoid the unnecessary overhead of
> calling functions that are known to fail.

"to avoid unnecessary overhead by calling functions which are known
 to be failing"

IMO, that should be in the patch description. :)

> This patch actually makes more sense in combination with the 'accept
> failed paths' patch, where it's much more likely to trigger.
> So if you insist, I could move it over there.

If the description is fixed, I have no problem with this patch.
Mike Snitzer Feb. 28, 2014, 2:22 p.m. UTC | #6
On Fri, Feb 28 2014 at  4:52am -0500,
Hannes Reinecke <hare@suse.de> wrote:

> On 02/28/2014 10:37 AM, Junichi Nomura wrote:
> > On 02/28/14 18:32, Hannes Reinecke wrote:
> >> On 02/28/2014 09:49 AM, Junichi Nomura wrote:
> >>> On 02/27/14 16:30, Hannes Reinecke wrote:
> >>>> activate_path() is run without a lock, so the path might be
> >>>> set to failed before activate_path() had a chance to run.
> >>>>
> >>>> Signed-off-by: Hannes Reinecke <hare@suse.de>
> >>>> ---
> >>>>  drivers/md/dm-mpath.c | 7 +++++--
> >>>>  1 file changed, 5 insertions(+), 2 deletions(-)
> >>>>
> >>>> diff --git a/drivers/md/dm-mpath.c b/drivers/md/dm-mpath.c
> >>>> index 0a40fa9..22e7365 100644
> >>>> --- a/drivers/md/dm-mpath.c
> >>>> +++ b/drivers/md/dm-mpath.c
> >>>> @@ -1196,8 +1196,11 @@ static void activate_path(struct work_struct *work)
> >>>>  	struct pgpath *pgpath =
> >>>>  		container_of(work, struct pgpath, activate_path.work);
> >>>>  
> >>>> -	scsi_dh_activate(bdev_get_queue(pgpath->path.dev->bdev),
> >>>> -				pg_init_done, pgpath);
> >>>> +	if (pgpath->is_active)
> >>>> +		scsi_dh_activate(bdev_get_queue(pgpath->path.dev->bdev),
> >>>> +				 pg_init_done, pgpath);
> >>>> +	else
> >>>> +		pg_init_done(pgpath, SCSI_DH_DEV_OFFLINED);
> >>>
> >>> Hi Hannes,
> >>>
> >>> What problem are you going to fix with this?
> >>> It is still possible that the path is set to failed just after
> >>> "if (pgpath->is_active)" and before "scsi_dh_activate".
> >>>
> >> activate_path() is run from a workqueue, so there is a race window
> >> between the check of ->is_active in __pg_init_all_paths() and the
> >> time scsi_dh_activate() is executed. And as activate_path() runs
> >> without any locks, we don't have any synchronization with
> >> fail_path(). So it's entirely possible that a path gets failed in
> >> between __pg_init_all_paths() and activate_path().
> > 
> > What I pointed out is that your patch makes the window very small
> > but doesn't close it.
> > If the race is a problem, this patch doesn't fix the problem.
> > 
> True. As said, I just want to avoid the unnecessary overhead of
> calling functions that are known to fail.
> This patch actually makes more sense in combination with the 'accept
> failed paths' patch, where it's much more likely to trigger.

FYI, I still intend to review/take your "accept failed paths" patch.
If any rebase is needed for that patch, it would be helpful if you
could rebase and re-post it.

One thing I noticed is you're only converting MAJ:MIN paths to devices.
I think we should factor a helper out of DM core that does the full path
lookup check from dm_get_device() -- rather than you open coding an
older version of the MAJ:MIN device path processing.

But is there a reason for not using lookup_bdev()?  Is it because the
device has failed and so cannot be found using lookup_bdev()?
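
For reference, the lookup at the top of dm_get_device() is roughly the
following (simplified sketch; the overflow check and reference counting
are elided).  A factored-out helper could look something like this --
the helper name is purely illustrative:

/* Illustrative helper name; sketch of the dm_get_device() lookup */
static int dm_devt_from_path(const char *path, dev_t *dev_p)
{
	unsigned major, minor;
	struct block_device *bdev;
	char dummy;

	if (sscanf(path, "%u:%u%c", &major, &minor, &dummy) == 2) {
		/* "major:minor" string -- no device node lookup needed */
		*dev_p = MKDEV(major, minor);
	} else {
		/* path name -- only works while the device node exists */
		bdev = lookup_bdev(path);
		if (IS_ERR(bdev))
			return PTR_ERR(bdev);
		*dev_p = bdev->bd_dev;
		bdput(bdev);
	}
	return 0;
}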

Hannes Reinecke Feb. 28, 2014, 2:44 p.m. UTC | #7
On 02/28/2014 03:22 PM, Mike Snitzer wrote:
[ .. ]
> 
> FYI, I still intend to review/take your "accept failed paths" patch.
> If any rebase is needed for that patch, it would be helpful if you
> could rebase and re-post it.
> 
> One thing I noticed is you're only converting MAJ:MIN paths to devices.
> I think we should factor a helper out of DM core that does the full path
> lookup check from dm_get_device() -- rather than you open coding an
> older version of the MAJ:MIN device path processing.
> 
> But is there a reason for not using lookup_bdev()?  Is it because the
> device has failed and so cannot be found using lookup_bdev()?
> 
Yes, that's precisely it.
When setting dev_loss_tmo very aggressively (to, say, 10 or even 5
seconds) it might well be that multipathd won't be able to keep up
with the events in time.
So by the time multipathd tries to push a new table into the kernel
(which contains major:minor numbers only) those devices are already
gone. So lookup_bdev() won't be able to find them, either.
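
For illustration, a map as loaded by multipathd looks something like
this (values purely illustrative); the only device references the
kernel ever sees are the "8:16" / "8:32" pairs:

0 71014400 multipath 0 0 1 1 round-robin 0 2 1 8:16 1000 8:32 1000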

Not sure if factoring things out from dm_get_device() would help
here ...

But I'll be sending an updated patch, which'll apply on top of the
'push back' patchset.

Cheers,

Hannes
Mike Snitzer Feb. 28, 2014, 3:02 p.m. UTC | #8
On Fri, Feb 28 2014 at  9:44am -0500,
Hannes Reinecke <hare@suse.de> wrote:

> On 02/28/2014 03:22 PM, Mike Snitzer wrote:
> [ .. ]
> > 
> > FYI, I still intend to review/take your "accept failed paths" patch.
> > If any rebase is needed for that patch, it would be helpful if you
> > could rebase and re-post it.
> > 
> > One thing I noticed is you're only converting MAJ:MIN paths to devices.
> > I think we should factor a helper out of DM core that does the full path
> > lookup check from dm_get_device() -- rather than you open coding an
> > older version of the MAJ:MIN device path processing.
> > 
> > But is there a reason for not using lookup_bdev()?  Is it because the
> > device has failed and so cannot be found using lookup_bdev()?
> > 
> Yes, that's precisely it.
> When setting dev_loss_tmo very aggressively (to, say, 10 or even 5
> seconds) it might well be that multipathd won't be able to keep up
> with the events in time.
> So by the time multipathd tries to push a new table into the kernel
> (which contains major:minor numbers only) those devices are already
> gone. So lookup_bdev() won't be able to find them, either.

Been talking this over with Alasdair.  Need some things clarified.

Do you need to handle non-existent devices during the initial mpath
table load?

Or is the failed path in question already part of the active mpath table
-- so the active table currently holds a reference to the device?

Or do you need to support both cases?  It sounds like you want to
support both.

> Not sure if factoring things out from dm_get_device() would help
> here ...
> 
> But I'll be sending an updated patch, which'll apply on top of the
> 'push back' patchset.

Thanks.

Hannes Reinecke Feb. 28, 2014, 3:08 p.m. UTC | #9
On 02/28/2014 04:02 PM, Mike Snitzer wrote:
> On Fri, Feb 28 2014 at  9:44am -0500,
> Hannes Reinecke <hare@suse.de> wrote:
> 
>> On 02/28/2014 03:22 PM, Mike Snitzer wrote:
>> [ .. ]
>>>
>>> FYI, I still intend to review/take your "accept failed paths" patch.
>>> If any rebase is needed for that patch, it would be helpful if you
>>> could rebase and re-post it.
>>>
>>> One thing I noticed is you're only converting MAJ:MIN paths to devices.
>>> I think we should factor a helper out of DM core that does the full path
>>> lookup check from dm_get_device() -- rather than you open coding an
>>> older version of the MAJ:MIN device path processing.
>>>
>>> But is there a reason for not using lookup_bdev()?  Is it because the
>>> device has failed and so cannot be found using lookup_bdev()?
>>>
>> Yes, that's precisely it.
>> When setting dev_loss_tmo very aggressively (to, say, 10 or even 5
>> seconds) it might well be that multipathd won't be able to keep up
>> with the events in time.
>> So by the time multipathd tries to push a new table into the kernel
>> (which contains major:minor numbers only) those devices are already
>> gone. So lookup_bdev() won't be able to find them, either.
> 
> Been talking this over with Alasdair.  Need some things clarified.
> 
> Do you need to handle non-existent devices during the initial mpath
> table load?
>
> Or is the failed path in question already part of the active mpath table
> -- so the active table currently holds a reference to the device?
>
> Or do you need to support both cases?  It sounds like you want to
> support both.
> 
Hmm. I wouldn't think I'd need it during the initial mpath table load;
but then you never know, devices might fail faster than you'd think ...

So yeah, I'd need to support both cases.

The main reason I've sent this patch is to get rid of these _really_
annoying 'error getting device' messages during failover.
We really can and should do better than that.

That this patch allows us to drop references to failed devices earlier
is a nice side effect :-)

Cheers,

Hannes

Patch

diff --git a/drivers/md/dm-mpath.c b/drivers/md/dm-mpath.c
index 0a40fa9..22e7365 100644
--- a/drivers/md/dm-mpath.c
+++ b/drivers/md/dm-mpath.c
@@ -1196,8 +1196,11 @@  static void activate_path(struct work_struct *work)
 	struct pgpath *pgpath =
 		container_of(work, struct pgpath, activate_path.work);
 
-	scsi_dh_activate(bdev_get_queue(pgpath->path.dev->bdev),
-				pg_init_done, pgpath);
+	if (pgpath->is_active)
+		scsi_dh_activate(bdev_get_queue(pgpath->path.dev->bdev),
+				 pg_init_done, pgpath);
+	else
+		pg_init_done(pgpath, SCSI_DH_DEV_OFFLINED);
 }
 
 static int noretry_error(int error)