diff mbox

RDAC path checker status change messages

Message ID Pine.LNX.4.64.0906231548060.18687@e-smith.charlieb.ott.istop.com (mailing list archive)
State Not Applicable, archived
Delegated to: christophe varoqui
Headers show

Commit Message

Charlie Brady June 23, 2009, 7:50 p.m. UTC
It seems to me that this patch is required, to fix a problem with "path 
down" messages being logged when the path transitions to up (and 
presumably ghost).


--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel

Comments

Chandra Seetharaman June 24, 2009, 6:15 p.m. UTC | #1
Hi Charlie,

We do not want to print these messages as they will be flooding
your /var/log/messages even when everything is good.

We want to print these messages only to get User's attention.

If your concern is that the last message is "down", then we should be
having a static variable and print up or ghost message only once (when
we toggle from down to up/ghost).

chandra
On Tue, 2009-06-23 at 15:50 -0400, Charlie Brady wrote:
> It seems to me that this patch is required, to fix a problem with "path 
> down" messages being logged when the path transitions to up (and 
> presumably ghost).
> 
> --- libcheckers/rdac.c.orig     2009-06-23 15:38:05.371422000 -0400
> +++ libcheckers/rdac.c  2009-06-23 15:39:28.020843000 -0400
> @@ -105,5 +105,12 @@
>                  return PATH_DOWN;
>          }
> 
> -       return ((inq.avtcvp & 0x1) ? PATH_UP : PATH_GHOST);
> +       if (inq.avtcvp & 0x1) {
> +               MSG(c, MSG_RDAC_UP);
> +               return PATH_UP;
> +       }
> +       else {
> +               MSG(c, MSG_RDAC_GHOST);
> +               return PATH_GHOST;
> +       }
>   }
> 
> --
> dm-devel mailing list
> dm-devel@redhat.com
> https://www.redhat.com/mailman/listinfo/dm-devel

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
Charlie Brady June 24, 2009, 8:19 p.m. UTC | #2
On Wed, 24 Jun 2009, Chandra Seetharaman wrote:

> We do not want to print these messages as they will be flooding
> your /var/log/messages even when everything is good.

Agreed.

> We want to print these messages only to get User's attention.

Agreed.

> If your concern is that the last message is "down", then we should be
> having a static variable and print up or ghost message only once (when
> we toggle from down to up/ghost).

You do have a static variable, so that messages are only printed when the 
status changes. However, because of the missing MSG() calls when the 
status is PATH_UP and PATH_GHOST, then the string printed is "path down", 
when it should be "path up" or "path ghost".

I believe that the patch is good. The MSG() calls will change the message 
string, and then the higher layer will notice the status change, and print 
the message.

Please check the code and correct me if I am wrong.

>
> chandra
> On Tue, 2009-06-23 at 15:50 -0400, Charlie Brady wrote:
>> It seems to me that this patch is required, to fix a problem with "path
>> down" messages being logged when the path transitions to up (and
>> presumably ghost).
>>
>> --- libcheckers/rdac.c.orig     2009-06-23 15:38:05.371422000 -0400
>> +++ libcheckers/rdac.c  2009-06-23 15:39:28.020843000 -0400
>> @@ -105,5 +105,12 @@
>>                  return PATH_DOWN;
>>          }
>>
>> -       return ((inq.avtcvp & 0x1) ? PATH_UP : PATH_GHOST);
>> +       if (inq.avtcvp & 0x1) {
>> +               MSG(c, MSG_RDAC_UP);
>> +               return PATH_UP;
>> +       }
>> +       else {
>> +               MSG(c, MSG_RDAC_GHOST);
>> +               return PATH_GHOST;
>> +       }
>>   }
>>
>> --
>> dm-devel mailing list
>> dm-devel@redhat.com
>> https://www.redhat.com/mailman/listinfo/dm-devel
>
> --
> dm-devel mailing list
> dm-devel@redhat.com
> https://www.redhat.com/mailman/listinfo/dm-devel
>

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
Benjamin Marzinski June 24, 2009, 9:01 p.m. UTC | #3
On Wed, Jun 24, 2009 at 04:19:00PM -0400, Charlie Brady wrote:
>
> On Wed, 24 Jun 2009, Chandra Seetharaman wrote:
>
>> We do not want to print these messages as they will be flooding
>> your /var/log/messages even when everything is good.
>
> Agreed.
>
>> We want to print these messages only to get User's attention.
>
> Agreed.
>
>> If your concern is that the last message is "down", then we should be
>> having a static variable and print up or ghost message only once (when
>> we toggle from down to up/ghost).
>
> You do have a static variable, so that messages are only printed when the 
> status changes. However, because of the missing MSG() calls when the status 
> is PATH_UP and PATH_GHOST, then the string printed is "path down", when it 
> should be "path up" or "path ghost".
>
> I believe that the patch is good. The MSG() calls will change the message 
> string, and then the higher layer will notice the status change, and print 
> the message.
>
> Please check the code and correct me if I am wrong.

You are correct. Looks fine by me.
I'll add this for RHEL 5.5. It will make it into Centos after that.

-Ben

>
>>
>> chandra
>> On Tue, 2009-06-23 at 15:50 -0400, Charlie Brady wrote:
>>> It seems to me that this patch is required, to fix a problem with "path
>>> down" messages being logged when the path transitions to up (and
>>> presumably ghost).
>>>
>>> --- libcheckers/rdac.c.orig     2009-06-23 15:38:05.371422000 -0400
>>> +++ libcheckers/rdac.c  2009-06-23 15:39:28.020843000 -0400
>>> @@ -105,5 +105,12 @@
>>>                  return PATH_DOWN;
>>>          }
>>>
>>> -       return ((inq.avtcvp & 0x1) ? PATH_UP : PATH_GHOST);
>>> +       if (inq.avtcvp & 0x1) {
>>> +               MSG(c, MSG_RDAC_UP);
>>> +               return PATH_UP;
>>> +       }
>>> +       else {
>>> +               MSG(c, MSG_RDAC_GHOST);
>>> +               return PATH_GHOST;
>>> +       }
>>>   }
>>>
>>> --
>>> dm-devel mailing list
>>> dm-devel@redhat.com
>>> https://www.redhat.com/mailman/listinfo/dm-devel
>>
>> --
>> dm-devel mailing list
>> dm-devel@redhat.com
>> https://www.redhat.com/mailman/listinfo/dm-devel
>>
>
> --
> dm-devel mailing list
> dm-devel@redhat.com
> https://www.redhat.com/mailman/listinfo/dm-devel

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
Charlie Brady June 25, 2009, 4:57 p.m. UTC | #4
On Wed, 24 Jun 2009, Charlie Brady wrote:

>>  If your concern is that the last message is "down", then we should be
>>  having a static variable and print up or ghost message only once (when
>>  we toggle from down to up/ghost).
>
> You do have a static variable, so that messages are only printed when the 
> status changes.

I think I need to retract that. The messages appear to be printed when the 
status is not changing, at least for the "path down" messages.

I've patched scsi_dh_rdac.c so that SUN/LCSM100_I is included in 
rdac_dev_list[], and restarted iscsi and multipathd. I've now provoked a 
path failure via iptables. Here are the logs I see (with unpatched 
multipathd) - the duplicate message suppression from the rdac path 
checker is not working:

Jun 25 12:35:02 sun4150node1 kernel: device-mapper: multipath: Using 
scsi_dh module scsi_dh_rdac for failover/failback and
  device management.
Jun 25 12:35:08 sun4150node1 multipathd: cannot open /sbin/dasd_id : No 
such file or directory
Jun 25 12:35:08 sun4150node1 multipathd: cannot open /sbin/gnbd_import : 
No such file or directory
Jun 25 12:35:08 sun4150node1 multipathd: [copy.c] cannot open 
/sbin/dasd_id
Jun 25 12:35:08 sun4150node1 multipathd: cannot copy /sbin/dasd_id in 
ramfs : No such file or directory
Jun 25 12:35:08 sun4150node1 multipathd: [copy.c] cannot open 
/sbin/gnbd_import
Jun 25 12:35:08 sun4150node1 multipathd: cannot copy /sbin/gnbd_import in 
ramfs : No such file or directory
Jun 25 12:35:08 sun4150node1 multipathd: mpath0: event checker started
Jun 25 12:35:02 sun4150node1 kernel: device-mapper: multipath: Using 
scsi_dh module scsi_dh_rdac for failover/failback and
  device management.
Jun 25 12:46:55 sun4150node1 kernel: ping timeout of 5 secs expired, last 
rx 1703243, last ping 1708243, now 1713243
Jun 25 12:46:55 sun4150node1 kernel:  connection3:0: iscsi: detected conn 
error (1011)
Jun 25 12:46:55 sun4150node1 iscsid: Kernel reported iSCSI connection 3:0 
error (1011) state (3)
Jun 25 12:48:55 sun4150node1 kernel:  session3: iscsi: session recovery 
timed out after 120 secs
Jun 25 12:48:55 sun4150node1 kernel: iscsi: cmd 0x12 is not queued (8)
Jun 25 12:48:55 sun4150node1 multipathd: path checkers start up
Jun 25 12:48:55 sun4150node1 multipathd: sdd: rdac checker reports path is 
down
Jun 25 12:48:55 sun4150node1 kernel: device-mapper: multipath: Failing 
path 8:48.
Jun 25 12:48:55 sun4150node1 multipathd: checker failed path 8:48 in map 
mpath0
Jun 25 12:48:55 sun4150node1 multipathd: mpath0: remaining active paths: 1
Jun 25 12:48:55 sun4150node1 multipathd: dm-2: add map (uevent)
Jun 25 12:48:55 sun4150node1 multipathd: dm-2: devmap already registered
Jun 25 12:49:00 sun4150node1 kernel: iscsi: cmd 0x12 is not queued (8)
Jun 25 12:49:00 sun4150node1 multipathd: sdd: rdac checker reports path is 
down
Jun 25 12:49:35 sun4150node1 last message repeated 7 times
Jun 25 12:49:35 sun4150node1 last message repeated 7 times
Jun 25 12:50:05 sun4150node1 last message repeated 6 times
Jun 25 12:50:05 sun4150node1 kernel: iscsi: cmd 0x28 is not queued (8)
Jun 25 12:50:05 sun4150node1 kernel: sd 14:0:0:0: SCSI error: return code 
= 0x00010000
Jun 25 12:50:05 sun4150node1 kernel: end_request: I/O error, dev sdd, 
sector 0
Jun 25 12:50:05 sun4150node1 kernel: Buffer I/O error on device sdd, 
logical block 0
Jun 25 12:50:05 sun4150node1 kernel: Buffer I/O error on device sdd, 
logical block 1
Jun 25 12:50:05 sun4150node1 kernel: Buffer I/O error on device sdd, 
logical block 2
Jun 25 12:50:05 sun4150node1 kernel: Buffer I/O error on device sdd, 
logical block 3
Jun 25 12:50:05 sun4150node1 kernel: iscsi: cmd 0x28 is not queued (8)

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
Charlie Brady June 25, 2009, 5:02 p.m. UTC | #5
On Thu, 25 Jun 2009, Charlie Brady wrote:

>
> On Wed, 24 Jun 2009, Charlie Brady wrote:
>
>> >   If your concern is that the last message is "down", then we should be
>> >   having a static variable and print up or ghost message only once (when
>> >   we toggle from down to up/ghost).
>>
>>  You do have a static variable, so that messages are only printed when the
>>  status changes.
>
> I think I need to retract that. The messages appear to be printed when the 
> status is not changing, at least for the "path down" messages.
>
> I've patched scsi_dh_rdac.c so that SUN/LCSM100_I is included in 
> rdac_dev_list[], and restarted iscsi and multipathd. I've now provoked a path 
> failure via iptables. Here are the logs I see (with unpatched multipathd) - 
> the duplicate message suppression from the rdac path checker is not working:

OTOH, on path recovery, the wrong message is displayed ("path down" 
instead of "path up") but the message is not repeated:

Jun 25 12:57:51 sun4150node1 last message repeated 9 times
Jun 25 12:57:54 sun4150node1 iscsid: received iferror -38
Jun 25 12:57:54 sun4150node1 last message repeated 2 times
Jun 25 12:57:54 sun4150node1 iscsid: connection3:0 is operational after 
recovery (44 attempts)
Jun 25 12:57:56 sun4150node1 multipathd: sdd: rdac checker reports path is 
down
Jun 25 12:57:56 sun4150node1 multipathd: 8:48: reinstated
Jun 25 12:57:56 sun4150node1 multipathd: mpath0: remaining active paths: 2
Jun 25 12:57:56 sun4150node1 multipathd: dm-2: add map (uevent)
Jun 25 12:57:56 sun4150node1 multipathd: dm-2: devmap already registered


--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
Charlie Brady June 25, 2009, 5:28 p.m. UTC | #6
On Thu, 25 Jun 2009, Charlie Brady wrote:

> On Thu, 25 Jun 2009, Charlie Brady wrote:
>
>>  I think I need to retract that. The messages appear to be printed when the
>>  status is not changing, at least for the "path down" messages.
>>
>>  I've patched scsi_dh_rdac.c so that SUN/LCSM100_I is included in
>>  rdac_dev_list[], and restarted iscsi and multipathd. I've now provoked a
>>  path failure via iptables. Here are the logs I see (with unpatched
>>  multipathd) - the duplicate message suppression from the rdac path checker
>>  is not working:
>
> OTOH, on path recovery, the wrong message is displayed ("path down" instead 
> of "path up") but the message is not repeated:

>From multipathd/main.c, there are three conditions where the messages are 
printed:

...
                         if (newstate != pp->state) {
                                 int oldstate = pp->state;
                                 pp->state = newstate;
                                 LOG_MSG(1, checker_message(&pp->checker));
...
                         }
                         else if (newstate == PATH_UP || newstate == PATH_GHOST) {
                                 LOG_MSG(4, checker_message(&pp->checker));
...
                         else if (newstate == PATH_DOWN)
                                 LOG_MSG(2, checker_message(&pp->checker));
...

I think only the first might be required, and only the first will be seen 
if logging verbosity < 2 (default is 2).

So there are two path checker problems I can see (in 
multipath-tools-0.4.7.rhel5.13):

- the rdac checker leaves the message set to "path down" when path
   transitions back to up or ghost.
- multipathd prints the message on every PATH_DOWN except the first when
   verbosity is >= 2, and on every PATH_UP or PATH_GHOST except the first
   when verbosity is >= 4.

---
Charlie

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
Chandra Seetharaman June 25, 2009, 6:29 p.m. UTC | #7
Yes, I agree... Your patch should work as you expect.

Thanks

chandra
On Wed, 2009-06-24 at 16:19 -0400, Charlie Brady wrote:
> On Wed, 24 Jun 2009, Chandra Seetharaman wrote:
> 
> > We do not want to print these messages as they will be flooding
> > your /var/log/messages even when everything is good.
> 
> Agreed.
> 
> > We want to print these messages only to get User's attention.
> 
> Agreed.
> 
> > If your concern is that the last message is "down", then we should be
> > having a static variable and print up or ghost message only once (when
> > we toggle from down to up/ghost).
> 
> You do have a static variable, so that messages are only printed when the 
> status changes. However, because of the missing MSG() calls when the 
> status is PATH_UP and PATH_GHOST, then the string printed is "path down", 
> when it should be "path up" or "path ghost".
> 
> I believe that the patch is good. The MSG() calls will change the message 
> string, and then the higher layer will notice the status change, and print 
> the message.
> 
> Please check the code and correct me if I am wrong.
> 
> >
> > chandra
> > On Tue, 2009-06-23 at 15:50 -0400, Charlie Brady wrote:
> >> It seems to me that this patch is required, to fix a problem with "path
> >> down" messages being logged when the path transitions to up (and
> >> presumably ghost).
> >>
> >> --- libcheckers/rdac.c.orig     2009-06-23 15:38:05.371422000 -0400
> >> +++ libcheckers/rdac.c  2009-06-23 15:39:28.020843000 -0400
> >> @@ -105,5 +105,12 @@
> >>                  return PATH_DOWN;
> >>          }
> >>
> >> -       return ((inq.avtcvp & 0x1) ? PATH_UP : PATH_GHOST);
> >> +       if (inq.avtcvp & 0x1) {
> >> +               MSG(c, MSG_RDAC_UP);
> >> +               return PATH_UP;
> >> +       }
> >> +       else {
> >> +               MSG(c, MSG_RDAC_GHOST);
> >> +               return PATH_GHOST;
> >> +       }
> >>   }
> >>
> >> --
> >> dm-devel mailing list
> >> dm-devel@redhat.com
> >> https://www.redhat.com/mailman/listinfo/dm-devel
> >
> > --
> > dm-devel mailing list
> > dm-devel@redhat.com
> > https://www.redhat.com/mailman/listinfo/dm-devel
> >

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
Charlie Brady June 25, 2009, 6:40 p.m. UTC | #8
On Thu, 25 Jun 2009, Chandra Seetharaman wrote:

> Yes, I agree... Your patch should work as you expect.

Thanks - bmarzins@sourceware.org has already committed the fix.

Please see subsequent messages (including patch) re excessive path checker 
message logging.

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
Benjamin Marzinski June 26, 2009, 4:07 p.m. UTC | #9
On Thu, Jun 25, 2009 at 01:28:40PM -0400, Charlie Brady wrote:
>
> On Thu, 25 Jun 2009, Charlie Brady wrote:
>
>> On Thu, 25 Jun 2009, Charlie Brady wrote:
>>
>>>  I think I need to retract that. The messages appear to be printed when the
>>>  status is not changing, at least for the "path down" messages.
>>>
>>>  I've patched scsi_dh_rdac.c so that SUN/LCSM100_I is included in
>>>  rdac_dev_list[], and restarted iscsi and multipathd. I've now provoked a
>>>  path failure via iptables. Here are the logs I see (with unpatched
>>>  multipathd) - the duplicate message suppression from the rdac path checker
>>>  is not working:
>>
>> OTOH, on path recovery, the wrong message is displayed ("path down" 
>> instead of "path up") but the message is not repeated:
>
>> From multipathd/main.c, there are three conditions where the messages are 
> printed:
>
> ...
>                         if (newstate != pp->state) {
>                                 int oldstate = pp->state;
>                                 pp->state = newstate;
>                                 LOG_MSG(1, checker_message(&pp->checker));
> ...
>                         }
>                         else if (newstate == PATH_UP || newstate == PATH_GHOST) {
>                                 LOG_MSG(4, checker_message(&pp->checker));
> ...
>                         else if (newstate == PATH_DOWN)
>                                 LOG_MSG(2, checker_message(&pp->checker));
> ...
>
> I think only the first might be required, and only the first will be seen 
> if logging verbosity < 2 (default is 2).
>
> So there are two path checker problems I can see (in 
> multipath-tools-0.4.7.rhel5.13):
>
> - the rdac checker leaves the message set to "path down" when path
>   transitions back to up or ghost.

Any your patch fixes that.

> - multipathd prints the message on every PATH_DOWN except the first when
>   verbosity is >= 2, and on every PATH_UP or PATH_GHOST except the first
>   when verbosity is >= 4.

Why is this a problem? 

On verbosity = 1, you only get messages when you change states.
On verbosity = 2, you additionally get repeated messages for failed
paths, which are often something that the sysadmin should be doing
something about.
On verbosity = 4, you get a message whenever the path checker runs,
which is helpful for debugging, which is the only time you would ever
want verbosity = 4.

This seems sensible to me.  If you don't like the repeated path down
messages, you can change your verbosity level to 1.  Right now that
involves editting the init script, but in future releases, you can
change it in /etc/multipath.conf.

-Ben

>
> ---
> Charlie
>
> --
> dm-devel mailing list
> dm-devel@redhat.com
> https://www.redhat.com/mailman/listinfo/dm-devel

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
diff mbox

Patch

--- libcheckers/rdac.c.orig     2009-06-23 15:38:05.371422000 -0400
+++ libcheckers/rdac.c  2009-06-23 15:39:28.020843000 -0400
@@ -105,5 +105,12 @@ 
                 return PATH_DOWN;
         }

-       return ((inq.avtcvp & 0x1) ? PATH_UP : PATH_GHOST);
+       if (inq.avtcvp & 0x1) {
+               MSG(c, MSG_RDAC_UP);
+               return PATH_UP;
+       }
+       else {
+               MSG(c, MSG_RDAC_GHOST);
+               return PATH_GHOST;
+       }
  }