Message ID | Pine.LNX.4.64.0906251344350.4229@e-smith.charlieb.ott.istop.com (mailing list archive) |
---|---|
State | Rejected, archived |
Delegated to: | christophe varoqui |
Headers | show |
On Thu, Jun 25, 2009 at 01:50:51PM -0400, Charlie Brady wrote: > > multipathd logs a path checker message whenever state changes. But it also > logs every non-changed message when verbosity is >=4 and newstate is > PATH_UP or PATH_GHOST, and every non-changed message when verbosity is >= 2 > and newstate is PATH_DOWN. > > I believe that the message should only be logged, once, when state changes. I disagree. If you look at the stuff that gets logged when verbosity >= 4, you'll see that it's a whole bunch of stuff that normal users would never care about, but may be useful for debugging. This clearly fits. I also think that having a path down is something that shouldn't just blip by once in the logs. It's something that often requires the sysadmin to intervene. I could see limitting it to, say, only happen on every tenth pass through the checker loop, so that these messages don't clog up the logs so bad. But since there is the option to set the verbosity to 1 to avoid this, I think that printing the path down messages repeatedly is useful. -Ben > > Signed-off-by: Charlie Brady <charlieb@budge.apana.org.au> > > --- multipath-tools-0.4.7.rhel5.13/multipathd/main.c 2009-06-25 13:32:57.248455000 -0400 > +++ mezzanine_patched_multipath-tools-0.4.7.rhel5.13/multipathd/main.c 2009-06-25 13:32:12.968639000 -0400 > @@ -1041,7 +1041,6 @@ > enable_group(pp); > } > else if (newstate == PATH_UP || newstate == PATH_GHOST) { > - LOG_MSG(4, checker_message(&pp->checker)); > /* > * double the next check delay. > * max at conf->max_checkint > @@ -1055,8 +1054,6 @@ > condlog(4, "%s: delay next check %is", > pp->dev_t, pp->tick); > } > - else if (newstate == PATH_DOWN) > - LOG_MSG(2, checker_message(&pp->checker)); > > pp->state = newstate; > > -- > dm-devel mailing list > dm-devel@redhat.com > https://www.redhat.com/mailman/listinfo/dm-devel -- dm-devel mailing list dm-devel@redhat.com https://www.redhat.com/mailman/listinfo/dm-devel
I agree with Ben. Reducing it to every 10th pass for down would be a good compromise. On Fri, 2009-06-26 at 15:50 -0500, Benjamin Marzinski wrote: > On Thu, Jun 25, 2009 at 01:50:51PM -0400, Charlie Brady wrote: > > > > multipathd logs a path checker message whenever state changes. But it also > > logs every non-changed message when verbosity is >=4 and newstate is > > PATH_UP or PATH_GHOST, and every non-changed message when verbosity is >= 2 > > and newstate is PATH_DOWN. > > > > I believe that the message should only be logged, once, when state changes. > > I disagree. If you look at the stuff that gets logged when verbosity >= > 4, you'll see that it's a whole bunch of stuff that normal users would > never care about, but may be useful for debugging. This clearly fits. > > I also think that having a path down is something that shouldn't just > blip by once in the logs. It's something that often requires the > sysadmin to intervene. I could see limitting it to, say, only happen > on every tenth pass through the checker loop, so that these messages > don't clog up the logs so bad. But since there is the option to set the > verbosity to 1 to avoid this, I think that printing the path down > messages repeatedly is useful. > > -Ben > > > > > Signed-off-by: Charlie Brady <charlieb@budge.apana.org.au> > > > > --- multipath-tools-0.4.7.rhel5.13/multipathd/main.c 2009-06-25 13:32:57.248455000 -0400 > > +++ mezzanine_patched_multipath-tools-0.4.7.rhel5.13/multipathd/main.c 2009-06-25 13:32:12.968639000 -0400 > > @@ -1041,7 +1041,6 @@ > > enable_group(pp); > > } > > else if (newstate == PATH_UP || newstate == PATH_GHOST) { > > - LOG_MSG(4, checker_message(&pp->checker)); > > /* > > * double the next check delay. > > * max at conf->max_checkint > > @@ -1055,8 +1054,6 @@ > > condlog(4, "%s: delay next check %is", > > pp->dev_t, pp->tick); > > } > > - else if (newstate == PATH_DOWN) > > - LOG_MSG(2, checker_message(&pp->checker)); > > > > pp->state = newstate; > > > > -- > > dm-devel mailing list > > dm-devel@redhat.com > > https://www.redhat.com/mailman/listinfo/dm-devel > > -- > dm-devel mailing list > dm-devel@redhat.com > https://www.redhat.com/mailman/listinfo/dm-devel -- dm-devel mailing list dm-devel@redhat.com https://www.redhat.com/mailman/listinfo/dm-devel
On Fri, Jun 26, 2009 at 03:50:57PM -0500, Benjamin Marzinski wrote: > On Thu, Jun 25, 2009 at 01:50:51PM -0400, Charlie Brady wrote: > > > > multipathd logs a path checker message whenever state changes. But it also > > logs every non-changed message when verbosity is >=4 and newstate is > > PATH_UP or PATH_GHOST, and every non-changed message when verbosity is >= 2 > > and newstate is PATH_DOWN. > > > > I believe that the message should only be logged, once, when state changes. > > I disagree. If you look at the stuff that gets logged when verbosity >= > 4, you'll see that it's a whole bunch of stuff that normal users would > never care about, but may be useful for debugging. This clearly fits. > > I also think that having a path down is something that shouldn't just > blip by once in the logs. It's something that often requires the > sysadmin to intervene. I could see limitting it to, say, only happen > on every tenth pass through the checker loop, so that these messages > don't clog up the logs so bad. But since there is the option to set the > verbosity to 1 to avoid this, I think that printing the path down > messages repeatedly is useful. Hmm.. What if you have 100 LUNs with four paths, meaning you get 400 messages saying: "Down! Down! Pants on fire!" -- dm-devel mailing list dm-devel@redhat.com https://www.redhat.com/mailman/listinfo/dm-devel
Benjamin Marzinski wrote: > On Thu, Jun 25, 2009 at 01:50:51PM -0400, Charlie Brady wrote: >> multipathd logs a path checker message whenever state changes. But it also >> logs every non-changed message when verbosity is >=4 and newstate is >> PATH_UP or PATH_GHOST, and every non-changed message when verbosity is >= 2 >> and newstate is PATH_DOWN. >> >> I believe that the message should only be logged, once, when state changes. > > I disagree. If you look at the stuff that gets logged when verbosity >= > 4, you'll see that it's a whole bunch of stuff that normal users would > never care about, but may be useful for debugging. This clearly fits. > > I also think that having a path down is something that shouldn't just > blip by once in the logs. It's something that often requires the > sysadmin to intervene. I could see limitting it to, say, only happen > on every tenth pass through the checker loop, so that these messages > don't clog up the logs so bad. But since there is the option to set the > verbosity to 1 to avoid this, I think that printing the path down > messages repeatedly is useful. > Agreed strongly. There is nothing more annoying than having 'one-sided' messages, like printing a path down message _without_ any accompanying path up message. That sort of thing always messes up debugging. Cheers, Hannes
On Thu, 25 Jun 2009, Charlie Brady wrote: > multipathd logs a path checker message whenever state changes. But it also > logs every non-changed message when verbosity is >=4 and newstate is PATH_UP > or PATH_GHOST, and every non-changed message when verbosity is >= 2 and > newstate is PATH_DOWN. > > I believe that the message should only be logged, once, when state changes. I have a mystery related to state change logging which I hope somebody can help me solve. I am running multipathd with the following patch applied (as well as the patch from https://www.redhat.com/archives/dm-devel/2009-June/msg00293.html which updates that checker message with each poll): > Signed-off-by: Charlie Brady <charlieb@budge.apana.org.au> > > --- multipath-tools-0.4.7.rhel5.13/multipathd/main.c 2009-06-25 > 13:32:57.248455000 -0400 > +++ mezzanine_patched_multipath-tools-0.4.7.rhel5.13/multipathd/main.c > 2009-06-25 13:32:12.968639000 -0400 > @@ -1041,7 +1041,6 @@ > enable_group(pp); > } > else if (newstate == PATH_UP || newstate == > PATH_GHOST) { > - LOG_MSG(4, checker_message(&pp->checker)); > /* > * double the next check delay. > * max at conf->max_checkint > @@ -1055,8 +1054,6 @@ > condlog(4, "%s: delay next check %is", > pp->dev_t, pp->tick); > } > - else if (newstate == PATH_DOWN) > - LOG_MSG(2, checker_message(&pp->checker)); > > pp->state = newstate; Therefore, the only logging of checker_message() remaining is this: ... if (newstate != pp->state) { int oldstate = pp->state; pp->state = newstate; LOG_MSG(1, checker_message(&pp->checker)); ... What I am seeing however is this: [root@sun4150node2 ~]# grep 'rdac checker' /var/log/messages | tail -10 Jul 3 12:29:44 sun4150node2 multipathd: sdc: rdac checker reports path is up Jul 3 12:29:49 sun4150node2 multipathd: sdc: rdac checker reports path is up Jul 3 12:29:54 sun4150node2 multipathd: sdc: rdac checker reports path is up Jul 3 12:29:59 sun4150node2 multipathd: sdc: rdac checker reports path is up Jul 3 12:30:04 sun4150node2 multipathd: sdc: rdac checker reports path is up Jul 3 12:30:09 sun4150node2 multipathd: sdc: rdac checker reports path is up Jul 3 12:30:14 sun4150node2 multipathd: sdc: rdac checker reports path is up Jul 3 12:30:19 sun4150node2 multipathd: sdc: rdac checker reports path is up Jul 3 12:30:24 sun4150node2 multipathd: sdc: rdac checker reports path is up Jul 3 12:30:29 sun4150node2 multipathd: sdc: rdac checker reports path is up [root@sun4150node2 ~]# So I am seeing repeated logging of 'up' state messages, although I only expect to see one message per state change. As further background, I have four iscsi paths configured to two ports on two controllers on a Sun StorageTek 2510 (a bug in the Sun CAM management software has also configured a non-working IPV6 session). [root@sun4150node2 ~]# iscsiadm -m node -P 1 Target: iqn.1986-03.com.sun:2510.600a0b800049afbd000000004a36468f Portal: 10.37.13.101:3260,1 Iface Name: default Portal: [fe80:0000:0000:0000:02a0:b8ff:fe4a:1c6d]:3260,1 Iface Name: default Portal: 10.37.13.104:3260,2 Iface Name: default Portal: 10.37.13.103:3260,2 Iface Name: default Portal: 10.37.13.102:3260,1 Iface Name: default [root@sun4150node2 ~]# Two of the 2510's port connections are disconnected (10.37.13.103 and 10.37.13.104). The path checker apparently finds the 8:32 path as intermittent: [root@sun4150node2 ~]# echo show paths | multipathd -k multipathd> hcil dev dev_t pri dm_st chk_st next_check 10:0:0:0 sdd 8:48 100 [failed][faulty] XXXXXXXX.. 4/5 11:0:0:0 sde 8:64 100 [failed][faulty] XXXXXXXX.. 4/5 9:0:0:0 sdb 8:16 0 [active][ready] XXXXXX.... 12/20 12:0:0:0 sdc 8:32 0 [failed][faulty] XXXXXX.... 3/5 multipathd> [root@sun4150node2 ~]# [root@sun4150node2 ~]# [root@sun4150node2 ~]# echo show paths | multipathd -k multipathd> hcil dev dev_t pri dm_st chk_st next_check 10:0:0:0 sdd 8:48 100 [failed][faulty] XX........ 1/5 11:0:0:0 sde 8:64 100 [failed][faulty] XX........ 1/5 9:0:0:0 sdb 8:16 0 [active][ready] XX........ 4/20 12:0:0:0 sdc 8:32 0 [active][ready] XXXXXXXXXX 5/5 multipathd> [root@sun4150node2 ~]# echo show paths | multipathd -k multipathd> hcil dev dev_t pri dm_st chk_st next_check 10:0:0:0 sdd 8:48 100 [failed][faulty] XXXXXX.... 3/5 11:0:0:0 sde 8:64 100 [failed][faulty] XXXXXX.... 3/5 9:0:0:0 sdb 8:16 0 [active][ready] XXXXXXXX.. 16/20 12:0:0:0 sdc 8:32 0 [failed][faulty] XXXX...... 2/5 multipathd> [root@sun4150node2 ~]# Here is a longer excerpt from syslog: Jul 3 12:33:41 sun4150node2 iscsid: connect failed (113) Jul 3 12:33:44 sun4150node2 multipathd: sdc: rdac checker reports path is up Jul 3 12:33:44 sun4150node2 multipathd: 8:32: reinstated Jul 3 12:33:44 sun4150node2 multipathd: mpath0: remaining active paths: 2 Jul 3 12:33:44 sun4150node2 multipathd: mpath0: switch to path group #1 Jul 3 12:33:44 sun4150node2 multipathd: mpath0: switch to path group #1 Jul 3 12:33:44 sun4150node2 multipathd: dm-2: add map (uevent) Jul 3 12:33:44 sun4150node2 multipathd: dm-2: devmap already registered Jul 3 12:33:45 sun4150node2 kernel: iscsi: cmd 0x12 is not queued (8) Jul 3 12:33:45 sun4150node2 kernel: iscsi: cmd 0x12 is not queued (8) Jul 3 12:33:47 sun4150node2 kernel: device-mapper: multipath: Failing path 8:32. Jul 3 12:33:47 sun4150node2 multipathd: dm-2: add map (uevent) Jul 3 12:33:47 sun4150node2 multipathd: dm-2: devmap already registered Jul 3 12:33:47 sun4150node2 multipathd: 8:32: mark as failed Jul 3 12:33:47 sun4150node2 multipathd: mpath0: remaining active paths: 1 Jul 3 12:33:47 sun4150node2 iscsid: connect failed (113) Jul 3 12:33:47 sun4150node2 iscsid: connect failed (113) Jul 3 12:33:49 sun4150node2 multipathd: sdc: rdac checker reports path is up Jul 3 12:33:49 sun4150node2 multipathd: 8:32: reinstated Jul 3 12:33:49 sun4150node2 multipathd: mpath0: remaining active paths: 2 Jul 3 12:33:49 sun4150node2 multipathd: mpath0: switch to path group #1 Jul 3 12:33:49 sun4150node2 multipathd: mpath0: switch to path group #1 Jul 3 12:33:49 sun4150node2 multipathd: dm-2: add map (uevent) Jul 3 12:33:49 sun4150node2 multipathd: dm-2: devmap already registered Jul 3 12:33:50 sun4150node2 multipathd: dm-2: add map (uevent) Jul 3 12:33:50 sun4150node2 multipathd: dm-2: devmap already registered Jul 3 12:33:50 sun4150node2 kernel: device-mapper: multipath: Failing path 8:32. Jul 3 12:33:50 sun4150node2 kernel: iscsi: cmd 0x12 is not queued (8) Jul 3 12:33:50 sun4150node2 multipathd: 8:32: mark as failed Jul 3 12:33:50 sun4150node2 multipathd: mpath0: remaining active paths: 1 I'd be most grateful if someone can help me make sense of this. And preferably help me find and fix the bug. multipath-tools version is 0.4.7.rhel5.13. -- dm-devel mailing list dm-devel@redhat.com https://www.redhat.com/mailman/listinfo/dm-devel
On Fri, 3 Jul 2009, Charlie Brady wrote: > Jul 3 12:29:59 sun4150node2 multipathd: sdc: rdac checker reports path is up > Jul 3 12:30:04 sun4150node2 multipathd: sdc: rdac checker reports path is up > Jul 3 12:30:09 sun4150node2 multipathd: sdc: rdac checker reports path is up > Jul 3 12:30:14 sun4150node2 multipathd: sdc: rdac checker reports path is up > Jul 3 12:30:19 sun4150node2 multipathd: sdc: rdac checker reports path is up > Jul 3 12:30:24 sun4150node2 multipathd: sdc: rdac checker reports path is up > Jul 3 12:30:29 sun4150node2 multipathd: sdc: rdac checker reports path is up > [root@sun4150node2 ~]# This is the cycle of multipathd failing and reinstating the device: [root@sun4150node2 ~]# grep 8:32 /var/log/messages | tail -12 Jul 3 12:57:38 sun4150node2 multipathd: 8:32: reinstated Jul 3 12:57:38 sun4150node2 kernel: device-mapper: multipath: Failing path 8:32. Jul 3 12:57:39 sun4150node2 multipathd: 8:32: mark as failed Jul 3 12:57:43 sun4150node2 multipathd: 8:32: reinstated Jul 3 12:57:44 sun4150node2 kernel: device-mapper: multipath: Failing path 8:32. Jul 3 12:57:44 sun4150node2 multipathd: 8:32: mark as failed Jul 3 12:57:48 sun4150node2 multipathd: 8:32: reinstated Jul 3 12:57:50 sun4150node2 kernel: device-mapper: multipath: Failing path 8:32. Jul 3 12:57:50 sun4150node2 multipathd: 8:32: mark as failed Jul 3 12:57:53 sun4150node2 multipathd: 8:32: reinstated Jul 3 12:57:53 sun4150node2 kernel: device-mapper: multipath: Failing path 8:32. Jul 3 12:57:54 sun4150node2 multipathd: 8:32: mark as failed [root@sun4150node2 ~]# > As further background, I have four iscsi paths configured to two ports on two > controllers on a Sun StorageTek 2510 (a bug in the Sun CAM management > software has also configured a non-working IPV6 session). On a related issue, is there any way to learn which of the iscsi paths is associated with which device node (/dev/sdX)? -- dm-devel mailing list dm-devel@redhat.com https://www.redhat.com/mailman/listinfo/dm-devel
On Fri, 3 Jul 2009, Charlie Brady wrote: > As further background, I have four iscsi paths configured to two ports on two > controllers on a Sun StorageTek 2510 (a bug in the Sun CAM management > software has also configured a non-working IPV6 session). ... > The path checker apparently finds the 8:32 path as intermittent: I have found these logs: Jul 2 15:33:29 sun4150node2 kernel: sd 9:0:0:0: rdac: LUN 0 (unowned) Jul 3 10:07:34 sun4150node2 kernel: sd 9:0:0:0: queueing MODE_SELECT command. Jul 3 10:07:34 sun4150node2 kernel: sd 9:0:0:0: MODE_SELECT failed with sense 0x62904. Jul 3 10:07:37 sun4150node2 kernel: sd 9:0:0:0: queueing MODE_SELECT command. and wonder whether this thread (and patch) is relevant: http://kerneltrap.org/mailarchive/linux-scsi/2008/10/30/3873414 I'll apply the patch, build new scsi_dh_rdac module, reboot and see what the state is. -- dm-devel mailing list dm-devel@redhat.com https://www.redhat.com/mailman/listinfo/dm-devel
In case it is relevant, here is my device configuration. devices { device { vendor "SUN" product "LCSM100_I" getuid_callout "/sbin/scsi_id -g -u -s /block/%n" prio_callout "/sbin/mpath_prio_rdac /dev/%n" features "0" hardware_handler "1 rdac" path_grouping_policy group_by_prio failback immediate rr_weight uniform no_path_retry queue rr_min_io 1000 path_checker rdac } } -- dm-devel mailing list dm-devel@redhat.com https://www.redhat.com/mailman/listinfo/dm-devel
On Fri, 26 Jun 2009, Konrad Rzeszutek wrote: > On Fri, Jun 26, 2009 at 03:50:57PM -0500, Benjamin Marzinski wrote: >> On Thu, Jun 25, 2009 at 01:50:51PM -0400, Charlie Brady wrote: >>> >>> multipathd logs a path checker message whenever state changes. But it also >>> logs every non-changed message when verbosity is >=4 and newstate is >>> PATH_UP or PATH_GHOST, and every non-changed message when verbosity is >= 2 >>> and newstate is PATH_DOWN. >>> >>> I believe that the message should only be logged, once, when state changes. >> >> I disagree. If you look at the stuff that gets logged when verbosity >= >> 4, you'll see that it's a whole bunch of stuff that normal users would >> never care about, but may be useful for debugging. This clearly fits. >> >> I also think that having a path down is something that shouldn't just >> blip by once in the logs. It's something that often requires the >> sysadmin to intervene. I could see limitting it to, say, only happen >> on every tenth pass through the checker loop, so that these messages >> don't clog up the logs so bad. But since there is the option to set the >> verbosity to 1 to avoid this, I think that printing the path down >> messages repeatedly is useful. > > Hmm.. What if you have 100 LUNs with four paths, meaning you get 400 > messages saying: "Down! Down! Pants on fire!" Or, in real life, three LUNS with 16paths each: http://groups.google.am/group/open-iscsi/browse_thread/thread/95865e5da58a28f2 FWIW, I don't think there is any practical value in posting repeated "path down" messages to syslog, and it's wrong to do that at default verbosity level. -- dm-devel mailing list dm-devel@redhat.com https://www.redhat.com/mailman/listinfo/dm-devel
--- multipath-tools-0.4.7.rhel5.13/multipathd/main.c 2009-06-25 13:32:57.248455000 -0400 +++ mezzanine_patched_multipath-tools-0.4.7.rhel5.13/multipathd/main.c 2009-06-25 13:32:12.968639000 -0400 @@ -1041,7 +1041,6 @@ enable_group(pp); } else if (newstate == PATH_UP || newstate == PATH_GHOST) { - LOG_MSG(4, checker_message(&pp->checker)); /* * double the next check delay. * max at conf->max_checkint @@ -1055,8 +1054,6 @@ condlog(4, "%s: delay next check %is", pp->dev_t, pp->tick); } - else if (newstate == PATH_DOWN) - LOG_MSG(2, checker_message(&pp->checker)); pp->state = newstate;
multipathd logs a path checker message whenever state changes. But it also logs every non-changed message when verbosity is >=4 and newstate is PATH_UP or PATH_GHOST, and every non-changed message when verbosity is >= 2 and newstate is PATH_DOWN. I believe that the message should only be logged, once, when state changes. Signed-off-by: Charlie Brady <charlieb@budge.apana.org.au> -- dm-devel mailing list dm-devel@redhat.com https://www.redhat.com/mailman/listinfo/dm-devel