diff mbox

[multipath-tools-0.4.7.rhel5.13,1/1] multipathd excessive path checker logging

Message ID Pine.LNX.4.64.0906251344350.4229@e-smith.charlieb.ott.istop.com (mailing list archive)
State Rejected, archived
Delegated to: christophe varoqui
Headers show

Commit Message

Charlie Brady June 25, 2009, 5:50 p.m. UTC
multipathd logs a path checker message whenever state changes. But it also 
logs every non-changed message when verbosity is >=4 and newstate is 
PATH_UP or PATH_GHOST, and every non-changed message when verbosity is >= 
2 and newstate is PATH_DOWN.

I believe that the message should only be logged, once, when state 
changes.

Signed-off-by: Charlie Brady <charlieb@budge.apana.org.au>

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel

Comments

Benjamin Marzinski June 26, 2009, 8:50 p.m. UTC | #1
On Thu, Jun 25, 2009 at 01:50:51PM -0400, Charlie Brady wrote:
>
> multipathd logs a path checker message whenever state changes. But it also 
> logs every non-changed message when verbosity is >=4 and newstate is 
> PATH_UP or PATH_GHOST, and every non-changed message when verbosity is >= 2 
> and newstate is PATH_DOWN.
>
> I believe that the message should only be logged, once, when state changes.

I disagree.  If you look at the stuff that gets logged when verbosity >=
4, you'll see that it's a whole bunch of stuff that normal users would
never care about, but may be useful for debugging.  This clearly fits.

I also think that having a path down is something that shouldn't just
blip by once in the logs.  It's something that often requires the
sysadmin to intervene. I could see limitting it to, say, only happen
on every tenth pass through the checker loop, so that these messages
don't clog up the logs so bad. But since there is the option to set the
verbosity to 1 to avoid this, I think that printing the path down
messages repeatedly is useful.

-Ben

>
> Signed-off-by: Charlie Brady <charlieb@budge.apana.org.au>
>
> --- multipath-tools-0.4.7.rhel5.13/multipathd/main.c	2009-06-25 13:32:57.248455000 -0400
> +++ mezzanine_patched_multipath-tools-0.4.7.rhel5.13/multipathd/main.c	2009-06-25 13:32:12.968639000 -0400
> @@ -1041,7 +1041,6 @@
>  					enable_group(pp);
>  			}
>  			else if (newstate == PATH_UP || newstate == PATH_GHOST) {
> -				LOG_MSG(4, checker_message(&pp->checker));
>  				/*
>  				 * double the next check delay.
>  				 * max at conf->max_checkint
> @@ -1055,8 +1054,6 @@
>  				condlog(4, "%s: delay next check %is",
>  						pp->dev_t, pp->tick);
>  			}
> -			else if (newstate == PATH_DOWN)
> -				LOG_MSG(2, checker_message(&pp->checker));
>
>  			pp->state = newstate;
>
> --
> dm-devel mailing list
> dm-devel@redhat.com
> https://www.redhat.com/mailman/listinfo/dm-devel

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
Chandra Seetharaman June 26, 2009, 11:54 p.m. UTC | #2
I agree with Ben. 

Reducing it to every 10th pass for down would be a good compromise.

On Fri, 2009-06-26 at 15:50 -0500, Benjamin Marzinski wrote:
> On Thu, Jun 25, 2009 at 01:50:51PM -0400, Charlie Brady wrote:
> >
> > multipathd logs a path checker message whenever state changes. But it also 
> > logs every non-changed message when verbosity is >=4 and newstate is 
> > PATH_UP or PATH_GHOST, and every non-changed message when verbosity is >= 2 
> > and newstate is PATH_DOWN.
> >
> > I believe that the message should only be logged, once, when state changes.
> 
> I disagree.  If you look at the stuff that gets logged when verbosity >=
> 4, you'll see that it's a whole bunch of stuff that normal users would
> never care about, but may be useful for debugging.  This clearly fits.
> 
> I also think that having a path down is something that shouldn't just
> blip by once in the logs.  It's something that often requires the
> sysadmin to intervene. I could see limitting it to, say, only happen
> on every tenth pass through the checker loop, so that these messages
> don't clog up the logs so bad. But since there is the option to set the
> verbosity to 1 to avoid this, I think that printing the path down
> messages repeatedly is useful.
> 
> -Ben
> 
> >
> > Signed-off-by: Charlie Brady <charlieb@budge.apana.org.au>
> >
> > --- multipath-tools-0.4.7.rhel5.13/multipathd/main.c	2009-06-25 13:32:57.248455000 -0400
> > +++ mezzanine_patched_multipath-tools-0.4.7.rhel5.13/multipathd/main.c	2009-06-25 13:32:12.968639000 -0400
> > @@ -1041,7 +1041,6 @@
> >  					enable_group(pp);
> >  			}
> >  			else if (newstate == PATH_UP || newstate == PATH_GHOST) {
> > -				LOG_MSG(4, checker_message(&pp->checker));
> >  				/*
> >  				 * double the next check delay.
> >  				 * max at conf->max_checkint
> > @@ -1055,8 +1054,6 @@
> >  				condlog(4, "%s: delay next check %is",
> >  						pp->dev_t, pp->tick);
> >  			}
> > -			else if (newstate == PATH_DOWN)
> > -				LOG_MSG(2, checker_message(&pp->checker));
> >
> >  			pp->state = newstate;
> >
> > --
> > dm-devel mailing list
> > dm-devel@redhat.com
> > https://www.redhat.com/mailman/listinfo/dm-devel
> 
> --
> dm-devel mailing list
> dm-devel@redhat.com
> https://www.redhat.com/mailman/listinfo/dm-devel

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
Konrad Rzeszutek June 27, 2009, 1:39 a.m. UTC | #3
On Fri, Jun 26, 2009 at 03:50:57PM -0500, Benjamin Marzinski wrote:
> On Thu, Jun 25, 2009 at 01:50:51PM -0400, Charlie Brady wrote:
> >
> > multipathd logs a path checker message whenever state changes. But it also 
> > logs every non-changed message when verbosity is >=4 and newstate is 
> > PATH_UP or PATH_GHOST, and every non-changed message when verbosity is >= 2 
> > and newstate is PATH_DOWN.
> >
> > I believe that the message should only be logged, once, when state changes.
> 
> I disagree.  If you look at the stuff that gets logged when verbosity >=
> 4, you'll see that it's a whole bunch of stuff that normal users would
> never care about, but may be useful for debugging.  This clearly fits.
> 
> I also think that having a path down is something that shouldn't just
> blip by once in the logs.  It's something that often requires the
> sysadmin to intervene. I could see limitting it to, say, only happen
> on every tenth pass through the checker loop, so that these messages
> don't clog up the logs so bad. But since there is the option to set the
> verbosity to 1 to avoid this, I think that printing the path down
> messages repeatedly is useful.

Hmm.. What if you have 100 LUNs with four paths, meaning you get 400
messages saying: "Down! Down! Pants on fire!"

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
Hannes Reinecke June 29, 2009, 6:19 a.m. UTC | #4
Benjamin Marzinski wrote:
> On Thu, Jun 25, 2009 at 01:50:51PM -0400, Charlie Brady wrote:
>> multipathd logs a path checker message whenever state changes. But it also 
>> logs every non-changed message when verbosity is >=4 and newstate is 
>> PATH_UP or PATH_GHOST, and every non-changed message when verbosity is >= 2 
>> and newstate is PATH_DOWN.
>>
>> I believe that the message should only be logged, once, when state changes.
> 
> I disagree.  If you look at the stuff that gets logged when verbosity >=
> 4, you'll see that it's a whole bunch of stuff that normal users would
> never care about, but may be useful for debugging.  This clearly fits.
> 
> I also think that having a path down is something that shouldn't just
> blip by once in the logs.  It's something that often requires the
> sysadmin to intervene. I could see limitting it to, say, only happen
> on every tenth pass through the checker loop, so that these messages
> don't clog up the logs so bad. But since there is the option to set the
> verbosity to 1 to avoid this, I think that printing the path down
> messages repeatedly is useful.
> 
Agreed strongly. There is nothing more annoying than having 'one-sided'
messages, like printing a path down message _without_ any accompanying
path up message.

That sort of thing always messes up debugging.

Cheers,

Hannes
Charlie Brady July 3, 2009, 4:46 p.m. UTC | #5
On Thu, 25 Jun 2009, Charlie Brady wrote:

> multipathd logs a path checker message whenever state changes. But it also 
> logs every non-changed message when verbosity is >=4 and newstate is PATH_UP 
> or PATH_GHOST, and every non-changed message when verbosity is >= 2 and 
> newstate is PATH_DOWN.
>
> I believe that the message should only be logged, once, when state changes.

I have a mystery related to state change logging which I hope somebody can 
help me solve.

I am running multipathd with the following patch applied (as well as the 
patch from 
https://www.redhat.com/archives/dm-devel/2009-June/msg00293.html which 
updates that checker message with each poll):

> Signed-off-by: Charlie Brady <charlieb@budge.apana.org.au>
>
> --- multipath-tools-0.4.7.rhel5.13/multipathd/main.c	2009-06-25 
> 13:32:57.248455000 -0400
> +++ mezzanine_patched_multipath-tools-0.4.7.rhel5.13/multipathd/main.c 
> 2009-06-25 13:32:12.968639000 -0400
> @@ -1041,7 +1041,6 @@
> 			 		enable_group(pp);
> 			 }
> 			 else if (newstate == PATH_UP || newstate ==
> 			 PATH_GHOST) {
> -				LOG_MSG(4, checker_message(&pp->checker));
> 				 /*
> 				  * double the next check delay.
> 				  * max at conf->max_checkint
> @@ -1055,8 +1054,6 @@
> 				 condlog(4, "%s: delay next check %is",
> 			 			pp->dev_t, pp->tick);
> 			}
> -			else if (newstate == PATH_DOWN)
> -				LOG_MSG(2, checker_message(&pp->checker));
>
> 			pp->state = newstate;

Therefore, the only logging of checker_message() remaining is this:

...
                         if (newstate != pp->state) {
                                 int oldstate = pp->state;
                                 pp->state = newstate;
                                 LOG_MSG(1, checker_message(&pp->checker));
...

What I am seeing however is this:

[root@sun4150node2 ~]# grep 'rdac checker' /var/log/messages | tail -10
Jul  3 12:29:44 sun4150node2 multipathd: sdc: rdac checker reports path is up
Jul  3 12:29:49 sun4150node2 multipathd: sdc: rdac checker reports path is up
Jul  3 12:29:54 sun4150node2 multipathd: sdc: rdac checker reports path is up
Jul  3 12:29:59 sun4150node2 multipathd: sdc: rdac checker reports path is up
Jul  3 12:30:04 sun4150node2 multipathd: sdc: rdac checker reports path is up
Jul  3 12:30:09 sun4150node2 multipathd: sdc: rdac checker reports path is up
Jul  3 12:30:14 sun4150node2 multipathd: sdc: rdac checker reports path is up
Jul  3 12:30:19 sun4150node2 multipathd: sdc: rdac checker reports path is up
Jul  3 12:30:24 sun4150node2 multipathd: sdc: rdac checker reports path is up
Jul  3 12:30:29 sun4150node2 multipathd: sdc: rdac checker reports path is up
[root@sun4150node2 ~]#

So I am seeing repeated logging of 'up' state messages, although I only 
expect to see one message per state change.

As further background, I have four iscsi paths configured to two ports on 
two controllers on a Sun StorageTek 2510 (a bug in the Sun CAM management 
software has also configured a non-working IPV6 session).

[root@sun4150node2 ~]# iscsiadm -m node -P 1
Target: iqn.1986-03.com.sun:2510.600a0b800049afbd000000004a36468f
         Portal: 10.37.13.101:3260,1
                 Iface Name: default
         Portal: [fe80:0000:0000:0000:02a0:b8ff:fe4a:1c6d]:3260,1
                 Iface Name: default
         Portal: 10.37.13.104:3260,2
                 Iface Name: default
         Portal: 10.37.13.103:3260,2
                 Iface Name: default
         Portal: 10.37.13.102:3260,1
                 Iface Name: default
[root@sun4150node2 ~]#

Two of the 2510's port connections are disconnected (10.37.13.103 and 
10.37.13.104).

The path checker apparently finds the 8:32 path as intermittent:

[root@sun4150node2 ~]# echo show paths | multipathd -k
multipathd> hcil     dev dev_t pri dm_st   chk_st   next_check
10:0:0:0 sdd 8:48  100 [failed][faulty] XXXXXXXX.. 4/5
11:0:0:0 sde 8:64  100 [failed][faulty] XXXXXXXX.. 4/5
9:0:0:0  sdb 8:16  0   [active][ready]  XXXXXX.... 12/20
12:0:0:0 sdc 8:32  0   [failed][faulty] XXXXXX.... 3/5
multipathd> [root@sun4150node2 ~]#
[root@sun4150node2 ~]#
[root@sun4150node2 ~]# echo show paths | multipathd -k
multipathd> hcil     dev dev_t pri dm_st   chk_st   next_check
10:0:0:0 sdd 8:48  100 [failed][faulty] XX........ 1/5
11:0:0:0 sde 8:64  100 [failed][faulty] XX........ 1/5
9:0:0:0  sdb 8:16  0   [active][ready]  XX........ 4/20
12:0:0:0 sdc 8:32  0   [active][ready]  XXXXXXXXXX 5/5
multipathd> [root@sun4150node2 ~]# echo show paths | multipathd -k
multipathd> hcil     dev dev_t pri dm_st   chk_st   next_check
10:0:0:0 sdd 8:48  100 [failed][faulty] XXXXXX.... 3/5
11:0:0:0 sde 8:64  100 [failed][faulty] XXXXXX.... 3/5
9:0:0:0  sdb 8:16  0   [active][ready]  XXXXXXXX.. 16/20
12:0:0:0 sdc 8:32  0   [failed][faulty] XXXX...... 2/5
multipathd> [root@sun4150node2 ~]#

Here is a longer excerpt from syslog:

Jul  3 12:33:41 sun4150node2 iscsid: connect failed (113)
Jul  3 12:33:44 sun4150node2 multipathd: sdc: rdac checker reports path is 
up
Jul  3 12:33:44 sun4150node2 multipathd: 8:32: reinstated
Jul  3 12:33:44 sun4150node2 multipathd: mpath0: remaining active paths: 2
Jul  3 12:33:44 sun4150node2 multipathd: mpath0: switch to path group #1
Jul  3 12:33:44 sun4150node2 multipathd: mpath0: switch to path group #1
Jul  3 12:33:44 sun4150node2 multipathd: dm-2: add map (uevent)
Jul  3 12:33:44 sun4150node2 multipathd: dm-2: devmap already registered
Jul  3 12:33:45 sun4150node2 kernel: iscsi: cmd 0x12 is not queued (8)
Jul  3 12:33:45 sun4150node2 kernel: iscsi: cmd 0x12 is not queued (8)
Jul  3 12:33:47 sun4150node2 kernel: device-mapper: multipath: Failing 
path 8:32.
Jul  3 12:33:47 sun4150node2 multipathd: dm-2: add map (uevent)
Jul  3 12:33:47 sun4150node2 multipathd: dm-2: devmap already registered
Jul  3 12:33:47 sun4150node2 multipathd: 8:32: mark as failed
Jul  3 12:33:47 sun4150node2 multipathd: mpath0: remaining active paths: 1
Jul  3 12:33:47 sun4150node2 iscsid: connect failed (113)
Jul  3 12:33:47 sun4150node2 iscsid: connect failed (113)
Jul  3 12:33:49 sun4150node2 multipathd: sdc: rdac checker reports path is 
up
Jul  3 12:33:49 sun4150node2 multipathd: 8:32: reinstated
Jul  3 12:33:49 sun4150node2 multipathd: mpath0: remaining active paths: 2
Jul  3 12:33:49 sun4150node2 multipathd: mpath0: switch to path group #1
Jul  3 12:33:49 sun4150node2 multipathd: mpath0: switch to path group #1
Jul  3 12:33:49 sun4150node2 multipathd: dm-2: add map (uevent)
Jul  3 12:33:49 sun4150node2 multipathd: dm-2: devmap already registered
Jul  3 12:33:50 sun4150node2 multipathd: dm-2: add map (uevent)
Jul  3 12:33:50 sun4150node2 multipathd: dm-2: devmap already registered
Jul  3 12:33:50 sun4150node2 kernel: device-mapper: multipath: Failing 
path 8:32.
Jul  3 12:33:50 sun4150node2 kernel: iscsi: cmd 0x12 is not queued (8)
Jul  3 12:33:50 sun4150node2 multipathd: 8:32: mark as failed
Jul  3 12:33:50 sun4150node2 multipathd: mpath0: remaining active paths: 1

I'd be most grateful if someone can help me make sense of this. And 
preferably help me find and fix the bug.

multipath-tools version is 0.4.7.rhel5.13.

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
Charlie Brady July 3, 2009, 5:07 p.m. UTC | #6
On Fri, 3 Jul 2009, Charlie Brady wrote:

> Jul  3 12:29:59 sun4150node2 multipathd: sdc: rdac checker reports path is up
> Jul  3 12:30:04 sun4150node2 multipathd: sdc: rdac checker reports path is up
> Jul  3 12:30:09 sun4150node2 multipathd: sdc: rdac checker reports path is up
> Jul  3 12:30:14 sun4150node2 multipathd: sdc: rdac checker reports path is up
> Jul  3 12:30:19 sun4150node2 multipathd: sdc: rdac checker reports path is up
> Jul  3 12:30:24 sun4150node2 multipathd: sdc: rdac checker reports path is up
> Jul  3 12:30:29 sun4150node2 multipathd: sdc: rdac checker reports path is up
> [root@sun4150node2 ~]#

This is the cycle of multipathd failing and reinstating the device:

[root@sun4150node2 ~]# grep 8:32 /var/log/messages | tail -12
Jul  3 12:57:38 sun4150node2 multipathd: 8:32: reinstated
Jul  3 12:57:38 sun4150node2 kernel: device-mapper: multipath: Failing path 8:32.
Jul  3 12:57:39 sun4150node2 multipathd: 8:32: mark as failed
Jul  3 12:57:43 sun4150node2 multipathd: 8:32: reinstated
Jul  3 12:57:44 sun4150node2 kernel: device-mapper: multipath: Failing path 8:32.
Jul  3 12:57:44 sun4150node2 multipathd: 8:32: mark as failed
Jul  3 12:57:48 sun4150node2 multipathd: 8:32: reinstated
Jul  3 12:57:50 sun4150node2 kernel: device-mapper: multipath: Failing path 8:32.
Jul  3 12:57:50 sun4150node2 multipathd: 8:32: mark as failed
Jul  3 12:57:53 sun4150node2 multipathd: 8:32: reinstated
Jul  3 12:57:53 sun4150node2 kernel: device-mapper: multipath: Failing path 8:32.
Jul  3 12:57:54 sun4150node2 multipathd: 8:32: mark as failed
[root@sun4150node2 ~]#

> As further background, I have four iscsi paths configured to two ports on two 
> controllers on a Sun StorageTek 2510 (a bug in the Sun CAM management 
> software has also configured a non-working IPV6 session).

On a related issue, is there any way to learn which of the iscsi paths 
is associated with which device node (/dev/sdX)?

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
Charlie Brady July 3, 2009, 5:14 p.m. UTC | #7
On Fri, 3 Jul 2009, Charlie Brady wrote:

> As further background, I have four iscsi paths configured to two ports on two 
> controllers on a Sun StorageTek 2510 (a bug in the Sun CAM management 
> software has also configured a non-working IPV6 session).
...
> The path checker apparently finds the 8:32 path as intermittent:

I have found these logs:

Jul  2 15:33:29 sun4150node2 kernel: sd 9:0:0:0: rdac: LUN 0 (unowned)
Jul  3 10:07:34 sun4150node2 kernel: sd 9:0:0:0: queueing MODE_SELECT command.
Jul  3 10:07:34 sun4150node2 kernel: sd 9:0:0:0: MODE_SELECT failed with sense 0x62904.
Jul  3 10:07:37 sun4150node2 kernel: sd 9:0:0:0: queueing MODE_SELECT command.

and wonder whether this thread (and patch) is relevant:

http://kerneltrap.org/mailarchive/linux-scsi/2008/10/30/3873414

I'll apply the patch, build new scsi_dh_rdac module, reboot and see what 
the state is.

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
Charlie Brady July 3, 2009, 5:23 p.m. UTC | #8
In case it is relevant, here is my device configuration.

devices {
     device {
         vendor                  "SUN"
         product                 "LCSM100_I"
         getuid_callout          "/sbin/scsi_id -g -u -s /block/%n"
         prio_callout            "/sbin/mpath_prio_rdac /dev/%n"
         features                "0"
         hardware_handler        "1 rdac"
         path_grouping_policy    group_by_prio
         failback                immediate
         rr_weight               uniform
         no_path_retry           queue
         rr_min_io               1000
         path_checker            rdac
     }
}




--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
Charlie Brady July 3, 2009, 7:39 p.m. UTC | #9
On Fri, 26 Jun 2009, Konrad Rzeszutek wrote:

> On Fri, Jun 26, 2009 at 03:50:57PM -0500, Benjamin Marzinski wrote:
>> On Thu, Jun 25, 2009 at 01:50:51PM -0400, Charlie Brady wrote:
>>>
>>> multipathd logs a path checker message whenever state changes. But it also
>>> logs every non-changed message when verbosity is >=4 and newstate is
>>> PATH_UP or PATH_GHOST, and every non-changed message when verbosity is >= 2
>>> and newstate is PATH_DOWN.
>>>
>>> I believe that the message should only be logged, once, when state changes.
>>
>> I disagree.  If you look at the stuff that gets logged when verbosity >=
>> 4, you'll see that it's a whole bunch of stuff that normal users would
>> never care about, but may be useful for debugging.  This clearly fits.
>>
>> I also think that having a path down is something that shouldn't just
>> blip by once in the logs.  It's something that often requires the
>> sysadmin to intervene. I could see limitting it to, say, only happen
>> on every tenth pass through the checker loop, so that these messages
>> don't clog up the logs so bad. But since there is the option to set the
>> verbosity to 1 to avoid this, I think that printing the path down
>> messages repeatedly is useful.
>
> Hmm.. What if you have 100 LUNs with four paths, meaning you get 400
> messages saying: "Down! Down! Pants on fire!"

Or, in real life, three LUNS with 16paths each:

http://groups.google.am/group/open-iscsi/browse_thread/thread/95865e5da58a28f2

FWIW, I don't think there is any practical value in posting repeated "path 
down" messages to syslog, and it's wrong to do that at default verbosity 
level.

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
diff mbox

Patch

--- multipath-tools-0.4.7.rhel5.13/multipathd/main.c	2009-06-25 13:32:57.248455000 -0400
+++ mezzanine_patched_multipath-tools-0.4.7.rhel5.13/multipathd/main.c	2009-06-25 13:32:12.968639000 -0400
@@ -1041,7 +1041,6 @@ 
  					enable_group(pp);
  			}
  			else if (newstate == PATH_UP || newstate == PATH_GHOST) {
-				LOG_MSG(4, checker_message(&pp->checker));
  				/*
  				 * double the next check delay.
  				 * max at conf->max_checkint
@@ -1055,8 +1054,6 @@ 
  				condlog(4, "%s: delay next check %is",
  						pp->dev_t, pp->tick);
  			}
-			else if (newstate == PATH_DOWN)
-				LOG_MSG(2, checker_message(&pp->checker));

  			pp->state = newstate;