| Message ID | alpine.DEB.2.00.1308070822180.3006@cobra.newdream.net (mailing list archive) |
| --- | --- |
| State | New, archived |
> Hi James,
>
> Here is a somewhat simpler patch; does this work for you?  Note that if
> you run something like /etc/init.d/ceph status osd.123 where osd.123
> isn't in ceph.conf then you get a status 1 instead of 3.  But for the
> /etc/init.d/ceph status mds (or osd or mon) case where there are no
> daemons of a particular type it works.
>
> Perhaps the "does not exist" check should be also modified to return 3?

Pacemaker will call the RA on every node to see what is running. On a node
in an asymmetric cluster where ceph isn't configured, the RA just wants to
know that it isn't running - it won't like an error being returned. For a
node without even the RA script installed it would return not-installed,
but I think that's okay too.

Do you think maybe the 'ceph status' check and the RA check have
conflicting requirements here? Maybe it would be better to leave the
init.d script as-is and build the smarts into the RA script instead.

Do idle mds's add any load to the system? Would it be useful to be able to
have pacemaker bring up mds's on any two nodes so you always have exactly
two running, without actually tying them to specific nodes?

James
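For reference, the status codes being debated above and below follow the
LSB init-script conventions for the "status" action:

  # LSB "status" action exit codes (per the LSB init script conventions):
  #   0 - program is running or service is OK
  #   1 - program is dead and a /var/run pid file exists
  #   2 - program is dead and a /var/lock lock file exists
  #   3 - program is not running
  #   4 - program or service status is unknown
  # The open question in this thread is when init-ceph should report 1
  # (failed) versus 3 (cleanly not running), so that a Pacemaker RA probing
  # every node can tell "not configured here" apart from "crashed here".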
On Wed, 7 Aug 2013, James Harper wrote:
> > Hi James,
> >
> > Here is a somewhat simpler patch; does this work for you?  Note that if
> > you run something like /etc/init.d/ceph status osd.123 where osd.123
> > isn't in ceph.conf then you get a status 1 instead of 3.  But for the
> > /etc/init.d/ceph status mds (or osd or mon) case where there are no
> > daemons of a particular type it works.
> >
> > Perhaps the "does not exist" check should be also modified to return 3?
>
> Pacemaker will call the RA on every node to see what is running. On a
> node in an asymmetric cluster where ceph isn't configured, the RA just
> wants to know that it isn't running - it won't like an error being
> returned. For a node without even the RA script installed it would
> return not-installed, but I think that's okay too.
>
> Do you think maybe the 'ceph status' check and the RA check have
> conflicting requirements here? Maybe it would be better to leave the
> init.d script as-is and build the smarts into the RA script instead.

Maybe, but I'm not completely following what the RA's requirements are
here.  If it's just a matter of the init script returning a different
error code, though (as we've done so far), I don't see any problem.

> Do idle mds's add any load to the system? Would it be useful to be able
> to have pacemaker bring up mds's on any two nodes so you always have
> exactly two running, without actually tying them to specific nodes?

They don't add much load when they are standby, but they will if they end
up taking over.  There is also no reason to say 'exactly two' IMO.  If you
have a symmetric cluster I would be more inclined to run one on every node
for simplicity, recognizing that the active one will use some resources.

sage
> On Wed, 7 Aug 2013, James Harper wrote:
> > > Hi James,
> > >
> > > Here is a somewhat simpler patch; does this work for you?  Note that
> > > if you run something like /etc/init.d/ceph status osd.123 where
> > > osd.123 isn't in ceph.conf then you get a status 1 instead of 3.  But
> > > for the /etc/init.d/ceph status mds (or osd or mon) case where there
> > > are no daemons of a particular type it works.
> > >
> > > Perhaps the "does not exist" check should be also modified to return 3?
> >
> > Pacemaker will call the RA on every node to see what is running. On a
> > node in an asymmetric cluster where ceph isn't configured, the RA just
> > wants to know that it isn't running - it won't like an error being
> > returned. For a node without even the RA script installed it would
> > return not-installed, but I think that's okay too.
> >
> > Do you think maybe the 'ceph status' check and the RA check have
> > conflicting requirements here? Maybe it would be better to leave the
> > init.d script as-is and build the smarts into the RA script instead.
>
> Maybe, but I'm not completely following what the RA's requirements are
> here.  If it's just a matter of the init script returning a different
> error code, though (as we've done so far), I don't see any problem.

I haven't tried your patch yet, but can it ever return 0? It seems to set
it to 3 initially, and then change it to 1 if it finds an error. I can't
see that it ever sets it to 0 indicating that daemons are running. Easy
enough to fix by setting EXIT_STATUS=0 after the check of
daemon_is_running, I think, but it still doesn't allow for the case where
there are three OSDs: one is running, one is stopped, and one has failed.
The EXIT_STATUS in that case appears to be based on the last daemon
checked, i.e. basically random.

> > Do idle mds's add any load to the system? Would it be useful to be able
> > to have pacemaker bring up mds's on any two nodes so you always have
> > exactly two running, without actually tying them to specific nodes?
>
> They don't add much load when they are standby, but they will if they end
> up taking over.  There is also no reason to say 'exactly two' IMO.  If
> you have a symmetric cluster I would be more inclined to run one on every
> node for simplicity, recognizing that the active one will use some
> resources.

Thanks for that clarification

James
On Fri, 9 Aug 2013, James Harper wrote:
> > On Wed, 7 Aug 2013, James Harper wrote:
> > > > Hi James,
> > > >
> > > > Here is a somewhat simpler patch; does this work for you?  Note
> > > > that if you run something like /etc/init.d/ceph status osd.123
> > > > where osd.123 isn't in ceph.conf then you get a status 1 instead
> > > > of 3.  But for the /etc/init.d/ceph status mds (or osd or mon)
> > > > case where there are no daemons of a particular type it works.
> > > >
> > > > Perhaps the "does not exist" check should be also modified to
> > > > return 3?
> > >
> > > Pacemaker will call the RA on every node to see what is running. On
> > > a node in an asymmetric cluster where ceph isn't configured, the RA
> > > just wants to know that it isn't running - it won't like an error
> > > being returned. For a node without even the RA script installed it
> > > would return not-installed, but I think that's okay too.
> > >
> > > Do you think maybe the 'ceph status' check and the RA check have
> > > conflicting requirements here? Maybe it would be better to leave the
> > > init.d script as-is and build the smarts into the RA script instead.
> >
> > Maybe, but I'm not completely following what the RA's requirements are
> > here.  If it's just a matter of the init script returning a different
> > error code, though (as we've done so far), I don't see any problem.
>
> I haven't tried your patch yet, but can it ever return 0? It seems to
> set it to 3 initially, and then change it to 1 if it finds an error. I
> can't see that it ever sets it to 0 indicating that daemons are running.
> Easy enough to fix by setting EXIT_STATUS=0 after the check of
> daemon_is_running, I think, but it still doesn't allow for the case
> where there are three OSDs: one is running, one is stopped, and one has
> failed. The EXIT_STATUS in that case appears to be based on the last
> daemon checked, i.e. basically random.

What should it return in that case?

sage

> > > Do idle mds's add any load to the system? Would it be useful to be
> > > able to have pacemaker bring up mds's on any two nodes so you always
> > > have exactly two running, without actually tying them to specific
> > > nodes?
> >
> > They don't add much load when they are standby, but they will if they
> > end up taking over.  There is also no reason to say 'exactly two' IMO.
> > If you have a symmetric cluster I would be more inclined to run one on
> > every node for simplicity, recognizing that the active one will use
> > some resources.
>
> Thanks for that clarification
>
> James
> > I haven't tried your patch yet, but can it ever return 0? It seems to
> > set it to 3 initially, and then change it to 1 if it finds an error. I
> > can't see that it ever sets it to 0 indicating that daemons are
> > running. Easy enough to fix by setting EXIT_STATUS=0 after the check
> > of daemon_is_running, I think, but it still doesn't allow for the case
> > where there are three OSDs: one is running, one is stopped, and one
> > has failed. The EXIT_STATUS in that case appears to be based on the
> > last daemon checked, i.e. basically random.
>
> What should it return in that case?

I've been thinking about this some more and I'm still not sure. I think my
patch says:
  if _any_ are in error then return 1
  else if any are running return 0
  else if all are stopped return 3

But I think this still won't have the desired outcome if you have 2 OSDs.
The possible situations if the resource is supposed to be running are:
. Both running => all good, pacemaker will do nothing
. Both stopped => all good, pacemaker will start the services
. One stopped, one running => not good, pacemaker won't make any effort to
  start services
. One in error, one running => not good. I'm not sure exactly what will
  happen but it won't be what you expect.

The only solution I can see is to manage the services individually, in
which case the init.d script with your patch + setting to 0 if running
does the right thing anyway.

James
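A minimal shell sketch of the rule James describes above (a failed daemon
beats running, and running beats stopped); daemon_has_failed,
daemon_is_running and $local_daemons are hypothetical stand-ins for the
real init-ceph helpers, not the actual script:

  EXIT_STATUS=3                       # default: nothing running (LSB "not running")
  for name in $local_daemons; do      # hypothetical list of locally configured daemons
      if daemon_has_failed "$name"; then
          EXIT_STATUS=1               # any failed daemon => report an error
      elif daemon_is_running "$name"; then
          # at least one daemon is up; only report it if no failure was seen
          [ "$EXIT_STATUS" -ne 1 ] && EXIT_STATUS=0
      fi
  done
  exit $EXIT_STATUS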
On Fri, 9 Aug 2013, James Harper wrote:
> > > I haven't tried your patch yet, but can it ever return 0? It seems
> > > to set it to 3 initially, and then change it to 1 if it finds an
> > > error. I can't see that it ever sets it to 0 indicating that daemons
> > > are running. Easy enough to fix by setting EXIT_STATUS=0 after the
> > > check of daemon_is_running, I think, but it still doesn't allow for
> > > the case where there are three OSDs: one is running, one is stopped,
> > > and one has failed. The EXIT_STATUS in that case appears to be based
> > > on the last daemon checked, i.e. basically random.
> >
> > What should it return in that case?
>
> I've been thinking about this some more and I'm still not sure. I think
> my patch says:
>   if _any_ are in error then return 1
>   else if any are running return 0
>   else if all are stopped return 3
>
> But I think this still won't have the desired outcome if you have 2
> OSDs. The possible situations if the resource is supposed to be running
> are:
> . Both running => all good, pacemaker will do nothing
> . Both stopped => all good, pacemaker will start the services
> . One stopped, one running => not good, pacemaker won't make any effort
>   to start services

If one daemon is stopped and one is running, returning 'not running' seems
ok to me, since 'start' at that point will do the right thing.

> . One in error, one running => not good. I'm not sure exactly what will
>   happen but it won't be what you expect.

I think it's fine for this to be an error condition.

> The only solution I can see is to manage the services individually, in
> which case the init.d script with your patch + setting to 0 if running
> does the right thing anyway.

Yeah, managing individually is probably the most robust, but if it works
well enough in the generic configuration with no customization that is
good.

Anyway, I'm fine with whatever variation of your original or my patch you
think addresses this.  A comment block in the init-ceph script documenting
what the return codes mean (similar to the above) would be nice so that it
is clear to the next person who comes along.

Thanks!
sage
> > But I think this still won't have the desired outcome if you have 2
> > OSDs. The possible situations if the resource is supposed to be
> > running are:
> > . Both running => all good, pacemaker will do nothing
> > . Both stopped => all good, pacemaker will start the services
> > . One stopped, one running => not good, pacemaker won't make any
> >   effort to start services
>
> If one daemon is stopped and one is running, returning 'not running'
> seems ok to me, since 'start' at that point will do the right thing.

Maybe. If the stopped daemon is stopped because it fails to start then
pacemaker might get unhappy when subsequent starts also fail, and might
even get STONITHy.

> > . One in error, one running => not good. I'm not sure exactly what
> >   will happen but it won't be what you expect.
>
> I think it's fine for this to be an error condition.

Again, if pacemaker sees the error it might start doing things you don't
want.

Technically, for actual clustered resources, returning "not running" when
something is running is about the worst thing you can do, because
pacemaker might then start up the resource on another node (e.g. start a
VM on two nodes at once, corrupting the fs). The way you'd set this up for
ceph, though, is just a cloned resource on each node, so it wouldn't
matter anyway.

> > The only solution I can see is to manage the services individually, in
> > which case the init.d script with your patch + setting to 0 if running
> > does the right thing anyway.
>
> Yeah, managing individually is probably the most robust, but if it works
> well enough in the generic configuration with no customization that is
> good.

Actually it subsequently occurred to me that if I set them up individually
then my dependencies will break (e.g. start ceph before mounting ceph-fs)
because there are now different ceph instances per node.

> Anyway, I'm fine with whatever variation of your original or my patch
> you think addresses this.  A comment block in the init-ceph script
> documenting what the return codes mean (similar to the above) would be
> nice so that it is clear to the next person who comes along.

I might post on the pacemaker list and see what the thoughts are there.

Maybe it would be better for me to just re-order the init.d scripts so
ceph starts in init.d and leave it at that...

James
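On the Pacemaker side, an RA's monitor action would normally translate
whatever the init script reports into the standard OCF return codes; a
rough sketch, assuming the RA simply wraps /etc/init.d/ceph status (the
ceph_monitor function name and that wrapping are assumptions, not an
existing RA):

  # OCF codes: 0=OCF_SUCCESS, 1=OCF_ERR_GENERIC, 5=OCF_ERR_INSTALLED,
  # 7=OCF_NOT_RUNNING.
  ceph_monitor() {
      [ -x /etc/init.d/ceph ] || return 5   # not installed on this node
      /etc/init.d/ceph status >/dev/null 2>&1
      case $? in
          0) return 0 ;;    # running -> OCF_SUCCESS
          3) return 7 ;;    # cleanly stopped / not configured -> OCF_NOT_RUNNING
          *) return 1 ;;    # failure -> OCF_ERR_GENERIC; Pacemaker may try to recover
      esac
  }

With a clone running on every node, the practical effect of each code is
simply whether Pacemaker leaves the node alone, starts the resource, or
attempts recovery, which is why returning "not running" for something that
is actually running is the case James flags as the worst one.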
On Fri, 9 Aug 2013, James Harper wrote:
> > > But I think this still won't have the desired outcome if you have 2
> > > OSDs. The possible situations if the resource is supposed to be
> > > running are:
> > > . Both running => all good, pacemaker will do nothing
> > > . Both stopped => all good, pacemaker will start the services
> > > . One stopped, one running => not good, pacemaker won't make any
> > >   effort to start services
> >
> > If one daemon is stopped and one is running, returning 'not running'
> > seems ok to me, since 'start' at that point will do the right thing.
>
> Maybe. If the stopped daemon is stopped because it fails to start then
> pacemaker might get unhappy when subsequent starts also fail, and might
> even get STONITHy.

This is sounding more like we're trying to fit a square peg in a round
hole.  Generally speaking there is *never* any need for anything that
resembles STONITH with Ceph; all of that is handled internally by Ceph
itself.  I think the only real reason why you would want to use pacemaker
here is if you just like it better than the normal startup scripts, or
perhaps because you are using it to control where the standby mdss run.

So maybe we are barking up the wrong tree...

sage

> > > . One in error, one running => not good. I'm not sure exactly what
> > >   will happen but it won't be what you expect.
> >
> > I think it's fine for this to be an error condition.
>
> Again, if pacemaker sees the error it might start doing things you don't
> want.
>
> Technically, for actual clustered resources, returning "not running"
> when something is running is about the worst thing you can do, because
> pacemaker might then start up the resource on another node (e.g. start a
> VM on two nodes at once, corrupting the fs). The way you'd set this up
> for ceph, though, is just a cloned resource on each node, so it wouldn't
> matter anyway.
>
> > > The only solution I can see is to manage the services individually,
> > > in which case the init.d script with your patch + setting to 0 if
> > > running does the right thing anyway.
> >
> > Yeah, managing individually is probably the most robust, but if it
> > works well enough in the generic configuration with no customization
> > that is good.
>
> Actually it subsequently occurred to me that if I set them up
> individually then my dependencies will break (e.g. start ceph before
> mounting ceph-fs) because there are now different ceph instances per
> node.
>
> > Anyway, I'm fine with whatever variation of your original or my patch
> > you think addresses this.  A comment block in the init-ceph script
> > documenting what the return codes mean (similar to the above) would be
> > nice so that it is clear to the next person who comes along.
>
> I might post on the pacemaker list and see what the thoughts are there.
>
> Maybe it would be better for me to just re-order the init.d scripts so
> ceph starts in init.d and leave it at that...
>
> James
diff --git a/src/init-ceph.in b/src/init-ceph.in
index 8eb02f8..be5565c 100644
--- a/src/init-ceph.in
+++ b/src/init-ceph.in
@@ -165,6 +165,12 @@ verify_conf
 
 command=$1
 [ -n "$*" ] && shift
 
+if [ "$command" = "status" ]; then
+    # nothing defined for this host => not running; we'll use this if we
+    # don't check anything below.
+    EXIT_STATUS=3
+fi
+
 get_local_name_list
 get_name_list "$@"
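The hunk above only establishes the 'not running' default. The follow-up
change discussed in the thread, setting the status to 0 once a daemon is
confirmed running and to 1 for a failed one, might look roughly like the
fragment below; daemon_is_running, daemon_is_dead, $name and the
surrounding per-daemon loop are assumptions about the script's structure,
not part of the posted patch:

  # hypothetical per-daemon handling inside the "status" case
  if daemon_is_running "$name"; then
      echo "$name: running"
      [ "$EXIT_STATUS" -ne 1 ] && EXIT_STATUS=0   # at least one daemon is up
  elif daemon_is_dead "$name"; then
      echo "$name: dead"
      EXIT_STATUS=1                               # a failed daemon dominates
  else
      echo "$name: not running"
      # leave EXIT_STATUS alone (stays 3 if nothing has been seen running)
  fi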