Message ID | 1440200053-18890-2-git-send-email-jgunthorpe@obsidianresearch.com (mailing list archive) |
---|---|
State | Accepted |
On 8/21/2015 7:34 PM, Jason Gunthorpe wrote:
> We expect send only joins to fail, it just means there are no listeners
> for the group. The correct thing to do is silently drop the packet
> at source.
>
> Eg avahi will full join 224.0.0.251 which causes a send only IGMP packet
> to 224.0.0.22, and then a warning level kmessage like this:
>
>  ib0: sendonly multicast join failed for ff12:401b:ffff:0000:0000:0000:0000:0016, status -22
>
> If there is no IP router listening to IGMP.
>
> Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
> ---
>  drivers/infiniband/ulp/ipoib/ipoib_multicast.c | 12 ++++++++++--
>  1 file changed, 10 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/infiniband/ulp/ipoib/ipoib_multicast.c b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
> index c0e702c577d5..2d43ec542b63 100644
> --- a/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
> +++ b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
> @@ -393,8 +393,13 @@ static int ipoib_mcast_join_complete(int status,
>  			goto out_locked;
>  		}
>  	} else {
> -		if (mcast->logcount++ < 20) {
> -			if (status == -ETIMEDOUT || status == -EAGAIN) {
> +		bool silent_fail =
> +		    test_bit(IPOIB_MCAST_FLAG_SENDONLY, &mcast->flags) &&
> +		    status == -EINVAL;

Aren't there other reasons that send only join might have EINVAL
indicated ?

Maybe it's better to be overly silent rather than overly
verbose as to not spam the log but it seems like it would make debug of
such cases harder.

> +
> +		if (mcast->logcount < 20) {
> +			if (status == -ETIMEDOUT || status == -EAGAIN ||
> +			    silent_fail) {
>  				ipoib_dbg_mcast(priv, "%smulticast join failed for %pI6, status %d\n",
>  						test_bit(IPOIB_MCAST_FLAG_SENDONLY, &mcast->flags) ? "sendonly " : "",
>  						mcast->mcmember.mgid.raw, status);

ipoib_dbg_mcast logging is conditionalized on CONFIG_INFINIBAND_IPOIB_DEBUG

> @@ -403,6 +408,9 @@ static int ipoib_mcast_join_complete(int status,
>  					   test_bit(IPOIB_MCAST_FLAG_SENDONLY, &mcast->flags) ? "sendonly " : "",
>  					   mcast->mcmember.mgid.raw, status);
>  			}
> +
> +			if (!silent_fail)
> +				mcast->logcount++;
>  		}
>
>  		if (test_bit(IPOIB_MCAST_FLAG_SENDONLY, &mcast->flags) &&

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, Aug 25, 2015 at 08:59:13AM -0400, Hal Rosenstock wrote:
> > -		if (mcast->logcount++ < 20) {
> > -			if (status == -ETIMEDOUT || status == -EAGAIN) {
> > +		bool silent_fail =
> > +		    test_bit(IPOIB_MCAST_FLAG_SENDONLY, &mcast->flags) &&
> > +		    status == -EINVAL;
>
> Aren't there other reasons that send only join might have EINVAL
> indicated ?

Not sure, the layers below all eat the detailed error code. Hopefully
EINVAL isn't re-used.

> Maybe it's better to be overly silent rather than overly
> verbose as to not spam the log but it seems like it would make debug of
> such cases harder.

It makes debugging harder to have worthless messages because they
obscure what is going on. The first time I saw this I assumed there
was an issue, but it turns out to be an expected failure.

The other issue is the way the rate limiting works:

> > +		if (mcast->logcount < 20) {
> > +			if (status == -ETIMEDOUT || status == -EAGAIN ||
> > +			    silent_fail) {
> >  				ipoib_dbg_mcast(priv, "%smulticast join failed for %pI6, status %d\n",
> >  						test_bit(IPOIB_MCAST_FLAG_SENDONLY, &mcast->flags) ? "sendonly " : "",
> >  						mcast->mcmember.mgid.raw, status);

So wasting logcount with expected failures just results in eating
unexpected failures...

> ipoib_dbg_mcast logging is conditionalized on CONFIG_INFINIBAND_IPOIB_DEBUG

Most distros turn this off so the change only impacts people trying to
debug this stuff.

Jason
On 8/25/2015 12:28 PM, Jason Gunthorpe wrote:
> On Tue, Aug 25, 2015 at 08:59:13AM -0400, Hal Rosenstock wrote:
>>> -		if (mcast->logcount++ < 20) {
>>> -			if (status == -ETIMEDOUT || status == -EAGAIN) {
>>> +		bool silent_fail =
>>> +		    test_bit(IPOIB_MCAST_FLAG_SENDONLY, &mcast->flags) &&
>>> +		    status == -EINVAL;
>>
>> Aren't there other reasons that send only join might have EINVAL
>> indicated ?
>
> Not sure, the layers below all eat the detailed error code. Hopefully
> EINVAL isn't re-used.

AFAIR there are a number of reasons EINVAL could occur here in which
case this makes this change overly silent. If so, this particular
failure case of send only join failure due to SM rejection (perhaps
ERR_REQ_INVALID SA status only) is best to be made unique and different
from the other current EINVAL failures here.

>
>> Maybe it's better to be overly silent rather than overly
>> verbose as to not spam the log but it seems like it would make debug of
>> such cases harder.
>
> It makes debugging harder to have worthless messages because they
> obscure what is going on. The first time I saw this I assumed there
> was an issue, but it turns out to be an expected failure.
>
> The other issue is the way the rate limiting works:
>
>>> +		if (mcast->logcount < 20) {
>>> +			if (status == -ETIMEDOUT || status == -EAGAIN ||
>>> +			    silent_fail) {
>>>  				ipoib_dbg_mcast(priv, "%smulticast join failed for %pI6, status %d\n",
>>>  						test_bit(IPOIB_MCAST_FLAG_SENDONLY, &mcast->flags) ? "sendonly " : "",
>>>  						mcast->mcmember.mgid.raw, status);
>
> So wasting logcount with expected failures just results in eating
> unexpected failures...

Yes, the problem is distinguishing an "expected" failure from the real
ones and only logging the real ones.

-- Hal
On Wed, Aug 26, 2015 at 05:41:08AM -0400, Hal Rosenstock wrote:
> On 8/25/2015 12:28 PM, Jason Gunthorpe wrote:
> > On Tue, Aug 25, 2015 at 08:59:13AM -0400, Hal Rosenstock wrote:
> >>> -		if (mcast->logcount++ < 20) {
> >>> -			if (status == -ETIMEDOUT || status == -EAGAIN) {
> >>> +		bool silent_fail =
> >>> +		    test_bit(IPOIB_MCAST_FLAG_SENDONLY, &mcast->flags) &&
> >>> +		    status == -EINVAL;
> >>
> >> Aren't there other reasons that send only join might have EINVAL
> >> indicated ?
> >
> > Not sure, the layers below all eat the detailed error code. Hopefully
> > EINVAL isn't re-used.
>
> AFAIR there are a number of reasons EINVAL could occur here in which
> case this makes this change overly silent. If so, this particular
> failure case of send only join failure due to SM rejection (perhaps
> ERR_REQ_INVALID SA status only) is best to be made unique and different
> from the other current EINVAL failures here.

That is way too much to undertake just to silence this message.

Unless you know the other EINVALs are likely to happen, I'd just
ignore this imperfection.

Jason
On 8/27/2015 7:34 PM, Jason Gunthorpe wrote:
> On Wed, Aug 26, 2015 at 05:41:08AM -0400, Hal Rosenstock wrote:
>> On 8/25/2015 12:28 PM, Jason Gunthorpe wrote:
>>> On Tue, Aug 25, 2015 at 08:59:13AM -0400, Hal Rosenstock wrote:
>>>>> -		if (mcast->logcount++ < 20) {
>>>>> -			if (status == -ETIMEDOUT || status == -EAGAIN) {
>>>>> +		bool silent_fail =
>>>>> +		    test_bit(IPOIB_MCAST_FLAG_SENDONLY, &mcast->flags) &&
>>>>> +		    status == -EINVAL;
>>>>
>>>> Aren't there other reasons that send only join might have EINVAL
>>>> indicated ?
>>>
>>> Not sure, the layers below all eat the detailed error code. Hopefully
>>> EINVAL isn't re-used.
>>
>> AFAIR there are a number of reasons EINVAL could occur here in which
>> case this makes this change overly silent. If so, this particular
>> failure case of send only join failure due to SM rejection (perhaps
>> ERR_REQ_INVALID SA status only)

I meant ERR_REQ_INSUFFICIENT_COMPONENTS here.

>> is best to be made unique and different
>> from the other current EINVAL failures here.
>
> That is way too much to undertake just to silence this message.
>
> Unless you know the other EINVALs are likely to happen, I'd just
> ignore this imperfection.

That's probably the only reasonable choice in the short run :-(

-- Hal

>
> Jason
>
On 08/21/2015 07:34 PM, Jason Gunthorpe wrote:
> We expect send only joins to fail, it just means there are no listeners
> for the group. The correct thing to do is silently drop the packet
> at source.
>
> Eg avahi will full join 224.0.0.251 which causes a send only IGMP packet
> to 224.0.0.22, and then a warning level kmessage like this:
>
>  ib0: sendonly multicast join failed for ff12:401b:ffff:0000:0000:0000:0000:0016, status -22
>
> If there is no IP router listening to IGMP.
>
> Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>

Thanks, applied.
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_multicast.c b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
index c0e702c577d5..2d43ec542b63 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
@@ -393,8 +393,13 @@ static int ipoib_mcast_join_complete(int status,
 			goto out_locked;
 		}
 	} else {
-		if (mcast->logcount++ < 20) {
-			if (status == -ETIMEDOUT || status == -EAGAIN) {
+		bool silent_fail =
+		    test_bit(IPOIB_MCAST_FLAG_SENDONLY, &mcast->flags) &&
+		    status == -EINVAL;
+
+		if (mcast->logcount < 20) {
+			if (status == -ETIMEDOUT || status == -EAGAIN ||
+			    silent_fail) {
 				ipoib_dbg_mcast(priv, "%smulticast join failed for %pI6, status %d\n",
 						test_bit(IPOIB_MCAST_FLAG_SENDONLY, &mcast->flags) ? "sendonly " : "",
 						mcast->mcmember.mgid.raw, status);
@@ -403,6 +408,9 @@ static int ipoib_mcast_join_complete(int status,
 					   test_bit(IPOIB_MCAST_FLAG_SENDONLY, &mcast->flags) ? "sendonly " : "",
 					   mcast->mcmember.mgid.raw, status);
 			}
+
+			if (!silent_fail)
+				mcast->logcount++;
 		}
 
 		if (test_bit(IPOIB_MCAST_FLAG_SENDONLY, &mcast->flags) &&
We expect send only joins to fail, it just means there are no listeners
for the group. The correct thing to do is silently drop the packet
at source.

Eg avahi will full join 224.0.0.251 which causes a send only IGMP packet
to 224.0.0.22, and then a warning level kmessage like this:

 ib0: sendonly multicast join failed for ff12:401b:ffff:0000:0000:0000:0000:0016, status -22

If there is no IP router listening to IGMP.

Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
---
 drivers/infiniband/ulp/ipoib/ipoib_multicast.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)