[2/2] IB/ipoib: Suppress warning for send only join failures

Message ID 1440200053-18890-2-git-send-email-jgunthorpe@obsidianresearch.com (mailing list archive)
State Accepted

Commit Message

Jason Gunthorpe Aug. 21, 2015, 11:34 p.m. UTC
We expect send-only joins to fail; a failure just means there are no
listeners for the group. The correct thing to do is silently drop the
packet at the source.

E.g. avahi will full join 224.0.0.251, which causes a send-only IGMP packet
to 224.0.0.22. If there is no IP router listening to IGMP, the result is a
warning-level kmessage like this:

 ib0: sendonly multicast join failed for ff12:401b:ffff:0000:0000:0000:0000:0016, status -22

Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
---
 drivers/infiniband/ulp/ipoib/ipoib_multicast.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)
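
The MGID in the log line above is the IPoIB encoding of the IPv4 group 224.0.0.22. As a rough illustration of that mapping, here is a minimal userspace sketch that assumes the RFC 4391 layout (prefix ff1<scope>, IPv4 signature 401b, the P_Key, then the low 28 bits of the group address). The function names `ipoib_ipv4_mgid` and `format_mgid` are hypothetical and are not kernel APIs.

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Build a 16-byte IPoIB multicast GID for an IPv4 group, assuming the
 * RFC 4391 layout: ff1<scope>:401b:<pkey>:0000:0000:0000:<low 28 bits>.
 * ipv4 is in host byte order, e.g. 0xE0000016 for 224.0.0.22.
 * Hypothetical helper for illustration only.
 */
void ipoib_ipv4_mgid(uint32_t ipv4, uint16_t pkey, uint8_t scope,
                     uint8_t mgid[16])
{
    uint32_t low28 = ipv4 & 0x0FFFFFFFu;  /* only 28 bits are carried */
    int i;

    mgid[0] = 0xff;                              /* multicast prefix */
    mgid[1] = (uint8_t)(0x10 | (scope & 0x0f));  /* flags=1, scope nibble */
    mgid[2] = 0x40;                              /* IPv4 signature 0x401b */
    mgid[3] = 0x1b;
    mgid[4] = (uint8_t)(pkey >> 8);
    mgid[5] = (uint8_t)(pkey & 0xff);
    for (i = 6; i < 12; i++)
        mgid[i] = 0;
    mgid[12] = (uint8_t)(low28 >> 24);
    mgid[13] = (uint8_t)(low28 >> 16);
    mgid[14] = (uint8_t)(low28 >> 8);
    mgid[15] = (uint8_t)low28;
}

/* Format the GID as eight colon-separated 16-bit hex groups (39 chars + NUL). */
void format_mgid(const uint8_t mgid[16], char out[40])
{
    char *p = out;
    int i;

    for (i = 0; i < 8; i++) {
        p += snprintf(p, 5, "%02x%02x",
                      (unsigned)mgid[2 * i], (unsigned)mgid[2 * i + 1]);
        if (i < 7)
            *p++ = ':';
    }
}
```

Calling `ipoib_ipv4_mgid(0xE0000016u, 0xffff, 2, mgid)` and formatting the result gives the address shown in the log message, assuming the default P_Key 0xffff and link-local scope.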

Comments

Hal Rosenstock Aug. 25, 2015, 12:59 p.m. UTC | #1
On 8/21/2015 7:34 PM, Jason Gunthorpe wrote:
> We expect send-only joins to fail; a failure just means there are no
> listeners for the group. The correct thing to do is silently drop the
> packet at the source.
> 
> E.g. avahi will full join 224.0.0.251, which causes a send-only IGMP packet
> to 224.0.0.22. If there is no IP router listening to IGMP, the result is a
> warning-level kmessage like this:
> 
>  ib0: sendonly multicast join failed for ff12:401b:ffff:0000:0000:0000:0000:0016, status -22
> 
> Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
> ---
>  drivers/infiniband/ulp/ipoib/ipoib_multicast.c | 12 ++++++++++--
>  1 file changed, 10 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/infiniband/ulp/ipoib/ipoib_multicast.c b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
> index c0e702c577d5..2d43ec542b63 100644
> --- a/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
> +++ b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
> @@ -393,8 +393,13 @@ static int ipoib_mcast_join_complete(int status,
>  			goto out_locked;
>  		}
>  	} else {
> -		if (mcast->logcount++ < 20) {
> -			if (status == -ETIMEDOUT || status == -EAGAIN) {
> +		bool silent_fail =
> +		    test_bit(IPOIB_MCAST_FLAG_SENDONLY, &mcast->flags) &&
> +		    status == -EINVAL;

Aren't there other reasons that send only join might have EINVAL
indicated ? Maybe it's better to be overly silent rather than overly
verbose as to not spam the log but it seems like it would make debug of
such cases harder.

> +
> +		if (mcast->logcount < 20) {
> +			if (status == -ETIMEDOUT || status == -EAGAIN ||
> +			    silent_fail) {
>  				ipoib_dbg_mcast(priv, "%smulticast join failed for %pI6, status %d\n",
>  						test_bit(IPOIB_MCAST_FLAG_SENDONLY, &mcast->flags) ? "sendonly " : "",
>  						mcast->mcmember.mgid.raw, status);

ipoib_dbg_mcast logging is conditionalized on CONFIG_INFINIBAND_IPOIB_DEBUG

> @@ -403,6 +408,9 @@ static int ipoib_mcast_join_complete(int status,
>  						test_bit(IPOIB_MCAST_FLAG_SENDONLY, &mcast->flags) ? "sendonly " : "",
>  					   mcast->mcmember.mgid.raw, status);
>  			}
> +
> +			if (!silent_fail)
> +				mcast->logcount++;
>  		}
>  
>  		if (test_bit(IPOIB_MCAST_FLAG_SENDONLY, &mcast->flags) &&

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jason Gunthorpe Aug. 25, 2015, 4:28 p.m. UTC | #2
On Tue, Aug 25, 2015 at 08:59:13AM -0400, Hal Rosenstock wrote:
> > -		if (mcast->logcount++ < 20) {
> > -			if (status == -ETIMEDOUT || status == -EAGAIN) {
> > +		bool silent_fail =
> > +		    test_bit(IPOIB_MCAST_FLAG_SENDONLY, &mcast->flags) &&
> > +		    status == -EINVAL;
> 
> Aren't there other reasons that send only join might have EINVAL
> indicated ?

Not sure, the layers below all eat the detailed error code. Hopefully
EINVAL isn't re-used.

> Maybe it's better to be overly silent rather than overly
> verbose as to not spam the log but it seems like it would make debug of
> such cases harder.

It makes debugging harder to have worthless messages because they
obscure what is going on. The first time I saw this I assumed there
was an issue, but it turns out to be an expected failure.

The other issue is the way the rate limiting works:

> > +		if (mcast->logcount < 20) {
> > +			if (status == -ETIMEDOUT || status == -EAGAIN ||
> > +			    silent_fail) {
> >  				ipoib_dbg_mcast(priv, "%smulticast join failed for %pI6, status %d\n",
> >  						test_bit(IPOIB_MCAST_FLAG_SENDONLY, &mcast->flags) ? "sendonly " : "",
> >  						mcast->mcmember.mgid.raw, status);

So wasting logcount with expected failures just results in eating
unexpected failures...

> ipoib_dbg_mcast logging is conditionalized on CONFIG_INFINIBAND_IPOIB_DEBUG

Most distros turn this off so the change only impacts people trying to
debug this stuff.

Jason
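
The rate-limiting point above can be illustrated with a small userspace model. This is only a sketch of the two branches shown in the quoted diff, not the kernel code itself: `struct mcast_model`, `old_should_warn`, and `new_should_warn` are hypothetical names, and `-EIO` stands in for some arbitrary unexpected failure.

```c
#include <errno.h>
#include <stdbool.h>

#define LOG_LIMIT 20  /* same threshold as the "< 20" test in the diff */

/* Hypothetical stand-in for the two fields of the mcast entry used here. */
struct mcast_model {
    int  logcount;
    bool sendonly;
};

/*
 * Pre-patch behaviour: every failure consumes one unit of log budget.
 * Returns true when a warning-level message would be printed.
 */
bool old_should_warn(struct mcast_model *m, int status)
{
    if (m->logcount++ < LOG_LIMIT)
        return !(status == -ETIMEDOUT || status == -EAGAIN);
    return false;
}

/*
 * Post-patch behaviour: a send-only join failing with -EINVAL is demoted
 * to debug level and does not consume log budget.
 */
bool new_should_warn(struct mcast_model *m, int status)
{
    bool silent_fail = m->sendonly && status == -EINVAL;
    bool warn = false;

    if (m->logcount < LOG_LIMIT) {
        warn = !(status == -ETIMEDOUT || status == -EAGAIN || silent_fail);
        if (!silent_fail)
            m->logcount++;
    }
    return warn;
}
```

In this model, twenty expected send-only `-EINVAL` failures exhaust the budget under the old logic, so a later `-EIO` is never reported. Under the new logic the budget is untouched and the `-EIO` still produces a warning.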
Hal Rosenstock Aug. 26, 2015, 9:41 a.m. UTC | #3
On 8/25/2015 12:28 PM, Jason Gunthorpe wrote:
> On Tue, Aug 25, 2015 at 08:59:13AM -0400, Hal Rosenstock wrote:
>>> -		if (mcast->logcount++ < 20) {
>>> -			if (status == -ETIMEDOUT || status == -EAGAIN) {
>>> +		bool silent_fail =
>>> +		    test_bit(IPOIB_MCAST_FLAG_SENDONLY, &mcast->flags) &&
>>> +		    status == -EINVAL;
>>
>> Aren't there other reasons that send only join might have EINVAL
>> indicated ?
> 
> Not sure, the layers below all eat the detailed error code. Hopefully
> EINVAL isn't re-used.

AFAIR there are a number of reasons EINVAL could occur here in which
case this makes this change overly silent. If so, this particular
failure case of send only join failure due to SM rejection (perhaps
ERR_REQ_INVALID SA status only) is best to be made unique and different
from the other current EINVAL failures here.

> 
>> Maybe it's better to be overly silent rather than overly
>> verbose as to not spam the log but it seems like it would make debug of
>> such cases harder.
> 
> It makes debugging harder to have worthless messages because they
> obscure what is going on. The first time I saw this I assumed there
> was an issue, but it turns out to be an expected failure.
> 
> The other issue is the way the rate limiting works:
> 
>>> +		if (mcast->logcount < 20) {
>>> +			if (status == -ETIMEDOUT || status == -EAGAIN ||
>>> +			    silent_fail) {
>>>  				ipoib_dbg_mcast(priv, "%smulticast join failed for %pI6, status %d\n",
>>>  						test_bit(IPOIB_MCAST_FLAG_SENDONLY, &mcast->flags) ? "sendonly " : "",
>>>  						mcast->mcmember.mgid.raw, status);
> 
> So wasting logcount with expected failures just results in eating
> unexpected failures...

Yes, the problem is distinguishing an "expected" failure from the real
ones and only logging the real ones.

-- Hal
Jason Gunthorpe Aug. 27, 2015, 11:34 p.m. UTC | #4
On Wed, Aug 26, 2015 at 05:41:08AM -0400, Hal Rosenstock wrote:
> On 8/25/2015 12:28 PM, Jason Gunthorpe wrote:
> > On Tue, Aug 25, 2015 at 08:59:13AM -0400, Hal Rosenstock wrote:
> >>> -		if (mcast->logcount++ < 20) {
> >>> -			if (status == -ETIMEDOUT || status == -EAGAIN) {
> >>> +		bool silent_fail =
> >>> +		    test_bit(IPOIB_MCAST_FLAG_SENDONLY, &mcast->flags) &&
> >>> +		    status == -EINVAL;
> >>
> >> Aren't there other reasons that send only join might have EINVAL
> >> indicated ?
> > 
> > Not sure, the layers below all eat the detailed error code. Hopefully
> > EINVAL isn't re-used.
> 
> AFAIR there are a number of reasons EINVAL could occur here in which
> case this makes this change overly silent. If so, this particular
> failure case of send only join failure due to SM rejection (perhaps
> ERR_REQ_INVALID SA status only) is best to be made unique and different
> from the other current EINVAL failures here.

That is way too much to undertake just to silence this message.

Unless you know the other EINVALs are likely to happen, I'd just
ignore this imperfection.

Jason
Hal Rosenstock Aug. 28, 2015, 1:58 p.m. UTC | #5
On 8/27/2015 7:34 PM, Jason Gunthorpe wrote:
> On Wed, Aug 26, 2015 at 05:41:08AM -0400, Hal Rosenstock wrote:
>> On 8/25/2015 12:28 PM, Jason Gunthorpe wrote:
>>> On Tue, Aug 25, 2015 at 08:59:13AM -0400, Hal Rosenstock wrote:
>>>>> -		if (mcast->logcount++ < 20) {
>>>>> -			if (status == -ETIMEDOUT || status == -EAGAIN) {
>>>>> +		bool silent_fail =
>>>>> +		    test_bit(IPOIB_MCAST_FLAG_SENDONLY, &mcast->flags) &&
>>>>> +		    status == -EINVAL;
>>>>
>>>> Aren't there other reasons that send only join might have EINVAL
>>>> indicated ?
>>>
>>> Not sure, the layers below all eat the detailed error code. Hopefully
>>> EINVAL isn't re-used.
>>
>> AFAIR there are a number of reasons EINVAL could occur here in which
>> case this makes this change overly silent. If so, this particular
>> failure case of send only join failure due to SM rejection (perhaps
>> ERR_REQ_INVALID SA status only)

I meant ERR_REQ_INSUFFICIENT_COMPONENTS here.

>> is best to be made unique and different
>> from the other current EINVAL failures here.
> 
> That is way too much to undertake just to silence this message.
>
> Unless you know the other EINVALs are likely to happen, I'd just
> ignore this imperfection.

That's probably the only reasonable choice in the short run :-(

-- Hal

> 
> Jason
> 

Doug Ledford Sept. 3, 2015, 9:20 p.m. UTC | #6
On 08/21/2015 07:34 PM, Jason Gunthorpe wrote:
> We expect send-only joins to fail; a failure just means there are no
> listeners for the group. The correct thing to do is silently drop the
> packet at the source.
> 
> E.g. avahi will full join 224.0.0.251, which causes a send-only IGMP packet
> to 224.0.0.22. If there is no IP router listening to IGMP, the result is a
> warning-level kmessage like this:
> 
>  ib0: sendonly multicast join failed for ff12:401b:ffff:0000:0000:0000:0000:0016, status -22
> 
> Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>

Thanks, applied.

Patch

diff --git a/drivers/infiniband/ulp/ipoib/ipoib_multicast.c b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
index c0e702c577d5..2d43ec542b63 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
@@ -393,8 +393,13 @@  static int ipoib_mcast_join_complete(int status,
 			goto out_locked;
 		}
 	} else {
-		if (mcast->logcount++ < 20) {
-			if (status == -ETIMEDOUT || status == -EAGAIN) {
+		bool silent_fail =
+		    test_bit(IPOIB_MCAST_FLAG_SENDONLY, &mcast->flags) &&
+		    status == -EINVAL;
+
+		if (mcast->logcount < 20) {
+			if (status == -ETIMEDOUT || status == -EAGAIN ||
+			    silent_fail) {
 				ipoib_dbg_mcast(priv, "%smulticast join failed for %pI6, status %d\n",
 						test_bit(IPOIB_MCAST_FLAG_SENDONLY, &mcast->flags) ? "sendonly " : "",
 						mcast->mcmember.mgid.raw, status);
@@ -403,6 +408,9 @@  static int ipoib_mcast_join_complete(int status,
 						test_bit(IPOIB_MCAST_FLAG_SENDONLY, &mcast->flags) ? "sendonly " : "",
 					   mcast->mcmember.mgid.raw, status);
 			}
+
+			if (!silent_fail)
+				mcast->logcount++;
 		}
 
 		if (test_bit(IPOIB_MCAST_FLAG_SENDONLY, &mcast->flags) &&