diff mbox

[rdma-next,V1,1/9] IB/ipoib: Add warning message when changing the MTU in UD over the max range

Message ID 20161228124728.26619-2-leon@kernel.org (mailing list archive)
State Accepted
Headers show

Commit Message

Leon Romanovsky Dec. 28, 2016, 12:47 p.m. UTC
From: Feras Daoud <ferasda@mellanox.com>

In datagram mode, the IB UD (Unreliable Datagram) transport is used
so the MTU of the interface is equal to the IB L2 MTU minus the
IPoIB encapsulation header. Any request to change the MTU value
above the maximum range will change the MTU to the max allowed, but
will not show any warning message. An ipoib_warn is issued in such
cases, letting the user know that even though the value is legal,
it can't be currently applied.

Signed-off-by: Feras Daoud <ferasda@mellanox.com>
Signed-off-by: Noa Osherovich <noaos@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
---
 drivers/infiniband/ulp/ipoib/ipoib_main.c | 4 ++++
 1 file changed, 4 insertions(+)

Comments

Doug Ledford Jan. 12, 2017, 6:46 p.m. UTC | #1
On Wed, 2016-12-28 at 14:47 +0200, Leon Romanovsky wrote:
> From: Feras Daoud <ferasda@mellanox.com>
> 
> In datagram mode, the IB UD (Unreliable Datagram) transport is used
> so the MTU of the interface is equal to the IB L2 MTU minus the
> IPoIB encapsulation header. Any request to change the MTU value
> above the maximum range will change the MTU to the max allowed, but
> will not show any warning message. An ipoib_warn is issued in such
> cases, letting the user know that even though the value is legal,
> it can't be currently applied.
> 
> Signed-off-by: Feras Daoud <ferasda@mellanox.com>
> Signed-off-by: Noa Osherovich <noaos@mellanox.com>
> Signed-off-by: Leon Romanovsky <leon@kernel.org>
> ---
>  drivers/infiniband/ulp/ipoib/ipoib_main.c | 4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c
> b/drivers/infiniband/ulp/ipoib/ipoib_main.c
> index 3ce0765..a550cc6 100644
> --- a/drivers/infiniband/ulp/ipoib/ipoib_main.c
> +++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c
> @@ -229,6 +229,10 @@ static int ipoib_change_mtu(struct net_device
> *dev, int new_mtu)
>  
>  	priv->admin_mtu = new_mtu;
>  
> +	if (priv->mcast_mtu < priv->admin_mtu)
> +		ipoib_warn(priv, "MTU must be smaller than mcast_mtu
> (%u)\n",
> +			   priv->mcast_mtu);
> +
>  	dev->mtu = min(priv->mcast_mtu, priv->admin_mtu);
>  
>  	return 0;

I don't like this patch.  First, there's no need for a warning here.
 That's entirely too noisy for this issue.  Second, the wording of the
message is poor.  The user thinks they are setting the MTU, and there
is no means of setting a multicast MTU, so telling them that their new
MTU must be less than the mcast MTU that they can't do anything about
and don't necessarily know how it is generated makes no sense.  This
should be no more than ipoib_dbg if we even print anything out at all,
and the message should be more like "MTU must be <= the link layer MTU
- 4, use ibv_devinfo on the RDMA device to get the link layer MTU"
Leon Romanovsky Jan. 12, 2017, 7:35 p.m. UTC | #2
On Thu, Jan 12, 2017 at 01:46:16PM -0500, Doug Ledford wrote:
> On Wed, 2016-12-28 at 14:47 +0200, Leon Romanovsky wrote:
> > From: Feras Daoud <ferasda@mellanox.com>
> >
> > In datagram mode, the IB UD (Unreliable Datagram) transport is used
> > so the MTU of the interface is equal to the IB L2 MTU minus the
> > IPoIB encapsulation header. Any request to change the MTU value
> > above the maximum range will change the MTU to the max allowed, but
> > will not show any warning message. An ipoib_warn is issued in such
> > cases, letting the user know that even though the value is legal,
> > it can't be currently applied.
> >
> > Signed-off-by: Feras Daoud <ferasda@mellanox.com>
> > Signed-off-by: Noa Osherovich <noaos@mellanox.com>
> > Signed-off-by: Leon Romanovsky <leon@kernel.org>
> > ---
> >  drivers/infiniband/ulp/ipoib/ipoib_main.c | 4 ++++
> >  1 file changed, 4 insertions(+)
> >
> > diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c
> > b/drivers/infiniband/ulp/ipoib/ipoib_main.c
> > index 3ce0765..a550cc6 100644
> > --- a/drivers/infiniband/ulp/ipoib/ipoib_main.c
> > +++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c
> > @@ -229,6 +229,10 @@ static int ipoib_change_mtu(struct net_device
> > *dev, int new_mtu)
> >  
> >  	priv->admin_mtu = new_mtu;
> >  
> > +	if (priv->mcast_mtu < priv->admin_mtu)
> > +		ipoib_warn(priv, "MTU must be smaller than mcast_mtu
> > (%u)\n",
> > +			   priv->mcast_mtu);
> > +
> >  	dev->mtu = min(priv->mcast_mtu, priv->admin_mtu);
> >  
> >  	return 0;
>
> I don't like this patch.  First, there's no need for a warning here.
>  That's entirely too noisy for this issue.  Second, the wording of the
> message is poor.  The user thinks they are setting the MTU, and there
> is no means of setting a multicast MTU, so telling them that their new
> MTU must be less than the mcast MTU that they can't do anything about
> and don't necessarily know how it is generated makes no sense.  This
> should be no more than ipoib_dbg if we even print anything out at all,
> and the message should be more like "MTU must be <= the link layer MTU
> - 4, use ibv_devinfo on the RDMA device to get the link layer MTU"

First of all, thank you for fixing wording, for me it is the hardest
part of every commit.

Second, I have a different view from you on the issue. User configured
some value, which is not correct for IPoIB. In ideal world (without legacy),
we were supposed to return error to him with proper message, but in our
case (legacy applications) we can't (we tried and it broke some legacy
ifcongfig, if I remember well). So it leaves us with one available
option is to warn user about improper value.

User should know that he supplied wrong parameter.

>
> --
> Doug Ledford <dledford@redhat.com>
>     GPG KeyID: B826A3330E572FDD
>    
> Key fingerprint = AE6B 1BDA 122B 23B4 265B  1274 B826 A333 0E57 2FDD
Jason Gunthorpe Jan. 12, 2017, 8:16 p.m. UTC | #3
On Thu, Jan 12, 2017 at 09:35:08PM +0200, Leon Romanovsky wrote:

> > > In datagram mode, the IB UD (Unreliable Datagram) transport is used
> > > so the MTU of the interface is equal to the IB L2 MTU minus the
> > > IPoIB encapsulation header. Any request to change the MTU value
> > > above the maximum range will change the MTU to the max allowed, but
> > > will not show any warning message. An ipoib_warn is issued in such
> > > cases, letting the user know that even though the value is legal,
> > > it can't be currently applied.

How does RC mode work then?

> Second, I have a different view from you on the issue. User configured
> some value, which is not correct for IPoIB. In ideal world (without legacy),
> we were supposed to return error to him with proper message, but in our
> case (legacy applications) we can't (we tried and it broke some legacy
> ifcongfig, if I remember well). So it leaves us with one available
> option is to warn user about improper value.

This is a legitimate configuration though, the user could have a mixed
MTU fabric and rely on path MTU discovery, so sets the MTU to the
largest.

AFAIK the way to handle such configurations is with a multicast route that
has a reduced mtu.

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Doug Ledford Jan. 12, 2017, 9:58 p.m. UTC | #4
On Thu, 2017-01-12 at 21:35 +0200, Leon Romanovsky wrote:

> First of all, thank you for fixing wording, for me it is the hardest
> part of every commit.

No worries.

> Second, I have a different view from you on the issue. User
> configured
> some value, which is not correct for IPoIB. In ideal world (without
> legacy),
> we were supposed to return error to him with proper message,

If you set an impossible MTU on an ethernet adapter, you get a return
of EINVAL.  But it doesn't mess with the kernel log at all, it's *just*
EINVAL to the calling program.

>  but in our
> case (legacy applications) we can't (we tried and it broke some
> legacy
> ifcongfig, if I remember well).

ifconfig is what I used to test what Ethernet does, so it should be
well and truly used to EINVAL as a return.

>  So it leaves us with one available
> option is to warn user about improper value.

Even if you want to alert the user, there is no need to warn.  Warn is
reserved for something that is a serious error or condition, this is an
EINVAL of something that is, at best, a performance issue.  Unless path
MTU support is broken, failing to set a larger MTU will not ever hinder
actual operation.  So we should never be dumping out KERN_WARN messages
into the kernel log that will ping up on any root login session as well
as clutter the kernel dmesg output.

> User should know that he supplied wrong parameter.

User can still find out, they just won't get proactive pings on all
root sessions, instead they will have to look in the dmesg output
because they are trying to figure out why their command didn't work.

> > 
> > 
> > --
> > Doug Ledford <dledford@redhat.com>
> >     GPG KeyID: B826A3330E572FDD
> >    
> > Key fingerprint = AE6B 1BDA 122B 23B4 265B  1274 B826 A333 0E57
> > 2FDD
> 
>
Leon Romanovsky Jan. 13, 2017, 3:08 p.m. UTC | #5
On Thu, Jan 12, 2017 at 01:16:22PM -0700, Jason Gunthorpe wrote:
> On Thu, Jan 12, 2017 at 09:35:08PM +0200, Leon Romanovsky wrote:
>
> > > > In datagram mode, the IB UD (Unreliable Datagram) transport is used
> > > > so the MTU of the interface is equal to the IB L2 MTU minus the
> > > > IPoIB encapsulation header. Any request to change the MTU value
> > > > above the maximum range will change the MTU to the max allowed, but
> > > > will not show any warning message. An ipoib_warn is issued in such
> > > > cases, letting the user know that even though the value is legal,
> > > > it can't be currently applied.
>
> How does RC mode work then?

RC mode doesn't have limitation on MTU size and it allows messages
larger than IB link-layer MTU.

>
> > Second, I have a different view from you on the issue. User configured
> > some value, which is not correct for IPoIB. In ideal world (without legacy),
> > we were supposed to return error to him with proper message, but in our
> > case (legacy applications) we can't (we tried and it broke some legacy
> > ifcongfig, if I remember well). So it leaves us with one available
> > option is to warn user about improper value.
>
> This is a legitimate configuration though, the user could have a mixed
> MTU fabric and rely on path MTU discovery, so sets the MTU to the
> largest.
>
> AFAIK the way to handle such configurations is with a multicast route that
> has a reduced mtu.

Multicast traffic is sent in datagram mode hence the limit will imply.

Maybe, I went too far by calling it error, but IMHO absence of indication to
user of decreased MTU is not good either.

>
> Jason
Leon Romanovsky Jan. 13, 2017, 3:15 p.m. UTC | #6
On Thu, Jan 12, 2017 at 04:58:28PM -0500, Doug Ledford wrote:
> On Thu, 2017-01-12 at 21:35 +0200, Leon Romanovsky wrote:
> > 
> > First of all, thank you for fixing wording, for me it is the hardest
> > part of every commit.
>
> No worries.
>
> > Second, I have a different view from you on the issue. User
> > configured
> > some value, which is not correct for IPoIB. In ideal world (without
> > legacy),
> > we were supposed to return error to him with proper message,
>
> If you set an impossible MTU on an ethernet adapter, you get a return
> of EINVAL.  But it doesn't mess with the kernel log at all, it's *just*
> EINVAL to the calling program.
>
> >  but in our
> > case (legacy applications) we can't (we tried and it broke some
> > legacy
> > ifcongfig, if I remember well).
>
> ifconfig is what I used to test what Ethernet does, so it should be
> well and truly used to EINVAL as a return.

I'll recheck with Feras next week which legacy tool
didn't work and will post it.

>
> >  So it leaves us with one available
> > option is to warn user about improper value.
>
> Even if you want to alert the user, there is no need to warn.  Warn is
> reserved for something that is a serious error or condition, this is an
> EINVAL of something that is, at best, a performance issue.  Unless path
> MTU support is broken, failing to set a larger MTU will not ever hinder
> actual operation.  So we should never be dumping out KERN_WARN messages
> into the kernel log that will ping up on any root login session as well
> as clutter the kernel dmesg output.
>
> > User should know that he supplied wrong parameter.
>
> User can still find out, they just won't get proactive pings on all
> root sessions, instead they will have to look in the dmesg output
> because they are trying to figure out why their command didn't work.
>
> > >
> > >
> > > --
> > > Doug Ledford <dledford@redhat.com>
> > >     GPG KeyID: B826A3330E572FDD
> > >    
> > > Key fingerprint = AE6B 1BDA 122B 23B4 265B  1274 B826 A333 0E57
> > > 2FDD
> >
> >
> --
> Doug Ledford <dledford@redhat.com>
>     GPG KeyID: B826A3330E572FDD
>    
> Key fingerprint = AE6B 1BDA 122B 23B4 265B  1274 B826 A333 0E57 2FDD
Jason Gunthorpe Jan. 13, 2017, 5:12 p.m. UTC | #7
On Fri, Jan 13, 2017 at 05:08:24PM +0200, Leon Romanovsky wrote:
> On Thu, Jan 12, 2017 at 01:16:22PM -0700, Jason Gunthorpe wrote:
> > On Thu, Jan 12, 2017 at 09:35:08PM +0200, Leon Romanovsky wrote:
> >
> > > > > In datagram mode, the IB UD (Unreliable Datagram) transport is used
> > > > > so the MTU of the interface is equal to the IB L2 MTU minus the
> > > > > IPoIB encapsulation header. Any request to change the MTU value
> > > > > above the maximum range will change the MTU to the max allowed, but
> > > > > will not show any warning message. An ipoib_warn is issued in such
> > > > > cases, letting the user know that even though the value is legal,
> > > > > it can't be currently applied.
> >
> > How does RC mode work then?
> 
> RC mode doesn't have limitation on MTU size and it allows messages
> larger than IB link-layer MTU.

This is about what happens if the multicast MTU < unicast MTU - which
is *exactly* the same case on RC and UD.

IPoIB cannot send a 64k multicast MTU on RC either.

> Maybe, I went too far by calling it error, but IMHO absence of indication to
> user of decreased MTU is not good either.

I don't see why we should 'solve' this for UD and ignore the same
problem in RC.

IMHO, the correct fix is to figure out how to limit multicast - there
is something called multicast path MTU discovery - so maybe we are
doing something wrong and breaking that (are we erroring too large
packets inside ipoib properly?), or it is not working in the kernel or
it isn't supported by the kernel ...

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Leon Romanovsky Jan. 13, 2017, 8:31 p.m. UTC | #8
On Fri, Jan 13, 2017 at 10:12:34AM -0700, Jason Gunthorpe wrote:
> On Fri, Jan 13, 2017 at 05:08:24PM +0200, Leon Romanovsky wrote:
> > On Thu, Jan 12, 2017 at 01:16:22PM -0700, Jason Gunthorpe wrote:
> > > On Thu, Jan 12, 2017 at 09:35:08PM +0200, Leon Romanovsky wrote:
> > >
> > > > > > In datagram mode, the IB UD (Unreliable Datagram) transport is used
> > > > > > so the MTU of the interface is equal to the IB L2 MTU minus the
> > > > > > IPoIB encapsulation header. Any request to change the MTU value
> > > > > > above the maximum range will change the MTU to the max allowed, but
> > > > > > will not show any warning message. An ipoib_warn is issued in such
> > > > > > cases, letting the user know that even though the value is legal,
> > > > > > it can't be currently applied.
> > >
> > > How does RC mode work then?
> >
> > RC mode doesn't have limitation on MTU size and it allows messages
> > larger than IB link-layer MTU.
>
> This is about what happens if the multicast MTU < unicast MTU - which
> is *exactly* the same case on RC and UD.
>
> IPoIB cannot send a 64k multicast MTU on RC either.

Multicast is sent in datagram despite being in connected mode and for
datagram the MTU is limited by IB
Jason Gunthorpe Jan. 13, 2017, 9:27 p.m. UTC | #9
On Fri, Jan 13, 2017 at 10:31:27PM +0200, Leon Romanovsky wrote:

> > This is about what happens if the multicast MTU < unicast MTU - which
> > is *exactly* the same case on RC and UD.
> >
> > IPoIB cannot send a 64k multicast MTU on RC either.
> 
> Multicast is sent in datagram despite being in connected mode and for
> datagram the MTU is limited by IB

I am aware of that. Read you patch again:

> +	if (priv->mcast_mtu < priv->admin_mtu)
> +		ipoib_warn(priv, "MTU must be smaller than mcast_mtu (%u)\n",

This check is fails in RC mode as well.

And we already check if the requested MTU is beyond the port
capability:

+        if (new_mtu > IPOIB_UD_MTU(priv->max_ib_mtu))
+                return -EINVAL;

So I have *no idea* why you'd want to check it against the multicast
group too..

It is OK for the multicast group to have a smaller MTU than the
unicast side. This is how RC operates after all.

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Leon Romanovsky Jan. 15, 2017, 8:35 a.m. UTC | #10
On Fri, Jan 13, 2017 at 02:27:21PM -0700, Jason Gunthorpe wrote:
> On Fri, Jan 13, 2017 at 10:31:27PM +0200, Leon Romanovsky wrote:
>
> > > This is about what happens if the multicast MTU < unicast MTU - which
> > > is *exactly* the same case on RC and UD.
> > >
> > > IPoIB cannot send a 64k multicast MTU on RC either.
> >
> > Multicast is sent in datagram despite being in connected mode and for
> > datagram the MTU is limited by IB
>
> I am aware of that. Read you patch again:
>
> > +	if (priv->mcast_mtu < priv->admin_mtu)
> > +		ipoib_warn(priv, "MTU must be smaller than mcast_mtu (%u)\n",
>
> This check is fails in RC mode as well.
>
> And we already check if the requested MTU is beyond the port
> capability:
>
> +        if (new_mtu > IPOIB_UD_MTU(priv->max_ib_mtu))
> +                return -EINVAL;
>
> So I have *no idea* why you'd want to check it against the multicast
> group too..
>

Port MTU capability can be higher than link MTU. In this flow, the user
can try to set manually MTU which will be higher than SM provided.

Immediately after that print, we have a following piece of code:
 236         dev->mtu = min(priv->mcast_mtu, priv->admin_mtu);

> It is OK for the multicast group to have a smaller MTU than the
> unicast side. This is how RC operates after all.
>
> Jason
Jason Gunthorpe Jan. 16, 2017, 8:12 p.m. UTC | #11
On Sun, Jan 15, 2017 at 10:35:43AM +0200, Leon Romanovsky wrote:
> On Fri, Jan 13, 2017 at 02:27:21PM -0700, Jason Gunthorpe wrote:
> > On Fri, Jan 13, 2017 at 10:31:27PM +0200, Leon Romanovsky wrote:
> >
> > > > This is about what happens if the multicast MTU < unicast MTU - which
> > > > is *exactly* the same case on RC and UD.
> > > >
> > > > IPoIB cannot send a 64k multicast MTU on RC either.
> > >
> > > Multicast is sent in datagram despite being in connected mode and for
> > > datagram the MTU is limited by IB
> >
> > I am aware of that. Read you patch again:
> >
> > > +	if (priv->mcast_mtu < priv->admin_mtu)
> > > +		ipoib_warn(priv, "MTU must be smaller than mcast_mtu (%u)\n",
> >
> > This check is fails in RC mode as well.
> >
> > And we already check if the requested MTU is beyond the port
> > capability:
> >
> > +        if (new_mtu > IPOIB_UD_MTU(priv->max_ib_mtu))
> > +                return -EINVAL;
> >
> > So I have *no idea* why you'd want to check it against the multicast
> > group too..
> >
> 
> Port MTU capability can be higher than link MTU. In this flow, the user
> can try to set manually MTU which will be higher than SM provided.
> 
> Immediately after that print, we have a following piece of code:
>  236         dev->mtu = min(priv->mcast_mtu, priv->admin_mtu);

Maybe the better fix is to take that min out than generate a
warning.

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Leon Romanovsky Jan. 17, 2017, 7:31 p.m. UTC | #12
On Mon, Jan 16, 2017 at 01:12:18PM -0700, Jason Gunthorpe wrote:
> On Sun, Jan 15, 2017 at 10:35:43AM +0200, Leon Romanovsky wrote:
> > On Fri, Jan 13, 2017 at 02:27:21PM -0700, Jason Gunthorpe wrote:
> > > On Fri, Jan 13, 2017 at 10:31:27PM +0200, Leon Romanovsky wrote:
> > >
> > > > > This is about what happens if the multicast MTU < unicast MTU - which
> > > > > is *exactly* the same case on RC and UD.
> > > > >
> > > > > IPoIB cannot send a 64k multicast MTU on RC either.
> > > >
> > > > Multicast is sent in datagram despite being in connected mode and for
> > > > datagram the MTU is limited by IB
> > >
> > > I am aware of that. Read you patch again:
> > >
> > > > +	if (priv->mcast_mtu < priv->admin_mtu)
> > > > +		ipoib_warn(priv, "MTU must be smaller than mcast_mtu (%u)\n",
> > >
> > > This check is fails in RC mode as well.
> > >
> > > And we already check if the requested MTU is beyond the port
> > > capability:
> > >
> > > +        if (new_mtu > IPOIB_UD_MTU(priv->max_ib_mtu))
> > > +                return -EINVAL;
> > >
> > > So I have *no idea* why you'd want to check it against the multicast
> > > group too..
> > >
> >
> > Port MTU capability can be higher than link MTU. In this flow, the user
> > can try to set manually MTU which will be higher than SM provided.
> >
> > Immediately after that print, we have a following piece of code:
> >  236         dev->mtu = min(priv->mcast_mtu, priv->admin_mtu);
>
> Maybe the better fix is to take that min out than generate a
> warning.

It looks like that I need more time to grasp it.
Sorry about that.

>
> Jason
Leon Romanovsky Jan. 17, 2017, 7:34 p.m. UTC | #13
On Fri, Jan 13, 2017 at 05:15:07PM +0200, Leon Romanovsky wrote:
> On Thu, Jan 12, 2017 at 04:58:28PM -0500, Doug Ledford wrote:
> > On Thu, 2017-01-12 at 21:35 +0200, Leon Romanovsky wrote:
> > > 
> > > First of all, thank you for fixing wording, for me it is the hardest
> > > part of every commit.
> >
> > No worries.
> >
> > > Second, I have a different view from you on the issue. User
> > > configured
> > > some value, which is not correct for IPoIB. In ideal world (without
> > > legacy),
> > > we were supposed to return error to him with proper message,
> >
> > If you set an impossible MTU on an ethernet adapter, you get a return
> > of EINVAL.  But it doesn't mess with the kernel log at all, it's *just*
> > EINVAL to the calling program.
> >
> > >  but in our
> > > case (legacy applications) we can't (we tried and it broke some
> > > legacy
> > > ifcongfig, if I remember well).
> >
> > ifconfig is what I used to test what Ethernet does, so it should be
> > well and truly used to EINVAL as a return.
>
> I'll recheck with Feras next week which legacy tool
> didn't work and will post it.

At the end, it was ethtool and according to Jason's comments it was
correct thing do not to put return value, so we are fine.

>
> >
> > >  So it leaves us with one available
> > > option is to warn user about improper value.
> >
> > Even if you want to alert the user, there is no need to warn.  Warn is
> > reserved for something that is a serious error or condition, this is an
> > EINVAL of something that is, at best, a performance issue.  Unless path
> > MTU support is broken, failing to set a larger MTU will not ever hinder
> > actual operation.  So we should never be dumping out KERN_WARN messages
> > into the kernel log that will ping up on any root login session as well
> > as clutter the kernel dmesg output.
> >
> > > User should know that he supplied wrong parameter.
> >
> > User can still find out, they just won't get proactive pings on all
> > root sessions, instead they will have to look in the dmesg output
> > because they are trying to figure out why their command didn't work.
> >
> > > >
> > > >
> > > > --
> > > > Doug Ledford <dledford@redhat.com>
> > > >     GPG KeyID: B826A3330E572FDD
> > > >    
> > > > Key fingerprint = AE6B 1BDA 122B 23B4 265B  1274 B826 A333 0E57
> > > > 2FDD
> > >
> > >
> > --
> > Doug Ledford <dledford@redhat.com>
> >     GPG KeyID: B826A3330E572FDD
> >    
> > Key fingerprint = AE6B 1BDA 122B 23B4 265B  1274 B826 A333 0E57 2FDD
>
>
Leon Romanovsky Jan. 19, 2017, 1:12 p.m. UTC | #14
On Mon, Jan 16, 2017 at 01:12:18PM -0700, Jason Gunthorpe wrote:
> On Sun, Jan 15, 2017 at 10:35:43AM +0200, Leon Romanovsky wrote:
> > On Fri, Jan 13, 2017 at 02:27:21PM -0700, Jason Gunthorpe wrote:
> > > On Fri, Jan 13, 2017 at 10:31:27PM +0200, Leon Romanovsky wrote:
> > >
> > > > > This is about what happens if the multicast MTU < unicast MTU - which
> > > > > is *exactly* the same case on RC and UD.
> > > > >
> > > > > IPoIB cannot send a 64k multicast MTU on RC either.
> > > >
> > > > Multicast is sent in datagram despite being in connected mode and for
> > > > datagram the MTU is limited by IB
> > >
> > > I am aware of that. Read you patch again:
> > >
> > > > +	if (priv->mcast_mtu < priv->admin_mtu)
> > > > +		ipoib_warn(priv, "MTU must be smaller than mcast_mtu (%u)\n",
> > >
> > > This check is fails in RC mode as well.
> > >
> > > And we already check if the requested MTU is beyond the port
> > > capability:
> > >
> > > +        if (new_mtu > IPOIB_UD_MTU(priv->max_ib_mtu))
> > > +                return -EINVAL;
> > >
> > > So I have *no idea* why you'd want to check it against the multicast
> > > group too..
> > >
> >
> > Port MTU capability can be higher than link MTU. In this flow, the user
> > can try to set manually MTU which will be higher than SM provided.
> >
> > Immediately after that print, we have a following piece of code:
> >  236         dev->mtu = min(priv->mcast_mtu, priv->admin_mtu);
>
> Maybe the better fix is to take that min out than generate a
> warning.

I agree with you, in theory it should work.

Let's hope that we will find such brave man who will
remove this line. The removal of this one line will
require from him to do extensive testing and reviewing
different code paths.

Thanks

>
> Jason
Jason Gunthorpe Jan. 19, 2017, 5:33 p.m. UTC | #15
On Thu, Jan 19, 2017 at 03:12:58PM +0200, Leon Romanovsky wrote:

> Let's hope that we will find such brave man who will
> remove this line. The removal of this one line will
> require from him to do extensive testing and reviewing
> different code paths.

Why? We already fully support this configuration for RC

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Leon Romanovsky Jan. 22, 2017, 8:05 a.m. UTC | #16
On Thu, Jan 19, 2017 at 10:33:52AM -0700, Jason Gunthorpe wrote:
> On Thu, Jan 19, 2017 at 03:12:58PM +0200, Leon Romanovsky wrote:
>
> > Let's hope that we will find such brave man who will
> > remove this line. The removal of this one line will
> > require from him to do extensive testing and reviewing
> > different code paths.
>
> Why? We already fully support this configuration for RC

At minimum, i see here one new path which should be tested.
Configure small MTU -> Join multicast group -> Configure large MTU

Before removing min(..), in third step, we ensured that MTU won't exceed
multicast MTU in all flows.

After removing min(..), you will have a chance of flows there MTU limit
won't work.

And I just saying that it needs to be tested.

>
> Jason
diff mbox

Patch

diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c b/drivers/infiniband/ulp/ipoib/ipoib_main.c
index 3ce0765..a550cc6 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_main.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c
@@ -229,6 +229,10 @@  static int ipoib_change_mtu(struct net_device *dev, int new_mtu)
 
 	priv->admin_mtu = new_mtu;
 
+	if (priv->mcast_mtu < priv->admin_mtu)
+		ipoib_warn(priv, "MTU must be smaller than mcast_mtu (%u)\n",
+			   priv->mcast_mtu);
+
 	dev->mtu = min(priv->mcast_mtu, priv->admin_mtu);
 
 	return 0;