diff mbox series

[net-next] net-loopback: allow lo dev initial state to be controlled

Message ID 20201111204308.3352959-1-jianyang.kernel@gmail.com (mailing list archive)
State Changes Requested
Delegated to: Netdev Maintainers
Headers show
Series [net-next] net-loopback: allow lo dev initial state to be controlled | expand

Checks

Context Check Description
netdev/cover_letter success Link
netdev/fixes_present success Link
netdev/patch_count success Link
netdev/tree_selection success Clearly marked for net-next
netdev/subject_prefix success Link
netdev/source_inline success Was 0 now: 0
netdev/verify_signedoff success Link
netdev/module_param success Was 0 now: 0
netdev/build_32bit success Errors and warnings before: 7710 this patch: 7710
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/verify_fixes success Link
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 63 lines checked
netdev/build_allmodconfig_warn success Errors and warnings before: 8075 this patch: 8075
netdev/header_inline success Link
netdev/stable success Stable not CCed

Commit Message

Jian Yang Nov. 11, 2020, 8:43 p.m. UTC
From: Mahesh Bandewar <maheshb@google.com>

Traditionally loopback devices comes up with initial state as DOWN for
any new network-namespace. This would mean that anyone needing this
device (which is mostly true except sandboxes where networking in not
needed at all), would have to bring this UP by issuing something like
'ip link set lo up' which can be avoided if the initial state can be set
as UP. Also ICMP error propagation needs loopback to be UP.

The default value for this sysctl is set to ZERO which will preserve the
backward compatible behavior for the root-netns while changing the
sysctl will only alter the behavior of the newer network namespaces.

Signed-off-by: Mahesh Bandewar <maheshb@google.com>
Signed-off-by: Jian Yang <jianyang@google.com>
---
 Documentation/admin-guide/sysctl/net.rst | 11 +++++++++++
 drivers/net/loopback.c                   |  7 +++++++
 include/linux/netdevice.h                |  1 +
 net/core/sysctl_net_core.c               | 14 ++++++++++++++
 4 files changed, 33 insertions(+)

Comments

Andrew Lunn Nov. 12, 2020, 4:08 p.m. UTC | #1
On Wed, Nov 11, 2020 at 12:43:08PM -0800, Jian Yang wrote:
> From: Mahesh Bandewar <maheshb@google.com>
> 
> Traditionally loopback devices comes up with initial state as DOWN for
> any new network-namespace. This would mean that anyone needing this
> device (which is mostly true except sandboxes where networking in not
> needed at all), would have to bring this UP by issuing something like
> 'ip link set lo up' which can be avoided if the initial state can be set
> as UP.

How useful is lo if it is up, but has no IP address? I don't think
this change adds the IP addresses does it? So you still need something
inside your netns to add the IP addresses? Which seems to make this
change pointless?

       Andrew
Dan Williams Nov. 12, 2020, 7:54 p.m. UTC | #2
On Thu, 2020-11-12 at 17:08 +0100, Andrew Lunn wrote:
> On Wed, Nov 11, 2020 at 12:43:08PM -0800, Jian Yang wrote:
> > From: Mahesh Bandewar <maheshb@google.com>
> > 
> > Traditionally loopback devices comes up with initial state as DOWN
> > for
> > any new network-namespace. This would mean that anyone needing this
> > device (which is mostly true except sandboxes where networking in
> > not
> > needed at all), would have to bring this UP by issuing something
> > like
> > 'ip link set lo up' which can be avoided if the initial state can
> > be set
> > as UP.
> 
> How useful is lo if it is up, but has no IP address? I don't think
> this change adds the IP addresses does it? So you still need
> something
> inside your netns to add the IP addresses? Which seems to make this
> change pointless?

lo gets addresses automatically these days, no?

$ ip netns add blue
$ ip netns exec blue ip addr
1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
$ ip netns exec blue ip link set lo up
$ ip netns exec blue ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever

Dan
Jakub Kicinski Nov. 14, 2020, 6:17 p.m. UTC | #3
On Wed, 11 Nov 2020 12:43:08 -0800 Jian Yang wrote:
> From: Mahesh Bandewar <maheshb@google.com>
> 
> Traditionally loopback devices comes up with initial state as DOWN for
> any new network-namespace. This would mean that anyone needing this
> device (which is mostly true except sandboxes where networking in not
> needed at all), would have to bring this UP by issuing something like
> 'ip link set lo up' which can be avoided if the initial state can be set
> as UP. Also ICMP error propagation needs loopback to be UP.
> 
> The default value for this sysctl is set to ZERO which will preserve the
> backward compatible behavior for the root-netns while changing the
> sysctl will only alter the behavior of the newer network namespaces.

Any reason why the new sysctl itself is not per netns?

> +netdev_loopback_state
> +---------------------

loopback_init_state ?

> +Controls the loopback device initial state for any new network namespaces. By
> +default, we keep the initial state as DOWN.
> +
> +If set to 1, the loopback device will be brought UP during namespace creation.
> +This will only apply to all new network namespaces.
> +
> +Default : 0  (for compatibility reasons)
> +
>  netdev_max_backlog
>  ------------------
>  
> diff --git a/drivers/net/loopback.c b/drivers/net/loopback.c
> index a1c77cc00416..76dc92ac65a2 100644
> --- a/drivers/net/loopback.c
> +++ b/drivers/net/loopback.c
> @@ -219,6 +219,13 @@ static __net_init int loopback_net_init(struct net *net)
>  
>  	BUG_ON(dev->ifindex != LOOPBACK_IFINDEX);
>  	net->loopback_dev = dev;
> +
> +	if (sysctl_netdev_loopback_state) {
> +		/* Bring loopback device UP */
> +		rtnl_lock();
> +		dev_open(dev, NULL);
> +		rtnl_unlock();
> +	}

The only concern I have here is that it breaks notification ordering.
Is there precedent for NETDEV_UP to be generated before all pernet ops
->init was called?
On Sat, Nov 14, 2020 at 10:17 AM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Wed, 11 Nov 2020 12:43:08 -0800 Jian Yang wrote:
> > From: Mahesh Bandewar <maheshb@google.com>
> >
> > Traditionally loopback devices comes up with initial state as DOWN for
> > any new network-namespace. This would mean that anyone needing this
> > device (which is mostly true except sandboxes where networking in not
> > needed at all), would have to bring this UP by issuing something like
> > 'ip link set lo up' which can be avoided if the initial state can be set
> > as UP. Also ICMP error propagation needs loopback to be UP.
> >
> > The default value for this sysctl is set to ZERO which will preserve the
> > backward compatible behavior for the root-netns while changing the
> > sysctl will only alter the behavior of the newer network namespaces.
>
> Any reason why the new sysctl itself is not per netns?
>
Making it per netns would not be very useful since its effect is only
during netns creation.

> > +netdev_loopback_state
> > +---------------------
>
> loopback_init_state ?
>
That's fine, thanks for the suggestion.

> > +Controls the loopback device initial state for any new network namespaces. By
> > +default, we keep the initial state as DOWN.
> > +
> > +If set to 1, the loopback device will be brought UP during namespace creation.
> > +This will only apply to all new network namespaces.
> > +
> > +Default : 0  (for compatibility reasons)
> > +
> >  netdev_max_backlog
> >  ------------------
> >
> > diff --git a/drivers/net/loopback.c b/drivers/net/loopback.c
> > index a1c77cc00416..76dc92ac65a2 100644
> > --- a/drivers/net/loopback.c
> > +++ b/drivers/net/loopback.c
> > @@ -219,6 +219,13 @@ static __net_init int loopback_net_init(struct net *net)
> >
> >       BUG_ON(dev->ifindex != LOOPBACK_IFINDEX);
> >       net->loopback_dev = dev;
> > +
> > +     if (sysctl_netdev_loopback_state) {
> > +             /* Bring loopback device UP */
> > +             rtnl_lock();
> > +             dev_open(dev, NULL);
> > +             rtnl_unlock();
> > +     }
>
> The only concern I have here is that it breaks notification ordering.
> Is there precedent for NETDEV_UP to be generated before all pernet ops
> ->init was called?
I'm not sure if any and didn't see any issues in our usage / tests.
I'm not even sure anyone is watching/monitoring for lo status as such.
Jakub Kicinski Nov. 16, 2020, 8:17 p.m. UTC | #5
On Mon, 16 Nov 2020 12:02:48 -0800 Mahesh Bandewar (महेश बंडेवार) wrote:
> On Sat, Nov 14, 2020 at 10:17 AM Jakub Kicinski <kuba@kernel.org> wrote:
> > On Wed, 11 Nov 2020 12:43:08 -0800 Jian Yang wrote:  
> > > From: Mahesh Bandewar <maheshb@google.com>
> > >
> > > Traditionally loopback devices comes up with initial state as DOWN for
> > > any new network-namespace. This would mean that anyone needing this
> > > device (which is mostly true except sandboxes where networking in not
> > > needed at all), would have to bring this UP by issuing something like
> > > 'ip link set lo up' which can be avoided if the initial state can be set
> > > as UP. Also ICMP error propagation needs loopback to be UP.
> > >
> > > The default value for this sysctl is set to ZERO which will preserve the
> > > backward compatible behavior for the root-netns while changing the
> > > sysctl will only alter the behavior of the newer network namespaces.  
>
> > Any reason why the new sysctl itself is not per netns?
> >  
> Making it per netns would not be very useful since its effect is only
> during netns creation.

I must be confused. Are all namespaces spawned off init_net, not the
current netns the process is in?
Jakub Kicinski Nov. 16, 2020, 8:34 p.m. UTC | #6
On Mon, 16 Nov 2020 12:02:48 -0800 Mahesh Bandewar (महेश बंडेवार) wrote:
> > > diff --git a/drivers/net/loopback.c b/drivers/net/loopback.c
> > > index a1c77cc00416..76dc92ac65a2 100644
> > > --- a/drivers/net/loopback.c
> > > +++ b/drivers/net/loopback.c
> > > @@ -219,6 +219,13 @@ static __net_init int loopback_net_init(struct net *net)
> > >
> > >       BUG_ON(dev->ifindex != LOOPBACK_IFINDEX);
> > >       net->loopback_dev = dev;
> > > +
> > > +     if (sysctl_netdev_loopback_state) {
> > > +             /* Bring loopback device UP */
> > > +             rtnl_lock();
> > > +             dev_open(dev, NULL);
> > > +             rtnl_unlock();
> > > +     }  
> >
> > The only concern I have here is that it breaks notification ordering.
> > Is there precedent for NETDEV_UP to be generated before all pernet ops  
> > ->init was called?  
> I'm not sure if any and didn't see any issues in our usage / tests.
> I'm not even sure anyone is watching/monitoring for lo status as such.

Ido, David, how does this sound to you?

I can't think of any particular case where bringing the device up (and
populating it's addresses) before per netns init is finished could be
problematic. But if this is going to make kernel coding harder the
minor convenience of the knob is probably not worth it.
On Mon, Nov 16, 2020 at 12:17 PM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Mon, 16 Nov 2020 12:02:48 -0800 Mahesh Bandewar (महेश बंडेवार) wrote:
> > On Sat, Nov 14, 2020 at 10:17 AM Jakub Kicinski <kuba@kernel.org> wrote:
> > > On Wed, 11 Nov 2020 12:43:08 -0800 Jian Yang wrote:
> > > > From: Mahesh Bandewar <maheshb@google.com>
> > > >
> > > > Traditionally loopback devices comes up with initial state as DOWN for
> > > > any new network-namespace. This would mean that anyone needing this
> > > > device (which is mostly true except sandboxes where networking in not
> > > > needed at all), would have to bring this UP by issuing something like
> > > > 'ip link set lo up' which can be avoided if the initial state can be set
> > > > as UP. Also ICMP error propagation needs loopback to be UP.
> > > >
> > > > The default value for this sysctl is set to ZERO which will preserve the
> > > > backward compatible behavior for the root-netns while changing the
> > > > sysctl will only alter the behavior of the newer network namespaces.
> >
> > > Any reason why the new sysctl itself is not per netns?
> > >
> > Making it per netns would not be very useful since its effect is only
> > during netns creation.
>
> I must be confused. Are all namespaces spawned off init_net, not the
> current netns the process is in?
The namespace hierarchy is maintained in user-ns while we have per-ns
sysctls hanging off of a netns object and we don't have parent (netns)
reference when initializing newly created netns values. One can copy
the current value of the settings from root-ns but I don't think
that's a good practice since there is no clear way to affect those
values when the root-ns changes them. Also from the isolation
perspective (I think) it's better to have this behavior (sysctl
values) stand on it's own i.e. have default values and then alter
values on it's own without linking to any other netns values.
On Mon, Nov 16, 2020 at 12:34 PM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Mon, 16 Nov 2020 12:02:48 -0800 Mahesh Bandewar (महेश बंडेवार) wrote:
> > > > diff --git a/drivers/net/loopback.c b/drivers/net/loopback.c
> > > > index a1c77cc00416..76dc92ac65a2 100644
> > > > --- a/drivers/net/loopback.c
> > > > +++ b/drivers/net/loopback.c
> > > > @@ -219,6 +219,13 @@ static __net_init int loopback_net_init(struct net *net)
> > > >
> > > >       BUG_ON(dev->ifindex != LOOPBACK_IFINDEX);
> > > >       net->loopback_dev = dev;
> > > > +
> > > > +     if (sysctl_netdev_loopback_state) {
> > > > +             /* Bring loopback device UP */
> > > > +             rtnl_lock();
> > > > +             dev_open(dev, NULL);
> > > > +             rtnl_unlock();
> > > > +     }
> > >
> > > The only concern I have here is that it breaks notification ordering.
> > > Is there precedent for NETDEV_UP to be generated before all pernet ops
> > > ->init was called?
> > I'm not sure if any and didn't see any issues in our usage / tests.
> > I'm not even sure anyone is watching/monitoring for lo status as such.
>
> Ido, David, how does this sound to you?
>
> I can't think of any particular case where bringing the device up (and
> populating it's addresses) before per netns init is finished could be
> problematic. But if this is going to make kernel coding harder the
> minor convenience of the knob is probably not worth it.

+Eric Dumazet

I'm not sure why kernel coding should get harder, but happy to listen
to the opinions.
Jakub Kicinski Nov. 16, 2020, 9:20 p.m. UTC | #9
On Mon, 16 Nov 2020 12:50:22 -0800 Mahesh Bandewar (महेश बंडेवार) wrote:
> On Mon, Nov 16, 2020 at 12:17 PM Jakub Kicinski <kuba@kernel.org> wrote:
> > On Mon, 16 Nov 2020 12:02:48 -0800 Mahesh Bandewar (महेश बंडेवार) wrote:  
> > > On Sat, Nov 14, 2020 at 10:17 AM Jakub Kicinski <kuba@kernel.org> wrote:  
> > > > On Wed, 11 Nov 2020 12:43:08 -0800 Jian Yang wrote:  
> > > > > From: Mahesh Bandewar <maheshb@google.com>
> > > > >
> > > > > Traditionally loopback devices comes up with initial state as DOWN for
> > > > > any new network-namespace. This would mean that anyone needing this
> > > > > device (which is mostly true except sandboxes where networking in not
> > > > > needed at all), would have to bring this UP by issuing something like
> > > > > 'ip link set lo up' which can be avoided if the initial state can be set
> > > > > as UP. Also ICMP error propagation needs loopback to be UP.
> > > > >
> > > > > The default value for this sysctl is set to ZERO which will preserve the
> > > > > backward compatible behavior for the root-netns while changing the
> > > > > sysctl will only alter the behavior of the newer network namespaces.  
> > >  
> > > > Any reason why the new sysctl itself is not per netns?
> > > >  
> > > Making it per netns would not be very useful since its effect is only
> > > during netns creation.  
> >
> > I must be confused. Are all namespaces spawned off init_net, not the
> > current netns the process is in?  
> The namespace hierarchy is maintained in user-ns while we have per-ns
> sysctls hanging off of a netns object and we don't have parent (netns)
> reference when initializing newly created netns values. One can copy
> the current value of the settings from root-ns but I don't think
> that's a good practice since there is no clear way to affect those
> values when the root-ns changes them. Also from the isolation
> perspective (I think) it's better to have this behavior (sysctl
> values) stand on it's own i.e. have default values and then alter
> values on it's own without linking to any other netns values.

To be clear, what I meant was just to make the sysctl per namespace.
That way you can deploy a workload with this sysctl set appropriately
without changing the system-global setting.

Is your expectation that particular application stacks would take
advantage of this, or that people would set this to 1 across the
fleet?
On Mon, Nov 16, 2020 at 1:20 PM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Mon, 16 Nov 2020 12:50:22 -0800 Mahesh Bandewar (महेश बंडेवार) wrote:
> > On Mon, Nov 16, 2020 at 12:17 PM Jakub Kicinski <kuba@kernel.org> wrote:
> > > On Mon, 16 Nov 2020 12:02:48 -0800 Mahesh Bandewar (महेश बंडेवार) wrote:
> > > > On Sat, Nov 14, 2020 at 10:17 AM Jakub Kicinski <kuba@kernel.org> wrote:
> > > > > On Wed, 11 Nov 2020 12:43:08 -0800 Jian Yang wrote:
> > > > > > From: Mahesh Bandewar <maheshb@google.com>
> > > > > >
> > > > > > Traditionally loopback devices comes up with initial state as DOWN for
> > > > > > any new network-namespace. This would mean that anyone needing this
> > > > > > device (which is mostly true except sandboxes where networking in not
> > > > > > needed at all), would have to bring this UP by issuing something like
> > > > > > 'ip link set lo up' which can be avoided if the initial state can be set
> > > > > > as UP. Also ICMP error propagation needs loopback to be UP.
> > > > > >
> > > > > > The default value for this sysctl is set to ZERO which will preserve the
> > > > > > backward compatible behavior for the root-netns while changing the
> > > > > > sysctl will only alter the behavior of the newer network namespaces.
> > > >
> > > > > Any reason why the new sysctl itself is not per netns?
> > > > >
> > > > Making it per netns would not be very useful since its effect is only
> > > > during netns creation.
> > >
> > > I must be confused. Are all namespaces spawned off init_net, not the
> > > current netns the process is in?
> > The namespace hierarchy is maintained in user-ns while we have per-ns
> > sysctls hanging off of a netns object and we don't have parent (netns)
> > reference when initializing newly created netns values. One can copy
> > the current value of the settings from root-ns but I don't think
> > that's a good practice since there is no clear way to affect those
> > values when the root-ns changes them. Also from the isolation
> > perspective (I think) it's better to have this behavior (sysctl
> > values) stand on it's own i.e. have default values and then alter
> > values on it's own without linking to any other netns values.
>
> To be clear, what I meant was just to make the sysctl per namespace.
> That way you can deploy a workload with this sysctl set appropriately
> without changing the system-global setting.
>
> Is your expectation that particular application stacks would take
> advantage of this, or that people would set this to 1 across the
> fleet?

Having loopback DOWN is useful to only a tiny fraction of netns use
cases where networking is not enabled but since that's how it had been
historically so breaking that (default) behavior is not right. But
apart from those cases, wherever networking is used inside netns (most
of the use cases), loopback is always required to be UP otherwise your
ICMP error delivery is affected. So workloads that always use
networking inside netns would set this value to be 1 always (Google's
use case) while those workloads where there is a mix of non-networking
as well as networking enabled netns-es can use the default behavior
that has been traditionally present (where the value set to 0).
kernel test robot Nov. 17, 2020, 4:50 a.m. UTC | #11
Hi Jian,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on net-next/master]

url:    https://github.com/0day-ci/linux/commits/Jian-Yang/net-loopback-allow-lo-dev-initial-state-to-be-controlled/20201112-044539
base:   https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git 70408949a35f1a31c327c69b6a187635cb0305fa
config: powerpc64-randconfig-r006-20201116 (attached as .config)
compiler: powerpc-linux-gcc (GCC) 9.3.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/0day-ci/linux/commit/f1167177eeca028a046726f582c332d4c638a0c8
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Jian-Yang/net-loopback-allow-lo-dev-initial-state-to-be-controlled/20201112-044539
        git checkout f1167177eeca028a046726f582c332d4c638a0c8
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross ARCH=powerpc64 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

   powerpc-linux-ld: drivers/net/loopback.o: in function `loopback_net_init':
>> loopback.c:(.init.text+0x7a): undefined reference to `sysctl_netdev_loopback_state'
>> powerpc-linux-ld: loopback.c:(.init.text+0x7e): undefined reference to `sysctl_netdev_loopback_state'

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org
Ido Schimmel Nov. 17, 2020, 5:18 p.m. UTC | #12
On Mon, Nov 16, 2020 at 01:03:32PM -0800, Mahesh Bandewar (महेश बंडेवार) wrote:
> On Mon, Nov 16, 2020 at 12:34 PM Jakub Kicinski <kuba@kernel.org> wrote:
> >
> > On Mon, 16 Nov 2020 12:02:48 -0800 Mahesh Bandewar (महेश बंडेवार) wrote:
> > > > > diff --git a/drivers/net/loopback.c b/drivers/net/loopback.c
> > > > > index a1c77cc00416..76dc92ac65a2 100644
> > > > > --- a/drivers/net/loopback.c
> > > > > +++ b/drivers/net/loopback.c
> > > > > @@ -219,6 +219,13 @@ static __net_init int loopback_net_init(struct net *net)
> > > > >
> > > > >       BUG_ON(dev->ifindex != LOOPBACK_IFINDEX);
> > > > >       net->loopback_dev = dev;
> > > > > +
> > > > > +     if (sysctl_netdev_loopback_state) {
> > > > > +             /* Bring loopback device UP */
> > > > > +             rtnl_lock();
> > > > > +             dev_open(dev, NULL);
> > > > > +             rtnl_unlock();
> > > > > +     }
> > > >
> > > > The only concern I have here is that it breaks notification ordering.
> > > > Is there precedent for NETDEV_UP to be generated before all pernet ops
> > > > ->init was called?
> > > I'm not sure if any and didn't see any issues in our usage / tests.
> > > I'm not even sure anyone is watching/monitoring for lo status as such.
> >
> > Ido, David, how does this sound to you?
> >
> > I can't think of any particular case where bringing the device up (and
> > populating it's addresses) before per netns init is finished could be
> > problematic. But if this is going to make kernel coding harder the
> > minor convenience of the knob is probably not worth it.
> 
> +Eric Dumazet
> 
> I'm not sure why kernel coding should get harder, but happy to listen
> to the opinions.

Hi,

Sorry for the delay. Does not occur to me as a problematic change. I ran
various tests with 'sysctl -qw net.core.netdev_loopback_state=1' and a
debug config. Looks OK.
On Tue, Nov 17, 2020 at 9:18 AM Ido Schimmel <idosch@idosch.org> wrote:
>
> On Mon, Nov 16, 2020 at 01:03:32PM -0800, Mahesh Bandewar (महेश बंडेवार) wrote:
> > On Mon, Nov 16, 2020 at 12:34 PM Jakub Kicinski <kuba@kernel.org> wrote:
> > >
> > > On Mon, 16 Nov 2020 12:02:48 -0800 Mahesh Bandewar (महेश बंडेवार) wrote:
> > > > > > diff --git a/drivers/net/loopback.c b/drivers/net/loopback.c
> > > > > > index a1c77cc00416..76dc92ac65a2 100644
> > > > > > --- a/drivers/net/loopback.c
> > > > > > +++ b/drivers/net/loopback.c
> > > > > > @@ -219,6 +219,13 @@ static __net_init int loopback_net_init(struct net *net)
> > > > > >
> > > > > >       BUG_ON(dev->ifindex != LOOPBACK_IFINDEX);
> > > > > >       net->loopback_dev = dev;
> > > > > > +
> > > > > > +     if (sysctl_netdev_loopback_state) {
> > > > > > +             /* Bring loopback device UP */
> > > > > > +             rtnl_lock();
> > > > > > +             dev_open(dev, NULL);
> > > > > > +             rtnl_unlock();
> > > > > > +     }
> > > > >
> > > > > The only concern I have here is that it breaks notification ordering.
> > > > > Is there precedent for NETDEV_UP to be generated before all pernet ops
> > > > > ->init was called?
> > > > I'm not sure if any and didn't see any issues in our usage / tests.
> > > > I'm not even sure anyone is watching/monitoring for lo status as such.
> > >
> > > Ido, David, how does this sound to you?
> > >
> > > I can't think of any particular case where bringing the device up (and
> > > populating it's addresses) before per netns init is finished could be
> > > problematic. But if this is going to make kernel coding harder the
> > > minor convenience of the knob is probably not worth it.
> >
> > +Eric Dumazet
> >
> > I'm not sure why kernel coding should get harder, but happy to listen
> > to the opinions.
>
> Hi,
>
> Sorry for the delay. Does not occur to me as a problematic change. I ran
> various tests with 'sysctl -qw net.core.netdev_loopback_state=1' and a
> debug config. Looks OK.

Thanks for the confirmation Ido. I think Jian is getting powerpc
config build fixed to address the build-bots findings and then he can
push the updated version.
David Ahern Nov. 18, 2020, 1:12 a.m. UTC | #14
On 11/17/20 1:53 PM, Mahesh Bandewar (महेश बंडेवार) wrote:
> On Tue, Nov 17, 2020 at 9:18 AM Ido Schimmel <idosch@idosch.org> wrote:
>>
>> On Mon, Nov 16, 2020 at 01:03:32PM -0800, Mahesh Bandewar (महेश बंडेवार) wrote:
>>> On Mon, Nov 16, 2020 at 12:34 PM Jakub Kicinski <kuba@kernel.org> wrote:
>>>>
>>>> On Mon, 16 Nov 2020 12:02:48 -0800 Mahesh Bandewar (महेश बंडेवार) wrote:
>>>>>>> diff --git a/drivers/net/loopback.c b/drivers/net/loopback.c
>>>>>>> index a1c77cc00416..76dc92ac65a2 100644
>>>>>>> --- a/drivers/net/loopback.c
>>>>>>> +++ b/drivers/net/loopback.c
>>>>>>> @@ -219,6 +219,13 @@ static __net_init int loopback_net_init(struct net *net)
>>>>>>>
>>>>>>>       BUG_ON(dev->ifindex != LOOPBACK_IFINDEX);
>>>>>>>       net->loopback_dev = dev;
>>>>>>> +
>>>>>>> +     if (sysctl_netdev_loopback_state) {
>>>>>>> +             /* Bring loopback device UP */
>>>>>>> +             rtnl_lock();
>>>>>>> +             dev_open(dev, NULL);
>>>>>>> +             rtnl_unlock();
>>>>>>> +     }
>>>>>>
>>>>>> The only concern I have here is that it breaks notification ordering.
>>>>>> Is there precedent for NETDEV_UP to be generated before all pernet ops
>>>>>> ->init was called?
>>>>> I'm not sure if any and didn't see any issues in our usage / tests.
>>>>> I'm not even sure anyone is watching/monitoring for lo status as such.
>>>>
>>>> Ido, David, how does this sound to you?
>>>>
>>>> I can't think of any particular case where bringing the device up (and
>>>> populating it's addresses) before per netns init is finished could be
>>>> problematic. But if this is going to make kernel coding harder the
>>>> minor convenience of the knob is probably not worth it.
>>>
>>> +Eric Dumazet
>>>
>>> I'm not sure why kernel coding should get harder, but happy to listen
>>> to the opinions.
>>
>> Hi,
>>
>> Sorry for the delay. Does not occur to me as a problematic change. I ran
>> various tests with 'sysctl -qw net.core.netdev_loopback_state=1' and a
>> debug config. Looks OK.
> 
> Thanks for the confirmation Ido. I think Jian is getting powerpc
> config build fixed to address the build-bots findings and then he can
> push the updated version.
> 

If there is no harm in just creating lo in the up state, why not just do
it vs relying on a sysctl? It only affects 'local' networking so no real
impact to containers that do not do networking (ie., packets can't
escape). Linux has a lot of sysctl options; is this one really needed?
Nicolas Dichtel Nov. 18, 2020, 4:58 p.m. UTC | #15
Le 18/11/2020 à 02:12, David Ahern a écrit :
[snip]
> If there is no harm in just creating lo in the up state, why not just do
> it vs relying on a sysctl? It only affects 'local' networking so no real
> impact to containers that do not do networking (ie., packets can't
> escape). Linux has a lot of sysctl options; is this one really needed?
> 
+1

And thus, it will benefit to everybody.
On Wed, Nov 18, 2020 at 8:58 AM Nicolas Dichtel
<nicolas.dichtel@6wind.com> wrote:
>
> Le 18/11/2020 à 02:12, David Ahern a écrit :
> [snip]
> > If there is no harm in just creating lo in the up state, why not just do
> > it vs relying on a sysctl? It only affects 'local' networking so no real
> > impact to containers that do not do networking (ie., packets can't
> > escape). Linux has a lot of sysctl options; is this one really needed?
> >
I started with that approach but then I was informed about these
containers that disable networking all together including loopback.
Also bringing up by default would break backward compatibility hence
resorted to sysctl.
> +1
>
> And thus, it will benefit to everybody.

Well, it benefits everyone who uses networking (most of us) inside
netns but would create problems for workloads that create netns to
disable networking. One can always disable it after creating the netns
but that would mean change in the workflow and it could be viewed as
regression.
David Ahern Nov. 18, 2020, 6:04 p.m. UTC | #17
On 11/18/20 10:39 AM, Mahesh Bandewar (महेश बंडेवार) wrote:
> On Wed, Nov 18, 2020 at 8:58 AM Nicolas Dichtel
> <nicolas.dichtel@6wind.com> wrote:
>>
>> Le 18/11/2020 à 02:12, David Ahern a écrit :
>> [snip]
>>> If there is no harm in just creating lo in the up state, why not just do
>>> it vs relying on a sysctl? It only affects 'local' networking so no real
>>> impact to containers that do not do networking (ie., packets can't
>>> escape). Linux has a lot of sysctl options; is this one really needed?
>>>
> I started with that approach but then I was informed about these
> containers that disable networking all together including loopback.
> Also bringing up by default would break backward compatibility hence
> resorted to sysctl.
>> +1
>>
>> And thus, it will benefit to everybody.
> 
> Well, it benefits everyone who uses networking (most of us) inside
> netns but would create problems for workloads that create netns to
> disable networking. One can always disable it after creating the netns
> but that would mean change in the workflow and it could be viewed as
> regression.
> 

Then perhaps the relevant sysctl -- or maybe netns attribute -- is
whether to create a loopback device at all.
On Wed, Nov 18, 2020 at 10:04 AM David Ahern <dsahern@gmail.com> wrote:
>
> On 11/18/20 10:39 AM, Mahesh Bandewar (महेश बंडेवार) wrote:
> > On Wed, Nov 18, 2020 at 8:58 AM Nicolas Dichtel
> > <nicolas.dichtel@6wind.com> wrote:
> >>
> >> Le 18/11/2020 à 02:12, David Ahern a écrit :
> >> [snip]
> >>> If there is no harm in just creating lo in the up state, why not just do
> >>> it vs relying on a sysctl? It only affects 'local' networking so no real
> >>> impact to containers that do not do networking (ie., packets can't
> >>> escape). Linux has a lot of sysctl options; is this one really needed?
> >>>
> > I started with that approach but then I was informed about these
> > containers that disable networking all together including loopback.
> > Also bringing up by default would break backward compatibility hence
> > resorted to sysctl.
> >> +1
> >>
> >> And thus, it will benefit to everybody.
> >
> > Well, it benefits everyone who uses networking (most of us) inside
> > netns but would create problems for workloads that create netns to
> > disable networking. One can always disable it after creating the netns
> > but that would mean change in the workflow and it could be viewed as
> > regression.
> >
>
> Then perhaps the relevant sysctl -- or maybe netns attribute -- is
> whether to create a loopback device at all.

I'm open to ideas within the bounds of maintaining backward
compatibility. However, not able to see how we can pull it off without
creating a 'loopback' device.
Nicolas Dichtel Nov. 19, 2020, 8:03 a.m. UTC | #19
Le 18/11/2020 à 18:39, Mahesh Bandewar (महेश बंडेवार) a écrit :
> On Wed, Nov 18, 2020 at 8:58 AM Nicolas Dichtel
> <nicolas.dichtel@6wind.com> wrote:
>>
>> Le 18/11/2020 à 02:12, David Ahern a écrit :
>> [snip]
>>> If there is no harm in just creating lo in the up state, why not just do
>>> it vs relying on a sysctl? It only affects 'local' networking so no real
>>> impact to containers that do not do networking (ie., packets can't
>>> escape). Linux has a lot of sysctl options; is this one really needed?
>>>
> I started with that approach but then I was informed about these
> containers that disable networking all together including loopback.
> Also bringing up by default would break backward compatibility hence
> resorted to sysctl.
>> +1
>>
>> And thus, it will benefit to everybody.
> 
> Well, it benefits everyone who uses networking (most of us) inside
Sure.

> netns but would create problems for workloads that create netns to
> disable networking. One can always disable it after creating the netns
> but that would mean change in the workflow and it could be viewed as
> regression.
The networking is very limited with only a loopback. Do you have some real use
case in mind?
On Thu, Nov 19, 2020 at 12:03 AM Nicolas Dichtel
<nicolas.dichtel@6wind.com> wrote:
>
> Le 18/11/2020 à 18:39, Mahesh Bandewar (महेश बंडेवार) a écrit :
> > On Wed, Nov 18, 2020 at 8:58 AM Nicolas Dichtel
> > <nicolas.dichtel@6wind.com> wrote:
> >>
> >> Le 18/11/2020 à 02:12, David Ahern a écrit :
> >> [snip]
> >>> If there is no harm in just creating lo in the up state, why not just do
> >>> it vs relying on a sysctl? It only affects 'local' networking so no real
> >>> impact to containers that do not do networking (ie., packets can't
> >>> escape). Linux has a lot of sysctl options; is this one really needed?
> >>>
> > I started with that approach but then I was informed about these
> > containers that disable networking all together including loopback.
> > Also bringing up by default would break backward compatibility hence
> > resorted to sysctl.
> >> +1
> >>
> >> And thus, it will benefit to everybody.
> >
> > Well, it benefits everyone who uses networking (most of us) inside
> Sure.
>
> > netns but would create problems for workloads that create netns to
> > disable networking. One can always disable it after creating the netns
> > but that would mean change in the workflow and it could be viewed as
> > regression.
> The networking is very limited with only a loopback. Do you have some real use
> case in mind?

My use cases all use networking but I think principally we cannot
break backward compatibility, right?
Jakub, WDYT?
Jakub Kicinski Nov. 20, 2020, 4:56 a.m. UTC | #21
On Thu, 19 Nov 2020 19:55:08 -0800 Mahesh Bandewar (महेश बंडेवार) wrote:
> On Thu, Nov 19, 2020 at 12:03 AM Nicolas Dichtel
> > Le 18/11/2020 à 18:39, Mahesh Bandewar (महेश बंडेवार) a écrit :  
> > > netns but would create problems for workloads that create netns to
> > > disable networking. One can always disable it after creating the netns
> > > but that would mean change in the workflow and it could be viewed as
> > > regression.  
> > The networking is very limited with only a loopback. Do you have some real use
> > case in mind?  
> 
> My use cases all use networking but I think principally we cannot
> break backward compatibility, right?
> Jakub, WDYT?

Do you have more details on what the use cases are that expect no
networking?

TBH I don't get the utility of this knob. If you want to write vaguely
portable software you have to assume the knob won't be useful, because
either (a) kernel may be old, or (b) you shouldn't assume to own the
sysctls and this is a global one (what if an application spawns that
expects legacy behavior?)

And if you have to check for those two things you're gonna write more
code than just ifuping lo would be.

Maybe you can shed some more light on how it makes life at Google
easier for you? Or someone else can enlighten me?

I don't have much practical experience with namespaces, but the more 
I think about it the more pointless it seems.
On Thu, Nov 19, 2020 at 8:56 PM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Thu, 19 Nov 2020 19:55:08 -0800 Mahesh Bandewar (महेश बंडेवार) wrote:
> > On Thu, Nov 19, 2020 at 12:03 AM Nicolas Dichtel
> > > Le 18/11/2020 à 18:39, Mahesh Bandewar (महेश बंडेवार) a écrit :
> > > > netns but would create problems for workloads that create netns to
> > > > disable networking. One can always disable it after creating the netns
> > > > but that would mean change in the workflow and it could be viewed as
> > > > regression.
> > > The networking is very limited with only a loopback. Do you have some real use
> > > case in mind?
> >
> > My use cases all use networking but I think principally we cannot
> > break backward compatibility, right?
> > Jakub, WDYT?
>
> Do you have more details on what the use cases are that expect no
> networking?
>
> TBH I don't get the utility of this knob. If you want to write vaguely
> portable software you have to assume the knob won't be useful, because
> either (a) kernel may be old, or (b) you shouldn't assume to own the
> sysctls and this is a global one (what if an application spawns that
> expects legacy behavior?)
>
> And if you have to check for those two things you're gonna write more
> code than just ifuping lo would be.
>
> Maybe you can shed some more light on how it makes life at Google
> easier for you? Or someone else can enlighten me?
>
> I don't have much practical experience with namespaces, but the more
> I think about it the more pointless it seems.

Thanks for the input.

Sorry, I was on vacation and couldn't process this response earlier.

There are two things that this patch is providing and let me understand the
objections well.

(a) Bring up lo by default
(b) sysctl to protect the legacy behavior

Frankly we really dont have any legacy-behavior use case and one can
bring it back to life by just doing 'ifdown lo' if necessary. Most of
the use cases involve using networking anyways. My belief was that we
need to protect legacy behavior and hence went lengths to add sysctl
to protect it. If we are OK not to have it, I'm more than happy to
remove the sysctl and just have the 3 line patch to bring loopback up.

If legacy-behavior is a concern (which I thought generally is), then
either we can have the sysctl to have it around to protect it (the
current implementation) but if we prefer to have kernel-command-line
instead of sysctl that defaults to legacy behavior but if provided, we
can set it UP by default during boot (or the other way around)?

My primary motive is (a) while (b) is just a side-effect which we can
get away if deemed unnecessary.
Jakub Kicinski Dec. 2, 2020, 2:38 a.m. UTC | #23
On Tue, 1 Dec 2020 12:24:38 -0800 Mahesh Bandewar (महेश बंडेवार) wrote:
> On Thu, Nov 19, 2020 at 8:56 PM Jakub Kicinski <kuba@kernel.org> wrote:
> > Do you have more details on what the use cases are that expect no
> > networking?
> >
> > TBH I don't get the utility of this knob. If you want to write vaguely
> > portable software you have to assume the knob won't be useful, because
> > either (a) kernel may be old, or (b) you shouldn't assume to own the
> > sysctls and this is a global one (what if an application spawns that
> > expects legacy behavior?)
> >
> > And if you have to check for those two things you're gonna write more
> > code than just ifuping lo would be.
> >
> > Maybe you can shed some more light on how it makes life at Google
> > easier for you? Or someone else can enlighten me?
> >
> > I don't have much practical experience with namespaces, but the more
> > I think about it the more pointless it seems.  
> 
> Thanks for the input.
> 
> Sorry, I was on vacation and couldn't process this response earlier.
> 
> There are two things that this patch is providing and let me understand the
> objections well.
> 
> (a) Bring up lo by default
> (b) sysctl to protect the legacy behavior
> 
> Frankly we really dont have any legacy-behavior use case and one can
> bring it back to life by just doing 'ifdown lo' if necessary.

If use cases depending on legacy behavior exist we are just trading the
ifup in one case for an ifdown in another.

Unless we can dispel the notion that sand-boxing by lo down is a
legitimate use case IMO we're just adding complexity and growing
a sysctl for something that can be trivially handled from user space.

> Most of
> the use cases involve using networking anyways. My belief was that we
> need to protect legacy behavior and hence went lengths to add sysctl
> to protect it. If we are OK not to have it, I'm more than happy to
> remove the sysctl and just have the 3 line patch to bring loopback up.
> 
> If legacy-behavior is a concern (which I thought generally is), then
> either we can have the sysctl to have it around to protect it (the
> current implementation) but if we prefer to have kernel-command-line
> instead of sysctl that defaults to legacy behavior but if provided, we
> can set it UP by default during boot (or the other way around)?
> 
> My primary motive is (a) while (b) is just a side-effect which we can
> get away if deemed unnecessary.
On Tue, Dec 1, 2020 at 6:38 PM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Tue, 1 Dec 2020 12:24:38 -0800 Mahesh Bandewar (महेश बंडेवार) wrote:
> > On Thu, Nov 19, 2020 at 8:56 PM Jakub Kicinski <kuba@kernel.org> wrote:
> > > Do you have more details on what the use cases are that expect no
> > > networking?
> > >
> > > TBH I don't get the utility of this knob. If you want to write vaguely
> > > portable software you have to assume the knob won't be useful, because
> > > either (a) kernel may be old, or (b) you shouldn't assume to own the
> > > sysctls and this is a global one (what if an application spawns that
> > > expects legacy behavior?)
> > >
> > > And if you have to check for those two things you're gonna write more
> > > code than just ifuping lo would be.
> > >
> > > Maybe you can shed some more light on how it makes life at Google
> > > easier for you? Or someone else can enlighten me?
> > >
> > > I don't have much practical experience with namespaces, but the more
> > > I think about it the more pointless it seems.
> >
> > Thanks for the input.
> >
> > Sorry, I was on vacation and couldn't process this response earlier.
> >
> > There are two things that this patch is providing and let me understand the
> > objections well.
> >
> > (a) Bring up lo by default
> > (b) sysctl to protect the legacy behavior
> >
> > Frankly we really dont have any legacy-behavior use case and one can
> > bring it back to life by just doing 'ifdown lo' if necessary.
>
> If use cases depending on legacy behavior exist we are just trading the
> ifup in one case for an ifdown in another.
>
Yes, I would agree with this if the use-cases are 50/50 but I would
say it's more like 99/1 distribution (99% of the time if not higher
times lo is required to be UP and probably 1% of the time or less it's
 down)

> Unless we can dispel the notion that sand-boxing by lo down is a
> legitimate use case IMO we're just adding complexity and growing
> a sysctl for something that can be trivially handled from user space.
>
OK, I can remove the sysctl and just have the 3 line patch.



> > Most of
> > the use cases involve using networking anyways. My belief was that we
> > need to protect legacy behavior and hence went lengths to add sysctl
> > to protect it. If we are OK not to have it, I'm more than happy to
> > remove the sysctl and just have the 3 line patch to bring loopback up.
> >
> > If legacy-behavior is a concern (which I thought generally is), then
> > either we can have the sysctl to have it around to protect it (the
> > current implementation) but if we prefer to have kernel-command-line
> > instead of sysctl that defaults to legacy behavior but if provided, we
> > can set it UP by default during boot (or the other way around)?
> >
> > My primary motive is (a) while (b) is just a side-effect which we can
> > get away if deemed unnecessary.
>
diff mbox series

Patch

diff --git a/Documentation/admin-guide/sysctl/net.rst b/Documentation/admin-guide/sysctl/net.rst
index f2ab8a5b6a4b..6902232ff57a 100644
--- a/Documentation/admin-guide/sysctl/net.rst
+++ b/Documentation/admin-guide/sysctl/net.rst
@@ -268,6 +268,17 @@  Maximum number of microseconds in one NAPI polling cycle. Polling
 will exit when either netdev_budget_usecs have elapsed during the
 poll cycle or the number of packets processed reaches netdev_budget.
 
+netdev_loopback_state
+---------------------
+
+Controls the loopback device initial state for any new network namespaces. By
+default, we keep the initial state as DOWN.
+
+If set to 1, the loopback device will be brought UP during namespace creation.
+This will only apply to all new network namespaces.
+
+Default : 0  (for compatibility reasons)
+
 netdev_max_backlog
 ------------------
 
diff --git a/drivers/net/loopback.c b/drivers/net/loopback.c
index a1c77cc00416..76dc92ac65a2 100644
--- a/drivers/net/loopback.c
+++ b/drivers/net/loopback.c
@@ -219,6 +219,13 @@  static __net_init int loopback_net_init(struct net *net)
 
 	BUG_ON(dev->ifindex != LOOPBACK_IFINDEX);
 	net->loopback_dev = dev;
+
+	if (sysctl_netdev_loopback_state) {
+		/* Bring loopback device UP */
+		rtnl_lock();
+		dev_open(dev, NULL);
+		rtnl_unlock();
+	}
 	return 0;
 
 out_free_netdev:
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 7ce648a564f7..27c0a7e8a8ea 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -625,6 +625,7 @@  struct netdev_queue {
 
 extern int sysctl_fb_tunnels_only_for_init_net;
 extern int sysctl_devconf_inherit_init_net;
+extern int sysctl_netdev_loopback_state;
 
 /*
  * sysctl_fb_tunnels_only_for_init_net == 0 : For all netns
diff --git a/net/core/sysctl_net_core.c b/net/core/sysctl_net_core.c
index d86d8d11cfe4..d2cf435f5991 100644
--- a/net/core/sysctl_net_core.c
+++ b/net/core/sysctl_net_core.c
@@ -35,6 +35,11 @@  static int net_msg_warn;	/* Unused, but still a sysctl */
 int sysctl_fb_tunnels_only_for_init_net __read_mostly = 0;
 EXPORT_SYMBOL(sysctl_fb_tunnels_only_for_init_net);
 
+/* 0 - default (backward compatible) state: DOWN by default
+ * 1 - UP by default (for all new network namespaces)
+ */
+int sysctl_netdev_loopback_state __read_mostly;
+
 /* 0 - Keep current behavior:
  *     IPv4: inherit all current settings from init_net
  *     IPv6: reset all settings to default
@@ -507,6 +512,15 @@  static struct ctl_table net_core_table[] = {
 		.proc_handler	= set_default_qdisc
 	},
 #endif
+	{
+		.procname	= "netdev_loopback_state",
+		.data		= &sysctl_netdev_loopback_state,
+		.maxlen		= sizeof(int),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec_minmax,
+		.extra1		= SYSCTL_ZERO,
+		.extra2		= SYSCTL_ONE
+	},
 #endif /* CONFIG_NET */
 	{
 		.procname	= "netdev_budget",