Message ID | 20230312152158.995043-1-syoshida@redhat.com (mailing list archive) |
---|---|
State | Changes Requested |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | [net] bonding: Fix warning in default_device_exit_batch() | expand |
On 12/03/2023 17:21, Shigeru Yoshida wrote: > syzbot reported warning in default_device_exit_batch() like below [1]: > > WARNING: CPU: 1 PID: 56 at net/core/dev.c:10867 unregister_netdevice_many_notify+0x14cf/0x19f0 net/core/dev.c:10867 > ... > Call Trace: > <TASK> > unregister_netdevice_many net/core/dev.c:10897 [inline] > default_device_exit_batch+0x451/0x5b0 net/core/dev.c:11350 > ops_exit_list+0x125/0x170 net/core/net_namespace.c:174 > cleanup_net+0x4ee/0xb10 net/core/net_namespace.c:613 > process_one_work+0x9bf/0x1820 kernel/workqueue.c:2390 > worker_thread+0x669/0x1090 kernel/workqueue.c:2537 > kthread+0x2e8/0x3a0 kernel/kthread.c:376 > ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:308 > </TASK> > > For bond devices which also has a master device, IFF_SLAVE flag is > cleared at err_undo_flags label in bond_enslave() if it is not > ARPHRD_ETHER type. In this case, __bond_release_one() is not called > when bond_netdev_event() received NETDEV_UNREGISTER event. This > causes the above warning. > > This patch fixes this issue by setting IFF_SLAVE flag at > err_undo_flags label in bond_enslave() if the bond device has a master > device. > The proper way is to check if the bond device had the IFF_SLAVE flag before the ether_setup() call which clears it, and restore it after. > Fixes: 7d5cd2ce5292 ("bonding: correctly handle bonding type change on enslave failure") > Cc: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> > Link: https://syzkaller.appspot.com/bug?id=391c7b1f6522182899efba27d891f1743e8eb3ef [1] > Reported-by: syzbot+9dfc3f3348729cc82277@syzkaller.appspotmail.com > Signed-off-by: Shigeru Yoshida <syoshida@redhat.com> > --- > drivers/net/bonding/bond_main.c | 2 ++ > include/net/bonding.h | 5 +++++ > 2 files changed, 7 insertions(+) > > diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c > index 00646aa315c3..1a8b59e1468d 100644 > --- a/drivers/net/bonding/bond_main.c > +++ b/drivers/net/bonding/bond_main.c > @@ -2291,6 +2291,8 @@ int bond_enslave(struct net_device *bond_dev, struct net_device *slave_dev, > dev_close(bond_dev); > ether_setup(bond_dev); > bond_dev->flags |= IFF_MASTER; > + if (bond_has_master(bond)) > + bond_dev->flags |= IFF_SLAVE; > bond_dev->priv_flags &= ~IFF_TX_SKB_SHARING; > } > } > diff --git a/include/net/bonding.h b/include/net/bonding.h > index ea36ab7f9e72..ed0b49501fad 100644 > --- a/include/net/bonding.h > +++ b/include/net/bonding.h > @@ -57,6 +57,11 @@ > > #define bond_has_slaves(bond) !list_empty(bond_slave_list(bond)) > > +/* master list primitives */ > +#define bond_master_list(bond) (&(bond)->dev->adj_list.upper) > + > +#define bond_has_master(bond) !list_empty(bond_master_list(bond)) > + This is not the proper way to check for a master device. > /* IMPORTANT: bond_first/last_slave can return NULL in case of an empty list */ > #define bond_first_slave(bond) \ > (bond_has_slaves(bond) ? \ The device flags are wrong because of ether_setup() which clears IFF_SLAVE, we should just check if it was present before and restore it after the ether_setup() call. I'll send a fix tomorrow after testing it. Thanks, Nik
Hi Nik, On Sun, Mar 12, 2023 at 10:58:18PM +0200, Nikolay Aleksandrov wrote: > On 12/03/2023 17:21, Shigeru Yoshida wrote: > > syzbot reported warning in default_device_exit_batch() like below [1]: > > > > WARNING: CPU: 1 PID: 56 at net/core/dev.c:10867 unregister_netdevice_many_notify+0x14cf/0x19f0 net/core/dev.c:10867 > > ... > > Call Trace: > > <TASK> > > unregister_netdevice_many net/core/dev.c:10897 [inline] > > default_device_exit_batch+0x451/0x5b0 net/core/dev.c:11350 > > ops_exit_list+0x125/0x170 net/core/net_namespace.c:174 > > cleanup_net+0x4ee/0xb10 net/core/net_namespace.c:613 > > process_one_work+0x9bf/0x1820 kernel/workqueue.c:2390 > > worker_thread+0x669/0x1090 kernel/workqueue.c:2537 > > kthread+0x2e8/0x3a0 kernel/kthread.c:376 > > ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:308 > > </TASK> > > > > For bond devices which also has a master device, IFF_SLAVE flag is > > cleared at err_undo_flags label in bond_enslave() if it is not > > ARPHRD_ETHER type. In this case, __bond_release_one() is not called > > when bond_netdev_event() received NETDEV_UNREGISTER event. This > > causes the above warning. > > > > This patch fixes this issue by setting IFF_SLAVE flag at > > err_undo_flags label in bond_enslave() if the bond device has a master > > device. > > > > The proper way is to check if the bond device had the IFF_SLAVE flag before the > ether_setup() call which clears it, and restore it after. > > > Fixes: 7d5cd2ce5292 ("bonding: correctly handle bonding type change on enslave failure") > > Cc: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> > > Link: https://syzkaller.appspot.com/bug?id=391c7b1f6522182899efba27d891f1743e8eb3ef [1] > > Reported-by: syzbot+9dfc3f3348729cc82277@syzkaller.appspotmail.com > > Signed-off-by: Shigeru Yoshida <syoshida@redhat.com> > > --- > > drivers/net/bonding/bond_main.c | 2 ++ > > include/net/bonding.h | 5 +++++ > > 2 files changed, 7 insertions(+) > > > > diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c > > index 00646aa315c3..1a8b59e1468d 100644 > > --- a/drivers/net/bonding/bond_main.c > > +++ b/drivers/net/bonding/bond_main.c > > @@ -2291,6 +2291,8 @@ int bond_enslave(struct net_device *bond_dev, struct net_device *slave_dev, > > dev_close(bond_dev); > > ether_setup(bond_dev); > > bond_dev->flags |= IFF_MASTER; > > + if (bond_has_master(bond)) > > + bond_dev->flags |= IFF_SLAVE; > > bond_dev->priv_flags &= ~IFF_TX_SKB_SHARING; > > } > > } > > diff --git a/include/net/bonding.h b/include/net/bonding.h > > index ea36ab7f9e72..ed0b49501fad 100644 > > --- a/include/net/bonding.h > > +++ b/include/net/bonding.h > > @@ -57,6 +57,11 @@ > > > > #define bond_has_slaves(bond) !list_empty(bond_slave_list(bond)) > > > > +/* master list primitives */ > > +#define bond_master_list(bond) (&(bond)->dev->adj_list.upper) > > + > > +#define bond_has_master(bond) !list_empty(bond_master_list(bond)) > > + > > This is not the proper way to check for a master device. > > > /* IMPORTANT: bond_first/last_slave can return NULL in case of an empty list */ > > #define bond_first_slave(bond) \ > > (bond_has_slaves(bond) ? \ > > The device flags are wrong because of ether_setup() which clears IFF_SLAVE, we should > just check if it was present before and restore it after the ether_setup() call. Thank you so much for your comment! I understand your point, and agree that your approach must resolve the issue. BTW, do you mean there is a case where a device has IFF_SLAVE flag but the upper list is empty? I thought a device with IFF_SLAVE flag has a master device in the upper list (that is why I took the above way.) Thanks, Shigeru > > I'll send a fix tomorrow after testing it. > > Thanks, > Nik >
On 13/03/2023 11:35, Shigeru Yoshida wrote: > Hi Nik, > > On Sun, Mar 12, 2023 at 10:58:18PM +0200, Nikolay Aleksandrov wrote: >> On 12/03/2023 17:21, Shigeru Yoshida wrote: >>> syzbot reported warning in default_device_exit_batch() like below [1]: >>> >>> WARNING: CPU: 1 PID: 56 at net/core/dev.c:10867 unregister_netdevice_many_notify+0x14cf/0x19f0 net/core/dev.c:10867 >>> ... >>> Call Trace: >>> <TASK> >>> unregister_netdevice_many net/core/dev.c:10897 [inline] >>> default_device_exit_batch+0x451/0x5b0 net/core/dev.c:11350 >>> ops_exit_list+0x125/0x170 net/core/net_namespace.c:174 >>> cleanup_net+0x4ee/0xb10 net/core/net_namespace.c:613 >>> process_one_work+0x9bf/0x1820 kernel/workqueue.c:2390 >>> worker_thread+0x669/0x1090 kernel/workqueue.c:2537 >>> kthread+0x2e8/0x3a0 kernel/kthread.c:376 >>> ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:308 >>> </TASK> >>> >>> For bond devices which also has a master device, IFF_SLAVE flag is >>> cleared at err_undo_flags label in bond_enslave() if it is not >>> ARPHRD_ETHER type. In this case, __bond_release_one() is not called >>> when bond_netdev_event() received NETDEV_UNREGISTER event. This >>> causes the above warning. >>> >>> This patch fixes this issue by setting IFF_SLAVE flag at >>> err_undo_flags label in bond_enslave() if the bond device has a master >>> device. >>> >> >> The proper way is to check if the bond device had the IFF_SLAVE flag before the >> ether_setup() call which clears it, and restore it after. >> >>> Fixes: 7d5cd2ce5292 ("bonding: correctly handle bonding type change on enslave failure") >>> Cc: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> >>> Link: https://syzkaller.appspot.com/bug?id=391c7b1f6522182899efba27d891f1743e8eb3ef [1] >>> Reported-by: syzbot+9dfc3f3348729cc82277@syzkaller.appspotmail.com >>> Signed-off-by: Shigeru Yoshida <syoshida@redhat.com> >>> --- >>> drivers/net/bonding/bond_main.c | 2 ++ >>> include/net/bonding.h | 5 +++++ >>> 2 files changed, 7 insertions(+) >>> >>> diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c >>> index 00646aa315c3..1a8b59e1468d 100644 >>> --- a/drivers/net/bonding/bond_main.c >>> +++ b/drivers/net/bonding/bond_main.c >>> @@ -2291,6 +2291,8 @@ int bond_enslave(struct net_device *bond_dev, struct net_device *slave_dev, >>> dev_close(bond_dev); >>> ether_setup(bond_dev); >>> bond_dev->flags |= IFF_MASTER; >>> + if (bond_has_master(bond)) >>> + bond_dev->flags |= IFF_SLAVE; >>> bond_dev->priv_flags &= ~IFF_TX_SKB_SHARING; >>> } >>> } >>> diff --git a/include/net/bonding.h b/include/net/bonding.h >>> index ea36ab7f9e72..ed0b49501fad 100644 >>> --- a/include/net/bonding.h >>> +++ b/include/net/bonding.h >>> @@ -57,6 +57,11 @@ >>> >>> #define bond_has_slaves(bond) !list_empty(bond_slave_list(bond)) >>> >>> +/* master list primitives */ >>> +#define bond_master_list(bond) (&(bond)->dev->adj_list.upper) >>> + >>> +#define bond_has_master(bond) !list_empty(bond_master_list(bond)) >>> + >> >> This is not the proper way to check for a master device. >> >>> /* IMPORTANT: bond_first/last_slave can return NULL in case of an empty list */ >>> #define bond_first_slave(bond) \ >>> (bond_has_slaves(bond) ? \ >> >> The device flags are wrong because of ether_setup() which clears IFF_SLAVE, we should >> just check if it was present before and restore it after the ether_setup() call. > > Thank you so much for your comment! I understand your point, and > agree that your approach must resolve the issue. > > BTW, do you mean there is a case where a device has IFF_SLAVE flag but > the upper list is empty? I thought a device with IFF_SLAVE flag has a > master device in the upper list (that is why I took the above way.) > Hi Shigeru, No, that's not what I meant. It's the opposite actually, you may have an upper list but you don't have a "master" device or slave flag set. Yes, you can say that if a device has IFF_SLAVE set, then it must have a master upper device but that's not what you're checking for, you've reversed that logic to check for an upper device instead and assume there's a IFF_SLAVE flag set (which may not be true). For an upper device to be considered a "master" device, it must have the master bool set to true in its netdev_adjacent structure. We already have helpers to check for master devices and to retrieve them, e.g. check netdev_master_upper_dev_get* in net/core/dev.c The most robust way to fix it is to check if the flag was there prior to the ether_setup() call and restore it after, also to leave a nice comment about all of this. :) > Thanks, > Shigeru > Cheers, Nik
Hi Nik, On Mon, Mar 13, 2023 at 12:52:44PM +0200, Nikolay Aleksandrov wrote: > On 13/03/2023 11:35, Shigeru Yoshida wrote: > > Hi Nik, > > > > On Sun, Mar 12, 2023 at 10:58:18PM +0200, Nikolay Aleksandrov wrote: > >> On 12/03/2023 17:21, Shigeru Yoshida wrote: > >>> syzbot reported warning in default_device_exit_batch() like below [1]: > >>> > >>> WARNING: CPU: 1 PID: 56 at net/core/dev.c:10867 unregister_netdevice_many_notify+0x14cf/0x19f0 net/core/dev.c:10867 > >>> ... > >>> Call Trace: > >>> <TASK> > >>> unregister_netdevice_many net/core/dev.c:10897 [inline] > >>> default_device_exit_batch+0x451/0x5b0 net/core/dev.c:11350 > >>> ops_exit_list+0x125/0x170 net/core/net_namespace.c:174 > >>> cleanup_net+0x4ee/0xb10 net/core/net_namespace.c:613 > >>> process_one_work+0x9bf/0x1820 kernel/workqueue.c:2390 > >>> worker_thread+0x669/0x1090 kernel/workqueue.c:2537 > >>> kthread+0x2e8/0x3a0 kernel/kthread.c:376 > >>> ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:308 > >>> </TASK> > >>> > >>> For bond devices which also has a master device, IFF_SLAVE flag is > >>> cleared at err_undo_flags label in bond_enslave() if it is not > >>> ARPHRD_ETHER type. In this case, __bond_release_one() is not called > >>> when bond_netdev_event() received NETDEV_UNREGISTER event. This > >>> causes the above warning. > >>> > >>> This patch fixes this issue by setting IFF_SLAVE flag at > >>> err_undo_flags label in bond_enslave() if the bond device has a master > >>> device. > >>> > >> > >> The proper way is to check if the bond device had the IFF_SLAVE flag before the > >> ether_setup() call which clears it, and restore it after. > >> > >>> Fixes: 7d5cd2ce5292 ("bonding: correctly handle bonding type change on enslave failure") > >>> Cc: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> > >>> Link: https://syzkaller.appspot.com/bug?id=391c7b1f6522182899efba27d891f1743e8eb3ef [1] > >>> Reported-by: syzbot+9dfc3f3348729cc82277@syzkaller.appspotmail.com > >>> Signed-off-by: Shigeru Yoshida <syoshida@redhat.com> > >>> --- > >>> drivers/net/bonding/bond_main.c | 2 ++ > >>> include/net/bonding.h | 5 +++++ > >>> 2 files changed, 7 insertions(+) > >>> > >>> diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c > >>> index 00646aa315c3..1a8b59e1468d 100644 > >>> --- a/drivers/net/bonding/bond_main.c > >>> +++ b/drivers/net/bonding/bond_main.c > >>> @@ -2291,6 +2291,8 @@ int bond_enslave(struct net_device *bond_dev, struct net_device *slave_dev, > >>> dev_close(bond_dev); > >>> ether_setup(bond_dev); > >>> bond_dev->flags |= IFF_MASTER; > >>> + if (bond_has_master(bond)) > >>> + bond_dev->flags |= IFF_SLAVE; > >>> bond_dev->priv_flags &= ~IFF_TX_SKB_SHARING; > >>> } > >>> } > >>> diff --git a/include/net/bonding.h b/include/net/bonding.h > >>> index ea36ab7f9e72..ed0b49501fad 100644 > >>> --- a/include/net/bonding.h > >>> +++ b/include/net/bonding.h > >>> @@ -57,6 +57,11 @@ > >>> > >>> #define bond_has_slaves(bond) !list_empty(bond_slave_list(bond)) > >>> > >>> +/* master list primitives */ > >>> +#define bond_master_list(bond) (&(bond)->dev->adj_list.upper) > >>> + > >>> +#define bond_has_master(bond) !list_empty(bond_master_list(bond)) > >>> + > >> > >> This is not the proper way to check for a master device. > >> > >>> /* IMPORTANT: bond_first/last_slave can return NULL in case of an empty list */ > >>> #define bond_first_slave(bond) \ > >>> (bond_has_slaves(bond) ? \ > >> > >> The device flags are wrong because of ether_setup() which clears IFF_SLAVE, we should > >> just check if it was present before and restore it after the ether_setup() call. > > > > Thank you so much for your comment! I understand your point, and > > agree that your approach must resolve the issue. > > > > BTW, do you mean there is a case where a device has IFF_SLAVE flag but > > the upper list is empty? I thought a device with IFF_SLAVE flag has a > > master device in the upper list (that is why I took the above way.) > > > > Hi Shigeru, > No, that's not what I meant. It's the opposite actually, you may have an upper list > but you don't have a "master" device or slave flag set. Yes, you can say that if > a device has IFF_SLAVE set, then it must have a master upper device but that's not > what you're checking for, you've reversed that logic to check for an upper device instead > and assume there's a IFF_SLAVE flag set (which may not be true). > For an upper device to be considered a "master" device, it must have the master bool set to > true in its netdev_adjacent structure. We already have helpers to check for master devices > and to retrieve them, e.g. check netdev_master_upper_dev_get* in net/core/dev.c > > The most robust way to fix it is to check if the flag was there prior to the ether_setup() call > and restore it after, also to leave a nice comment about all of this. :) Thanks for kindly explanation. I've now understand why my fix is not sufficient to check a master device. And, yes, the most robust and simple way to fix the issue is to check the flag before it is cleared. Thanks you~ Shigeru > > > Thanks, > > Shigeru > > > > Cheers, > Nik >
diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c index 00646aa315c3..1a8b59e1468d 100644 --- a/drivers/net/bonding/bond_main.c +++ b/drivers/net/bonding/bond_main.c @@ -2291,6 +2291,8 @@ int bond_enslave(struct net_device *bond_dev, struct net_device *slave_dev, dev_close(bond_dev); ether_setup(bond_dev); bond_dev->flags |= IFF_MASTER; + if (bond_has_master(bond)) + bond_dev->flags |= IFF_SLAVE; bond_dev->priv_flags &= ~IFF_TX_SKB_SHARING; } } diff --git a/include/net/bonding.h b/include/net/bonding.h index ea36ab7f9e72..ed0b49501fad 100644 --- a/include/net/bonding.h +++ b/include/net/bonding.h @@ -57,6 +57,11 @@ #define bond_has_slaves(bond) !list_empty(bond_slave_list(bond)) +/* master list primitives */ +#define bond_master_list(bond) (&(bond)->dev->adj_list.upper) + +#define bond_has_master(bond) !list_empty(bond_master_list(bond)) + /* IMPORTANT: bond_first/last_slave can return NULL in case of an empty list */ #define bond_first_slave(bond) \ (bond_has_slaves(bond) ? \
syzbot reported warning in default_device_exit_batch() like below [1]: WARNING: CPU: 1 PID: 56 at net/core/dev.c:10867 unregister_netdevice_many_notify+0x14cf/0x19f0 net/core/dev.c:10867 ... Call Trace: <TASK> unregister_netdevice_many net/core/dev.c:10897 [inline] default_device_exit_batch+0x451/0x5b0 net/core/dev.c:11350 ops_exit_list+0x125/0x170 net/core/net_namespace.c:174 cleanup_net+0x4ee/0xb10 net/core/net_namespace.c:613 process_one_work+0x9bf/0x1820 kernel/workqueue.c:2390 worker_thread+0x669/0x1090 kernel/workqueue.c:2537 kthread+0x2e8/0x3a0 kernel/kthread.c:376 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:308 </TASK> For bond devices which also has a master device, IFF_SLAVE flag is cleared at err_undo_flags label in bond_enslave() if it is not ARPHRD_ETHER type. In this case, __bond_release_one() is not called when bond_netdev_event() received NETDEV_UNREGISTER event. This causes the above warning. This patch fixes this issue by setting IFF_SLAVE flag at err_undo_flags label in bond_enslave() if the bond device has a master device. Fixes: 7d5cd2ce5292 ("bonding: correctly handle bonding type change on enslave failure") Cc: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Link: https://syzkaller.appspot.com/bug?id=391c7b1f6522182899efba27d891f1743e8eb3ef [1] Reported-by: syzbot+9dfc3f3348729cc82277@syzkaller.appspotmail.com Signed-off-by: Shigeru Yoshida <syoshida@redhat.com> --- drivers/net/bonding/bond_main.c | 2 ++ include/net/bonding.h | 5 +++++ 2 files changed, 7 insertions(+)