Message ID | 20240916055011.16655-1-jiwonaid0@gmail.com (mailing list archive) |
---|---|
State | Changes Requested |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | [net] bondig: Add bond_xdp_check for bond_xdp_xmit in bond_main.c | expand |
On 16/09/2024 08:50, Jiwon Kim wrote: > Add bond_xdp_check to ensure the bond interface is in a valid state. > > syzbot reported WARNING in bond_xdp_get_xmit_slave. > In bond_xdp_get_xmit_slave, the comment says > /* Should never happen. Mode guarded by bond_xdp_check() */. > However, it does not check the status when entering bond_xdp_xmit. > > Reported-by: syzbot+c187823a52ed505b2257@syzkaller.appspotmail.com > Closes: https://syzkaller.appspot.com/bug?extid=c187823a52ed505b2257 > Fixes: 9e2ee5c7e7c3 ("net, bonding: Add XDP support to the bonding driver") > Signed-off-by: Jiwon Kim <jiwonaid0@gmail.com> > --- > drivers/net/bonding/bond_main.c | 33 ++++++++++++++++++--------------- > 1 file changed, 18 insertions(+), 15 deletions(-) > How did you figure the problem is there? Did you take any time to actually understand it? This patch doesn't fix anything, the warning can be easily triggered with it. The actual fix is to remove that WARN_ON() altogether and downgrade the netdev_err() to a ratelimited version. The reason is that we can always get to a state where at least 1 bond device has xdp program installed which increases bpf_master_redirect_enabled_key and another bond device which uses xdpgeneric, then install an ebpf program that simply returns ACT_TX on xdpgeneric bond's slave and voila - you get the warning. setup is[1]: $ ip l add veth0 type veth peer veth1 $ ip l add veth3 type veth peer veth4 $ ip l add bond0 type bond mode 6 # <- transmit-alb mode, unsupported by xdp $ ip l add bond1 type bond # <- rr mode by default, supported by xdp $ ip l set veth0 master bond1 $ ip l set bond1 up $ ip l set dev bond1 xdpdrv object tx_xdp.o section xdp_tx # <- we need xdpdrv program to increase the static key, more below $ ip l set veth3 master bond0 $ ip l set bond0 up $ ip l set veth4 up $ ip l set veth3 xdpgeneric object tx_xdp.o section xdp_tx # <- now we'll hit the codepath we need after veth3 Rx's a packet If you take the time to look at the call stack and the actual code, you'll see it goes something like (for the xdpgeneric bond slave, veth3): ... bpf_prog_run_generic_xdp() for veth3 -> bpf_prog_run_xdp() -> __bpf_prog_run() # return ACT_TX -> xdp_master_redirect() # called because we have ACT_TX && netif_is_bond_slave(xdp->rxq->dev) -> master->netdev_ops->ndo_xdp_get_xmit_slave(master, xdp); # and here we go, WARN_ON() I've had a patch for awhile now about this and have taken the time to look into it. I guess it's time to dust it off and send it out for review. :) Thanks, Nik
On Mon, Sep 16, 2024 at 5:48 PM Nikolay Aleksandrov <razor@blackwall.org> wrote: > > On 16/09/2024 08:50, Jiwon Kim wrote: > > Add bond_xdp_check to ensure the bond interface is in a valid state. > > > > syzbot reported WARNING in bond_xdp_get_xmit_slave. > > In bond_xdp_get_xmit_slave, the comment says > > /* Should never happen. Mode guarded by bond_xdp_check() */. > > However, it does not check the status when entering bond_xdp_xmit. > > > > Reported-by: syzbot+c187823a52ed505b2257@syzkaller.appspotmail.com > > Closes: https://syzkaller.appspot.com/bug?extid=c187823a52ed505b2257 > > Fixes: 9e2ee5c7e7c3 ("net, bonding: Add XDP support to the bonding driver") > > Signed-off-by: Jiwon Kim <jiwonaid0@gmail.com> > > --- > > drivers/net/bonding/bond_main.c | 33 ++++++++++++++++++--------------- > > 1 file changed, 18 insertions(+), 15 deletions(-) > > > > How did you figure the problem is there? Did you take any time to actually > understand it? This patch doesn't fix anything, the warning can be easily > triggered with it. The actual fix is to remove that WARN_ON() altogether > and downgrade the netdev_err() to a ratelimited version. The reason is that > we can always get to a state where at least 1 bond device has xdp program > installed which increases bpf_master_redirect_enabled_key and another bond > device which uses xdpgeneric, then install an ebpf program that simply > returns ACT_TX on xdpgeneric bond's slave and voila - you get the warning. > > setup is[1]: > $ ip l add veth0 type veth peer veth1 > $ ip l add veth3 type veth peer veth4 > $ ip l add bond0 type bond mode 6 # <- transmit-alb mode, unsupported by xdp > $ ip l add bond1 type bond # <- rr mode by default, supported by xdp > $ ip l set veth0 master bond1 > $ ip l set bond1 up > $ ip l set dev bond1 xdpdrv object tx_xdp.o section xdp_tx # <- we need xdpdrv program to increase the static key, more below > $ ip l set veth3 master bond0 > $ ip l set bond0 up > $ ip l set veth4 up > $ ip l set veth3 xdpgeneric object tx_xdp.o section xdp_tx # <- now we'll hit the codepath we need after veth3 Rx's a packet > > > If you take the time to look at the call stack and the actual code, you'll > see it goes something like (for the xdpgeneric bond slave, veth3): > ... > bpf_prog_run_generic_xdp() for veth3 > -> bpf_prog_run_xdp() > -> __bpf_prog_run() # return ACT_TX > -> xdp_master_redirect() # called because we have ACT_TX && netif_is_bond_slave(xdp->rxq->dev) > -> master->netdev_ops->ndo_xdp_get_xmit_slave(master, xdp); # and here we go, WARN_ON() > > I've had a patch for awhile now about this and have taken the time to look into it. > I guess it's time to dust it off and send it out for review. :) > > Thanks, > Nik Hi Nikolay, Thank you for taking the time to provide a detailed setup and call stack analysis. Would you be handling the new patch? If you don't mind, may I revise this patch to - Replace with net_ratelimit() - Remove the WARN_ON() - Update the comment appropriately Thanks again for your insights and patience. Sincerely, Jiwon Kim
On 17/09/2024 07:26, 김지원 wrote: > On Mon, Sep 16, 2024 at 5:48 PM Nikolay Aleksandrov <razor@blackwall.org> wrote: >> >> On 16/09/2024 08:50, Jiwon Kim wrote: >>> Add bond_xdp_check to ensure the bond interface is in a valid state. >>> >>> syzbot reported WARNING in bond_xdp_get_xmit_slave. >>> In bond_xdp_get_xmit_slave, the comment says >>> /* Should never happen. Mode guarded by bond_xdp_check() */. >>> However, it does not check the status when entering bond_xdp_xmit. >>> >>> Reported-by: syzbot+c187823a52ed505b2257@syzkaller.appspotmail.com >>> Closes: https://syzkaller.appspot.com/bug?extid=c187823a52ed505b2257 >>> Fixes: 9e2ee5c7e7c3 ("net, bonding: Add XDP support to the bonding driver") >>> Signed-off-by: Jiwon Kim <jiwonaid0@gmail.com> >>> --- >>> drivers/net/bonding/bond_main.c | 33 ++++++++++++++++++--------------- >>> 1 file changed, 18 insertions(+), 15 deletions(-) >>> >> >> How did you figure the problem is there? Did you take any time to actually >> understand it? This patch doesn't fix anything, the warning can be easily >> triggered with it. The actual fix is to remove that WARN_ON() altogether >> and downgrade the netdev_err() to a ratelimited version. The reason is that >> we can always get to a state where at least 1 bond device has xdp program >> installed which increases bpf_master_redirect_enabled_key and another bond >> device which uses xdpgeneric, then install an ebpf program that simply >> returns ACT_TX on xdpgeneric bond's slave and voila - you get the warning. >> >> setup is[1]: >> $ ip l add veth0 type veth peer veth1 >> $ ip l add veth3 type veth peer veth4 >> $ ip l add bond0 type bond mode 6 # <- transmit-alb mode, unsupported by xdp >> $ ip l add bond1 type bond # <- rr mode by default, supported by xdp >> $ ip l set veth0 master bond1 >> $ ip l set bond1 up >> $ ip l set dev bond1 xdpdrv object tx_xdp.o section xdp_tx # <- we need xdpdrv program to increase the static key, more below >> $ ip l set veth3 master bond0 >> $ ip l set bond0 up >> $ ip l set veth4 up >> $ ip l set veth3 xdpgeneric object tx_xdp.o section xdp_tx # <- now we'll hit the codepath we need after veth3 Rx's a packet >> >> >> If you take the time to look at the call stack and the actual code, you'll >> see it goes something like (for the xdpgeneric bond slave, veth3): >> ... >> bpf_prog_run_generic_xdp() for veth3 >> -> bpf_prog_run_xdp() >> -> __bpf_prog_run() # return ACT_TX >> -> xdp_master_redirect() # called because we have ACT_TX && netif_is_bond_slave(xdp->rxq->dev) >> -> master->netdev_ops->ndo_xdp_get_xmit_slave(master, xdp); # and here we go, WARN_ON() >> >> I've had a patch for awhile now about this and have taken the time to look into it. >> I guess it's time to dust it off and send it out for review. :) >> >> Thanks, >> Nik > > Hi Nikolay, > > Thank you for taking the time to provide a detailed setup and call > stack analysis. > Would you be handling the new patch? If you don't mind, may I revise > this patch to > > - Replace with net_ratelimit() > - Remove the WARN_ON() > - Update the comment appropriately > > Thanks again for your insights and patience. > > Sincerely, > > Jiwon Kim sure, I don't mind, change the patch and I'll review the new one Cheers, Nik
diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c index bb9c3d6ef435..078c85070b86 100644 --- a/drivers/net/bonding/bond_main.c +++ b/drivers/net/bonding/bond_main.c @@ -5551,27 +5551,30 @@ bond_xdp_get_xmit_slave(struct net_device *bond_dev, struct xdp_buff *xdp) static int bond_xdp_xmit(struct net_device *bond_dev, int n, struct xdp_frame **frames, u32 flags) { - int nxmit, err = -ENXIO; + struct bonding *bond = netdev_priv(bond_dev); + int nxmit = 0, err = -ENXIO; rcu_read_lock(); - for (nxmit = 0; nxmit < n; nxmit++) { - struct xdp_frame *frame = frames[nxmit]; - struct xdp_frame *frames1[] = {frame}; - struct net_device *slave_dev; - struct xdp_buff xdp; + if (bond_xdp_check(bond)) { + for (nxmit = 0; nxmit < n; nxmit++) { + struct xdp_frame *frame = frames[nxmit]; + struct xdp_frame *frames1[] = {frame}; + struct net_device *slave_dev; + struct xdp_buff xdp; - xdp_convert_frame_to_buff(frame, &xdp); + xdp_convert_frame_to_buff(frame, &xdp); - slave_dev = bond_xdp_get_xmit_slave(bond_dev, &xdp); - if (!slave_dev) { - err = -ENXIO; - break; - } + slave_dev = bond_xdp_get_xmit_slave(bond_dev, &xdp); + if (!slave_dev) { + err = -ENXIO; + break; + } - err = slave_dev->netdev_ops->ndo_xdp_xmit(slave_dev, 1, frames1, flags); - if (err < 1) - break; + err = slave_dev->netdev_ops->ndo_xdp_xmit(slave_dev, 1, frames1, flags); + if (err < 1) + break; + } } rcu_read_unlock();
Add bond_xdp_check to ensure the bond interface is in a valid state. syzbot reported WARNING in bond_xdp_get_xmit_slave. In bond_xdp_get_xmit_slave, the comment says /* Should never happen. Mode guarded by bond_xdp_check() */. However, it does not check the status when entering bond_xdp_xmit. Reported-by: syzbot+c187823a52ed505b2257@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=c187823a52ed505b2257 Fixes: 9e2ee5c7e7c3 ("net, bonding: Add XDP support to the bonding driver") Signed-off-by: Jiwon Kim <jiwonaid0@gmail.com> --- drivers/net/bonding/bond_main.c | 33 ++++++++++++++++++--------------- 1 file changed, 18 insertions(+), 15 deletions(-)