diff mbox series

[net] bondig: Add bond_xdp_check for bond_xdp_xmit in bond_main.c

Message ID 20240916055011.16655-1-jiwonaid0@gmail.com (mailing list archive)
State Changes Requested
Delegated to: Netdev Maintainers
Headers show
Series [net] bondig: Add bond_xdp_check for bond_xdp_xmit in bond_main.c | expand

Checks

Context Check Description
netdev/series_format success Single patches do not need cover letters
netdev/tree_selection success Clearly marked for net
netdev/ynl success Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present success Fixes tag present in non-next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 16 this patch: 16
netdev/build_tools success No tools touched, skip
netdev/cc_maintainers success CCed 12 of 12 maintainers
netdev/build_clang success Errors and warnings before: 16 this patch: 16
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success Fixes tag looks correct
netdev/build_allmodconfig_warn success Errors and warnings before: 16 this patch: 16
netdev/checkpatch warning WARNING: line length of 96 exceeds 80 columns
netdev/build_clang_rust success No Rust files in patch. Skipping build
netdev/kdoc success Errors and warnings before: 13 this patch: 13
netdev/source_inline success Was 0 now: 0
netdev/contest success net-next-2024-09-16--18-00 (tests: 764)

Commit Message

Jiwon Kim Sept. 16, 2024, 5:50 a.m. UTC
Add bond_xdp_check to ensure the bond interface is in a valid state.

syzbot reported WARNING in bond_xdp_get_xmit_slave.
In bond_xdp_get_xmit_slave, the comment says
/* Should never happen. Mode guarded by bond_xdp_check() */.
However, it does not check the status when entering bond_xdp_xmit.

Reported-by: syzbot+c187823a52ed505b2257@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=c187823a52ed505b2257
Fixes: 9e2ee5c7e7c3 ("net, bonding: Add XDP support to the bonding driver")
Signed-off-by: Jiwon Kim <jiwonaid0@gmail.com>
---
 drivers/net/bonding/bond_main.c | 33 ++++++++++++++++++---------------
 1 file changed, 18 insertions(+), 15 deletions(-)

Comments

Nikolay Aleksandrov Sept. 16, 2024, 8:48 a.m. UTC | #1
On 16/09/2024 08:50, Jiwon Kim wrote:
> Add bond_xdp_check to ensure the bond interface is in a valid state.
> 
> syzbot reported WARNING in bond_xdp_get_xmit_slave.
> In bond_xdp_get_xmit_slave, the comment says
> /* Should never happen. Mode guarded by bond_xdp_check() */.
> However, it does not check the status when entering bond_xdp_xmit.
> 
> Reported-by: syzbot+c187823a52ed505b2257@syzkaller.appspotmail.com
> Closes: https://syzkaller.appspot.com/bug?extid=c187823a52ed505b2257
> Fixes: 9e2ee5c7e7c3 ("net, bonding: Add XDP support to the bonding driver")
> Signed-off-by: Jiwon Kim <jiwonaid0@gmail.com>
> ---
>  drivers/net/bonding/bond_main.c | 33 ++++++++++++++++++---------------
>  1 file changed, 18 insertions(+), 15 deletions(-)
> 

How did you figure the problem is there? Did you take any time to actually
understand it? This patch doesn't fix anything, the warning can be easily
triggered with it. The actual fix is to remove that WARN_ON() altogether
and downgrade the netdev_err() to a ratelimited version. The reason is that
we can always get to a state where at least 1 bond device has xdp program
installed which increases bpf_master_redirect_enabled_key and another bond
device which uses xdpgeneric, then install an ebpf program that simply
returns ACT_TX on xdpgeneric bond's slave and voila - you get the warning.

setup is[1]:
 $ ip l add veth0 type veth peer veth1
 $ ip l add veth3 type veth peer veth4
 $ ip l add bond0 type bond mode 6 # <- transmit-alb mode, unsupported by xdp
 $ ip l add bond1 type bond # <- rr mode by default, supported by xdp
 $ ip l set veth0 master bond1
 $ ip l set bond1 up
 $ ip l set dev bond1 xdpdrv object tx_xdp.o section xdp_tx # <- we need xdpdrv program to increase the static key, more below
 $ ip l set veth3 master bond0
 $ ip l set bond0 up
 $ ip l set veth4 up
 $ ip l set veth3 xdpgeneric object tx_xdp.o section xdp_tx # <- now we'll hit the codepath we need after veth3 Rx's a packet


If you take the time to look at the call stack and the actual code, you'll
see it goes something like (for the xdpgeneric bond slave, veth3):
...
bpf_prog_run_generic_xdp() for veth3
 -> bpf_prog_run_xdp() 
   -> __bpf_prog_run() # return ACT_TX
     -> xdp_master_redirect() # called because we have ACT_TX && netif_is_bond_slave(xdp->rxq->dev)
       -> master->netdev_ops->ndo_xdp_get_xmit_slave(master, xdp); # and here we go, WARN_ON()

I've had a patch for awhile now about this and have taken the time to look into it.
I guess it's time to dust it off and send it out for review. :)

Thanks,
 Nik
Jiwon Kim Sept. 17, 2024, 4:26 a.m. UTC | #2
On Mon, Sep 16, 2024 at 5:48 PM Nikolay Aleksandrov <razor@blackwall.org> wrote:
>
> On 16/09/2024 08:50, Jiwon Kim wrote:
> > Add bond_xdp_check to ensure the bond interface is in a valid state.
> >
> > syzbot reported WARNING in bond_xdp_get_xmit_slave.
> > In bond_xdp_get_xmit_slave, the comment says
> > /* Should never happen. Mode guarded by bond_xdp_check() */.
> > However, it does not check the status when entering bond_xdp_xmit.
> >
> > Reported-by: syzbot+c187823a52ed505b2257@syzkaller.appspotmail.com
> > Closes: https://syzkaller.appspot.com/bug?extid=c187823a52ed505b2257
> > Fixes: 9e2ee5c7e7c3 ("net, bonding: Add XDP support to the bonding driver")
> > Signed-off-by: Jiwon Kim <jiwonaid0@gmail.com>
> > ---
> >  drivers/net/bonding/bond_main.c | 33 ++++++++++++++++++---------------
> >  1 file changed, 18 insertions(+), 15 deletions(-)
> >
>
> How did you figure the problem is there? Did you take any time to actually
> understand it? This patch doesn't fix anything, the warning can be easily
> triggered with it. The actual fix is to remove that WARN_ON() altogether
> and downgrade the netdev_err() to a ratelimited version. The reason is that
> we can always get to a state where at least 1 bond device has xdp program
> installed which increases bpf_master_redirect_enabled_key and another bond
> device which uses xdpgeneric, then install an ebpf program that simply
> returns ACT_TX on xdpgeneric bond's slave and voila - you get the warning.
>
> setup is[1]:
>  $ ip l add veth0 type veth peer veth1
>  $ ip l add veth3 type veth peer veth4
>  $ ip l add bond0 type bond mode 6 # <- transmit-alb mode, unsupported by xdp
>  $ ip l add bond1 type bond # <- rr mode by default, supported by xdp
>  $ ip l set veth0 master bond1
>  $ ip l set bond1 up
>  $ ip l set dev bond1 xdpdrv object tx_xdp.o section xdp_tx # <- we need xdpdrv program to increase the static key, more below
>  $ ip l set veth3 master bond0
>  $ ip l set bond0 up
>  $ ip l set veth4 up
>  $ ip l set veth3 xdpgeneric object tx_xdp.o section xdp_tx # <- now we'll hit the codepath we need after veth3 Rx's a packet
>
>
> If you take the time to look at the call stack and the actual code, you'll
> see it goes something like (for the xdpgeneric bond slave, veth3):
> ...
> bpf_prog_run_generic_xdp() for veth3
>  -> bpf_prog_run_xdp()
>    -> __bpf_prog_run() # return ACT_TX
>      -> xdp_master_redirect() # called because we have ACT_TX && netif_is_bond_slave(xdp->rxq->dev)
>        -> master->netdev_ops->ndo_xdp_get_xmit_slave(master, xdp); # and here we go, WARN_ON()
>
> I've had a patch for awhile now about this and have taken the time to look into it.
> I guess it's time to dust it off and send it out for review. :)
>
> Thanks,
>  Nik

Hi Nikolay,

Thank you for taking the time to provide a detailed setup and call
stack analysis.
Would you be handling the new patch? If you don't mind, may I revise
this patch to

- Replace with net_ratelimit()
- Remove the WARN_ON()
- Update the comment appropriately

Thanks again for your insights and patience.

Sincerely,

Jiwon Kim
Nikolay Aleksandrov Sept. 17, 2024, 7:14 a.m. UTC | #3
On 17/09/2024 07:26, 김지원 wrote:
> On Mon, Sep 16, 2024 at 5:48 PM Nikolay Aleksandrov <razor@blackwall.org> wrote:
>>
>> On 16/09/2024 08:50, Jiwon Kim wrote:
>>> Add bond_xdp_check to ensure the bond interface is in a valid state.
>>>
>>> syzbot reported WARNING in bond_xdp_get_xmit_slave.
>>> In bond_xdp_get_xmit_slave, the comment says
>>> /* Should never happen. Mode guarded by bond_xdp_check() */.
>>> However, it does not check the status when entering bond_xdp_xmit.
>>>
>>> Reported-by: syzbot+c187823a52ed505b2257@syzkaller.appspotmail.com
>>> Closes: https://syzkaller.appspot.com/bug?extid=c187823a52ed505b2257
>>> Fixes: 9e2ee5c7e7c3 ("net, bonding: Add XDP support to the bonding driver")
>>> Signed-off-by: Jiwon Kim <jiwonaid0@gmail.com>
>>> ---
>>>  drivers/net/bonding/bond_main.c | 33 ++++++++++++++++++---------------
>>>  1 file changed, 18 insertions(+), 15 deletions(-)
>>>
>>
>> How did you figure the problem is there? Did you take any time to actually
>> understand it? This patch doesn't fix anything, the warning can be easily
>> triggered with it. The actual fix is to remove that WARN_ON() altogether
>> and downgrade the netdev_err() to a ratelimited version. The reason is that
>> we can always get to a state where at least 1 bond device has xdp program
>> installed which increases bpf_master_redirect_enabled_key and another bond
>> device which uses xdpgeneric, then install an ebpf program that simply
>> returns ACT_TX on xdpgeneric bond's slave and voila - you get the warning.
>>
>> setup is[1]:
>>  $ ip l add veth0 type veth peer veth1
>>  $ ip l add veth3 type veth peer veth4
>>  $ ip l add bond0 type bond mode 6 # <- transmit-alb mode, unsupported by xdp
>>  $ ip l add bond1 type bond # <- rr mode by default, supported by xdp
>>  $ ip l set veth0 master bond1
>>  $ ip l set bond1 up
>>  $ ip l set dev bond1 xdpdrv object tx_xdp.o section xdp_tx # <- we need xdpdrv program to increase the static key, more below
>>  $ ip l set veth3 master bond0
>>  $ ip l set bond0 up
>>  $ ip l set veth4 up
>>  $ ip l set veth3 xdpgeneric object tx_xdp.o section xdp_tx # <- now we'll hit the codepath we need after veth3 Rx's a packet
>>
>>
>> If you take the time to look at the call stack and the actual code, you'll
>> see it goes something like (for the xdpgeneric bond slave, veth3):
>> ...
>> bpf_prog_run_generic_xdp() for veth3
>>  -> bpf_prog_run_xdp()
>>    -> __bpf_prog_run() # return ACT_TX
>>      -> xdp_master_redirect() # called because we have ACT_TX && netif_is_bond_slave(xdp->rxq->dev)
>>        -> master->netdev_ops->ndo_xdp_get_xmit_slave(master, xdp); # and here we go, WARN_ON()
>>
>> I've had a patch for awhile now about this and have taken the time to look into it.
>> I guess it's time to dust it off and send it out for review. :)
>>
>> Thanks,
>>  Nik
> 
> Hi Nikolay,
> 
> Thank you for taking the time to provide a detailed setup and call
> stack analysis.
> Would you be handling the new patch? If you don't mind, may I revise
> this patch to
> 
> - Replace with net_ratelimit()
> - Remove the WARN_ON()
> - Update the comment appropriately
> 
> Thanks again for your insights and patience.
> 
> Sincerely,
> 
> Jiwon Kim

sure, I don't mind, change the patch and I'll review the new one

Cheers,
 Nik
diff mbox series

Patch

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index bb9c3d6ef435..078c85070b86 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -5551,27 +5551,30 @@  bond_xdp_get_xmit_slave(struct net_device *bond_dev, struct xdp_buff *xdp)
 static int bond_xdp_xmit(struct net_device *bond_dev,
 			 int n, struct xdp_frame **frames, u32 flags)
 {
-	int nxmit, err = -ENXIO;
+	struct bonding *bond = netdev_priv(bond_dev);
+	int nxmit = 0, err = -ENXIO;
 
 	rcu_read_lock();
 
-	for (nxmit = 0; nxmit < n; nxmit++) {
-		struct xdp_frame *frame = frames[nxmit];
-		struct xdp_frame *frames1[] = {frame};
-		struct net_device *slave_dev;
-		struct xdp_buff xdp;
+	if (bond_xdp_check(bond)) {
+		for (nxmit = 0; nxmit < n; nxmit++) {
+			struct xdp_frame *frame = frames[nxmit];
+			struct xdp_frame *frames1[] = {frame};
+			struct net_device *slave_dev;
+			struct xdp_buff xdp;
 
-		xdp_convert_frame_to_buff(frame, &xdp);
+			xdp_convert_frame_to_buff(frame, &xdp);
 
-		slave_dev = bond_xdp_get_xmit_slave(bond_dev, &xdp);
-		if (!slave_dev) {
-			err = -ENXIO;
-			break;
-		}
+			slave_dev = bond_xdp_get_xmit_slave(bond_dev, &xdp);
+			if (!slave_dev) {
+				err = -ENXIO;
+				break;
+			}
 
-		err = slave_dev->netdev_ops->ndo_xdp_xmit(slave_dev, 1, frames1, flags);
-		if (err < 1)
-			break;
+			err = slave_dev->netdev_ops->ndo_xdp_xmit(slave_dev, 1, frames1, flags);
+			if (err < 1)
+				break;
+		}
 	}
 
 	rcu_read_unlock();