Message ID | 20221028033342.173528-1-shaozhengchao@huawei.com (mailing list archive) |
---|---|
State | Superseded |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | [net] can: af_can: fix NULL pointer dereference in can_rx_register() | expand |
Context | Check | Description |
---|---|---|
netdev/tree_selection | success | Clearly marked for net |
netdev/fixes_present | success | Fixes tag present in non-next series |
netdev/subject_prefix | success | Link |
netdev/cover_letter | success | Single patches do not need cover letters |
netdev/patch_count | success | Link |
netdev/header_inline | success | No static functions without inline keyword in header files |
netdev/build_32bit | success | Errors and warnings before: 0 this patch: 0 |
netdev/cc_maintainers | success | CCed 9 of 9 maintainers |
netdev/build_clang | success | Errors and warnings before: 0 this patch: 0 |
netdev/module_param | success | Was 0 now: 0 |
netdev/verify_signedoff | success | Signed-off-by tag matches author and committer |
netdev/check_selftest | success | No net selftest shell script |
netdev/verify_fixes | success | Fixes tag looks correct |
netdev/build_allmodconfig_warn | success | Errors and warnings before: 0 this patch: 0 |
netdev/checkpatch | warning | WARNING: line length of 81 exceeds 80 columns |
netdev/kdoc | success | Errors and warnings before: 0 this patch: 0 |
netdev/source_inline | success | Was 0 now: 0 |
Hello, On 28.10.22 05:33, Zhengchao Shao wrote: > It causes NULL pointer dereference when testing as following: > (a) use syscall(__NR_socket, 0x10ul, 3ul, 0) to create netlink socket. > (b) use syscall(__NR_sendmsg, ...) to create bond link device and vxcan > link device, and bind vxcan device to bond device (can also use > ifenslave command to bind vxcan device to bond device). > (c) use syscall(__NR_socket, 0x1dul, 3ul, 1) to create CAN socket. > (d) use syscall(__NR_bind, ...) to bind the bond device to CAN socket. > > The bond device invokes the can-raw protocol registration interface to > receive CAN packets. However, ml_priv is not allocated to the dev, > dev_rcv_lists is assigned to NULL in can_rx_register(). In this case, > it will occur the NULL pointer dereference issue. I can see the problem and see that the patch makes sense for can_rx_register(). But for me the problem seems to be located in the bonding device. A CAN interface with dev->type == ARPHRD_CAN *always* has the dev->ml_priv and dev->ml_priv_type set correctly. I'm not sure if a bonding device does the right thing by just 'claiming' to be a CAN device (by setting dev->type to ARPHRD_CAN) but not taking care of being a CAN device and taking care of ml_priv specifics. This might also be the case in other ml_priv use cases. Would it probably make sense to blacklist CAN devices in bonding devices? Thanks & best regards, Oliver > > The following is the stack information: > BUG: kernel NULL pointer dereference, address: 0000000000000008 > PGD 122a4067 P4D 122a4067 PUD 1223c067 PMD 0 > Oops: 0000 [#1] PREEMPT SMP > RIP: 0010:can_rx_register+0x12d/0x1e0 > Call Trace: > <TASK> > raw_enable_filters+0x8d/0x120 > raw_enable_allfilters+0x3b/0x130 > raw_bind+0x118/0x4f0 > __sys_bind+0x163/0x1a0 > __x64_sys_bind+0x1e/0x30 > do_syscall_64+0x35/0x80 > entry_SYSCALL_64_after_hwframe+0x63/0xcd > </TASK> > > Fixes: 4e096a18867a ("net: introduce CAN specific pointer in the struct net_device") > Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com> > --- > net/can/af_can.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/net/can/af_can.c b/net/can/af_can.c > index 9503ab10f9b8..ef2697f3ebcb 100644 > --- a/net/can/af_can.c > +++ b/net/can/af_can.c > @@ -450,7 +450,7 @@ int can_rx_register(struct net *net, struct net_device *dev, canid_t can_id, > > /* insert new receiver (dev,canid,mask) -> (func,data) */ > > - if (dev && dev->type != ARPHRD_CAN) > + if (dev && (dev->type != ARPHRD_CAN || dev->ml_priv_type != ML_PRIV_CAN)) > return -ENODEV; > > if (dev && !net_eq(net, dev_net(dev)))
On 28.10.2022 11:33:42, Zhengchao Shao wrote: > It causes NULL pointer dereference when testing as following: > (a) use syscall(__NR_socket, 0x10ul, 3ul, 0) to create netlink socket. > (b) use syscall(__NR_sendmsg, ...) to create bond link device and vxcan > link device, and bind vxcan device to bond device (can also use > ifenslave command to bind vxcan device to bond device). > (c) use syscall(__NR_socket, 0x1dul, 3ul, 1) to create CAN socket. > (d) use syscall(__NR_bind, ...) to bind the bond device to CAN socket. > > The bond device invokes the can-raw protocol registration interface to > receive CAN packets. However, ml_priv is not allocated to the dev, > dev_rcv_lists is assigned to NULL in can_rx_register(). In this case, > it will occur the NULL pointer dereference issue. > > The following is the stack information: > BUG: kernel NULL pointer dereference, address: 0000000000000008 > PGD 122a4067 P4D 122a4067 PUD 1223c067 PMD 0 > Oops: 0000 [#1] PREEMPT SMP > RIP: 0010:can_rx_register+0x12d/0x1e0 > Call Trace: > <TASK> > raw_enable_filters+0x8d/0x120 > raw_enable_allfilters+0x3b/0x130 > raw_bind+0x118/0x4f0 > __sys_bind+0x163/0x1a0 > __x64_sys_bind+0x1e/0x30 > do_syscall_64+0x35/0x80 > entry_SYSCALL_64_after_hwframe+0x63/0xcd > </TASK> > > Fixes: 4e096a18867a ("net: introduce CAN specific pointer in the struct net_device") > Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com> > --- > net/can/af_can.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/net/can/af_can.c b/net/can/af_can.c > index 9503ab10f9b8..ef2697f3ebcb 100644 > --- a/net/can/af_can.c > +++ b/net/can/af_can.c > @@ -450,7 +450,7 @@ int can_rx_register(struct net *net, struct net_device *dev, canid_t can_id, > > /* insert new receiver (dev,canid,mask) -> (func,data) */ > > - if (dev && dev->type != ARPHRD_CAN) > + if (dev && (dev->type != ARPHRD_CAN || dev->ml_priv_type != ML_PRIV_CAN)) You can use the helper function: can_get_ml_priv() instead of open coding the check. Reviewed-by: Marc Kleine-Budde <mkl@pengutronix.de> regards, Marc
On 28.10.2022 09:13:09, Oliver Hartkopp wrote: > Hello, > > On 28.10.22 05:33, Zhengchao Shao wrote: > > It causes NULL pointer dereference when testing as following: > > (a) use syscall(__NR_socket, 0x10ul, 3ul, 0) to create netlink socket. > > (b) use syscall(__NR_sendmsg, ...) to create bond link device and vxcan > > link device, and bind vxcan device to bond device (can also use > > ifenslave command to bind vxcan device to bond device). > > (c) use syscall(__NR_socket, 0x1dul, 3ul, 1) to create CAN socket. > > (d) use syscall(__NR_bind, ...) to bind the bond device to CAN socket. > > > > The bond device invokes the can-raw protocol registration interface to > > receive CAN packets. However, ml_priv is not allocated to the dev, > > dev_rcv_lists is assigned to NULL in can_rx_register(). In this case, > > it will occur the NULL pointer dereference issue. > > I can see the problem and see that the patch makes sense for > can_rx_register(). > > But for me the problem seems to be located in the bonding device. > > A CAN interface with dev->type == ARPHRD_CAN *always* has the dev->ml_priv > and dev->ml_priv_type set correctly. > > I'm not sure if a bonding device does the right thing by just 'claiming' to > be a CAN device (by setting dev->type to ARPHRD_CAN) but not taking care of > being a CAN device and taking care of ml_priv specifics. > > This might also be the case in other ml_priv use cases. > > Would it probably make sense to blacklist CAN devices in bonding devices? NACK - We had this discussion 2.5 years ago: | https://lore.kernel.org/all/00000000000030dddb059c562a3f@google.com | https://lore.kernel.org/all/20200130133046.2047-1-socketcan@hartkopp.net ...and davem pointed out: | https://lore.kernel.org/all/20200226.202326.295871777946911500.davem@davemloft.net On 26.02.2020 20:23:26, David Miller wrote: [...] > What I don't get is why the PF_CAN is blindly dereferencing a device > assuming what is behind bond_dev->ml_priv. > > If it assumes a device it access is CAN then it should check the > device by comparing the netdev_ops or via some other means. > > This restriction seems arbitrary. With the addition of struct net_device::ml_priv_type in 4e096a18867a ("net: introduce CAN specific pointer in the struct net_device"), what davem requested is now possible. Marc
On 2022/10/28 15:13, Oliver Hartkopp wrote: > Hello, > > On 28.10.22 05:33, Zhengchao Shao wrote: >> It causes NULL pointer dereference when testing as following: >> (a) use syscall(__NR_socket, 0x10ul, 3ul, 0) to create netlink socket. >> (b) use syscall(__NR_sendmsg, ...) to create bond link device and vxcan >> link device, and bind vxcan device to bond device (can also use >> ifenslave command to bind vxcan device to bond device). >> (c) use syscall(__NR_socket, 0x1dul, 3ul, 1) to create CAN socket. >> (d) use syscall(__NR_bind, ...) to bind the bond device to CAN socket. >> >> The bond device invokes the can-raw protocol registration interface to >> receive CAN packets. However, ml_priv is not allocated to the dev, >> dev_rcv_lists is assigned to NULL in can_rx_register(). In this case, >> it will occur the NULL pointer dereference issue. > > I can see the problem and see that the patch makes sense for > can_rx_register(). > > But for me the problem seems to be located in the bonding device. > > A CAN interface with dev->type == ARPHRD_CAN *always* has the > dev->ml_priv and dev->ml_priv_type set correctly. > > I'm not sure if a bonding device does the right thing by just 'claiming' > to be a CAN device (by setting dev->type to ARPHRD_CAN) but not taking > care of being a CAN device and taking care of ml_priv specifics. > > This might also be the case in other ml_priv use cases. > > Would it probably make sense to blacklist CAN devices in bonding devices? > > Thanks & best regards, > Oliver > Hi Oliver: Thank you for your review. The bond device inherits the type of the slave device, but the bond device does not allocate ml_priv. I can try to add ARPHRD_CAN to the blocklist of slave_dev. But I was wondering, could there be other scenes with the same problem? Zhengchao Shao >> >> The following is the stack information: >> BUG: kernel NULL pointer dereference, address: 0000000000000008 >> PGD 122a4067 P4D 122a4067 PUD 1223c067 PMD 0 >> Oops: 0000 [#1] PREEMPT SMP >> RIP: 0010:can_rx_register+0x12d/0x1e0 >> Call Trace: >> <TASK> >> raw_enable_filters+0x8d/0x120 >> raw_enable_allfilters+0x3b/0x130 >> raw_bind+0x118/0x4f0 >> __sys_bind+0x163/0x1a0 >> __x64_sys_bind+0x1e/0x30 >> do_syscall_64+0x35/0x80 >> entry_SYSCALL_64_after_hwframe+0x63/0xcd >> </TASK> >> >> Fixes: 4e096a18867a ("net: introduce CAN specific pointer in the >> struct net_device") >> Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com> >> --- >> net/can/af_can.c | 2 +- >> 1 file changed, 1 insertion(+), 1 deletion(-) >> >> diff --git a/net/can/af_can.c b/net/can/af_can.c >> index 9503ab10f9b8..ef2697f3ebcb 100644 >> --- a/net/can/af_can.c >> +++ b/net/can/af_can.c >> @@ -450,7 +450,7 @@ int can_rx_register(struct net *net, struct >> net_device *dev, canid_t can_id, >> /* insert new receiver (dev,canid,mask) -> (func,data) */ >> - if (dev && dev->type != ARPHRD_CAN) >> + if (dev && (dev->type != ARPHRD_CAN || dev->ml_priv_type != >> ML_PRIV_CAN)) >> return -ENODEV; >> if (dev && !net_eq(net, dev_net(dev)))
On 2022/10/28 15:33, Marc Kleine-Budde wrote: > On 28.10.2022 11:33:42, Zhengchao Shao wrote: >> It causes NULL pointer dereference when testing as following: >> (a) use syscall(__NR_socket, 0x10ul, 3ul, 0) to create netlink socket. >> (b) use syscall(__NR_sendmsg, ...) to create bond link device and vxcan >> link device, and bind vxcan device to bond device (can also use >> ifenslave command to bind vxcan device to bond device). >> (c) use syscall(__NR_socket, 0x1dul, 3ul, 1) to create CAN socket. >> (d) use syscall(__NR_bind, ...) to bind the bond device to CAN socket. >> >> The bond device invokes the can-raw protocol registration interface to >> receive CAN packets. However, ml_priv is not allocated to the dev, >> dev_rcv_lists is assigned to NULL in can_rx_register(). In this case, >> it will occur the NULL pointer dereference issue. >> >> The following is the stack information: >> BUG: kernel NULL pointer dereference, address: 0000000000000008 >> PGD 122a4067 P4D 122a4067 PUD 1223c067 PMD 0 >> Oops: 0000 [#1] PREEMPT SMP >> RIP: 0010:can_rx_register+0x12d/0x1e0 >> Call Trace: >> <TASK> >> raw_enable_filters+0x8d/0x120 >> raw_enable_allfilters+0x3b/0x130 >> raw_bind+0x118/0x4f0 >> __sys_bind+0x163/0x1a0 >> __x64_sys_bind+0x1e/0x30 >> do_syscall_64+0x35/0x80 >> entry_SYSCALL_64_after_hwframe+0x63/0xcd >> </TASK> >> >> Fixes: 4e096a18867a ("net: introduce CAN specific pointer in the struct net_device") >> Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com> >> --- >> net/can/af_can.c | 2 +- >> 1 file changed, 1 insertion(+), 1 deletion(-) >> >> diff --git a/net/can/af_can.c b/net/can/af_can.c >> index 9503ab10f9b8..ef2697f3ebcb 100644 >> --- a/net/can/af_can.c >> +++ b/net/can/af_can.c >> @@ -450,7 +450,7 @@ int can_rx_register(struct net *net, struct net_device *dev, canid_t can_id, >> >> /* insert new receiver (dev,canid,mask) -> (func,data) */ >> >> - if (dev && dev->type != ARPHRD_CAN) >> + if (dev && (dev->type != ARPHRD_CAN || dev->ml_priv_type != ML_PRIV_CAN)) > > You can use the helper function: can_get_ml_priv() instead of open > coding the check. > > Reviewed-by: Marc Kleine-Budde <mkl@pengutronix.de> > > regards, > Marc > Hi Marc: Thank you for your review. And I will fix it in V2. Zhengchao Shao
On 28.10.22 09:46, Marc Kleine-Budde wrote: > On 28.10.2022 09:13:09, Oliver Hartkopp wrote: >> Hello, >> >> On 28.10.22 05:33, Zhengchao Shao wrote: >>> It causes NULL pointer dereference when testing as following: >>> (a) use syscall(__NR_socket, 0x10ul, 3ul, 0) to create netlink socket. >>> (b) use syscall(__NR_sendmsg, ...) to create bond link device and vxcan >>> link device, and bind vxcan device to bond device (can also use >>> ifenslave command to bind vxcan device to bond device). >>> (c) use syscall(__NR_socket, 0x1dul, 3ul, 1) to create CAN socket. >>> (d) use syscall(__NR_bind, ...) to bind the bond device to CAN socket. >>> >>> The bond device invokes the can-raw protocol registration interface to >>> receive CAN packets. However, ml_priv is not allocated to the dev, >>> dev_rcv_lists is assigned to NULL in can_rx_register(). In this case, >>> it will occur the NULL pointer dereference issue. >> >> I can see the problem and see that the patch makes sense for >> can_rx_register(). >> >> But for me the problem seems to be located in the bonding device. >> >> A CAN interface with dev->type == ARPHRD_CAN *always* has the dev->ml_priv >> and dev->ml_priv_type set correctly. >> >> I'm not sure if a bonding device does the right thing by just 'claiming' to >> be a CAN device (by setting dev->type to ARPHRD_CAN) but not taking care of >> being a CAN device and taking care of ml_priv specifics. >> >> This might also be the case in other ml_priv use cases. >> >> Would it probably make sense to blacklist CAN devices in bonding devices? > > NACK - We had this discussion 2.5 years ago: > > | https://lore.kernel.org/all/00000000000030dddb059c562a3f@google.com > | https://lore.kernel.org/all/20200130133046.2047-1-socketcan@hartkopp.net > > ...and davem pointed out: > > | https://lore.kernel.org/all/20200226.202326.295871777946911500.davem@davemloft.net > > On 26.02.2020 20:23:26, David Miller wrote: > [...] >> What I don't get is why the PF_CAN is blindly dereferencing a device >> assuming what is behind bond_dev->ml_priv. >> >> If it assumes a device it access is CAN then it should check the >> device by comparing the netdev_ops or via some other means. >> >> This restriction seems arbitrary. > > With the addition of struct net_device::ml_priv_type in 4e096a18867a > ("net: introduce CAN specific pointer in the struct net_device"), what > davem requested is now possible. > > Marc Oh, thanks for the heads up! Didn't have remembered that specific discussion. Wouldn't we need this check in can_rx_unregister() and maybe can[|fd|xl]_rcv() then too? As all these functions check for ARPHRD_CAN and later access ml_priv. Best regards, Oliver
On 28.10.2022 13:24:36, Oliver Hartkopp wrote: > Didn't have remembered that specific discussion. > > Wouldn't we need this check in can_rx_unregister() and maybe The kernel should not call can_rx_unregister() if can_rx_register() fails, but on the other hand we check for ARPHRD_CAN here, too. > can[|fd|xl]_rcv() then too? > > As all these functions check for ARPHRD_CAN and later access ml_priv. Better safe then sorry. Marc
diff --git a/net/can/af_can.c b/net/can/af_can.c index 9503ab10f9b8..ef2697f3ebcb 100644 --- a/net/can/af_can.c +++ b/net/can/af_can.c @@ -450,7 +450,7 @@ int can_rx_register(struct net *net, struct net_device *dev, canid_t can_id, /* insert new receiver (dev,canid,mask) -> (func,data) */ - if (dev && dev->type != ARPHRD_CAN) + if (dev && (dev->type != ARPHRD_CAN || dev->ml_priv_type != ML_PRIV_CAN)) return -ENODEV; if (dev && !net_eq(net, dev_net(dev)))
It causes NULL pointer dereference when testing as following: (a) use syscall(__NR_socket, 0x10ul, 3ul, 0) to create netlink socket. (b) use syscall(__NR_sendmsg, ...) to create bond link device and vxcan link device, and bind vxcan device to bond device (can also use ifenslave command to bind vxcan device to bond device). (c) use syscall(__NR_socket, 0x1dul, 3ul, 1) to create CAN socket. (d) use syscall(__NR_bind, ...) to bind the bond device to CAN socket. The bond device invokes the can-raw protocol registration interface to receive CAN packets. However, ml_priv is not allocated to the dev, dev_rcv_lists is assigned to NULL in can_rx_register(). In this case, it will occur the NULL pointer dereference issue. The following is the stack information: BUG: kernel NULL pointer dereference, address: 0000000000000008 PGD 122a4067 P4D 122a4067 PUD 1223c067 PMD 0 Oops: 0000 [#1] PREEMPT SMP RIP: 0010:can_rx_register+0x12d/0x1e0 Call Trace: <TASK> raw_enable_filters+0x8d/0x120 raw_enable_allfilters+0x3b/0x130 raw_bind+0x118/0x4f0 __sys_bind+0x163/0x1a0 __x64_sys_bind+0x1e/0x30 do_syscall_64+0x35/0x80 entry_SYSCALL_64_after_hwframe+0x63/0xcd </TASK> Fixes: 4e096a18867a ("net: introduce CAN specific pointer in the struct net_device") Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com> --- net/can/af_can.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)