diff mbox series

[net] can: af_can: fix NULL pointer dereference in can_rx_register()

Message ID 20221028033342.173528-1-shaozhengchao@huawei.com (mailing list archive)
State Superseded
Delegated to: Netdev Maintainers
Headers show
Series [net] can: af_can: fix NULL pointer dereference in can_rx_register() | expand

Checks

Context Check Description
netdev/tree_selection success Clearly marked for net
netdev/fixes_present success Fixes tag present in non-next series
netdev/subject_prefix success Link
netdev/cover_letter success Single patches do not need cover letters
netdev/patch_count success Link
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 0 this patch: 0
netdev/cc_maintainers success CCed 9 of 9 maintainers
netdev/build_clang success Errors and warnings before: 0 this patch: 0
netdev/module_param success Was 0 now: 0
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success Fixes tag looks correct
netdev/build_allmodconfig_warn success Errors and warnings before: 0 this patch: 0
netdev/checkpatch warning WARNING: line length of 81 exceeds 80 columns
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0

Commit Message

Zhengchao Shao Oct. 28, 2022, 3:33 a.m. UTC
It causes NULL pointer dereference when testing as following:
(a) use syscall(__NR_socket, 0x10ul, 3ul, 0) to create netlink socket.
(b) use syscall(__NR_sendmsg, ...) to create bond link device and vxcan
    link device, and bind vxcan device to bond device (can also use
    ifenslave command to bind vxcan device to bond device).
(c) use syscall(__NR_socket, 0x1dul, 3ul, 1) to create CAN socket.
(d) use syscall(__NR_bind, ...) to bind the bond device to CAN socket.

The bond device invokes the can-raw protocol registration interface to
receive CAN packets. However, ml_priv is not allocated to the dev,
dev_rcv_lists is assigned to NULL in can_rx_register(). In this case,
it will occur the NULL pointer dereference issue.

The following is the stack information:
BUG: kernel NULL pointer dereference, address: 0000000000000008
PGD 122a4067 P4D 122a4067 PUD 1223c067 PMD 0
Oops: 0000 [#1] PREEMPT SMP
RIP: 0010:can_rx_register+0x12d/0x1e0
Call Trace:
<TASK>
raw_enable_filters+0x8d/0x120
raw_enable_allfilters+0x3b/0x130
raw_bind+0x118/0x4f0
__sys_bind+0x163/0x1a0
__x64_sys_bind+0x1e/0x30
do_syscall_64+0x35/0x80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
</TASK>

Fixes: 4e096a18867a ("net: introduce CAN specific pointer in the struct net_device")
Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
---
 net/can/af_can.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Oliver Hartkopp Oct. 28, 2022, 7:13 a.m. UTC | #1
Hello,

On 28.10.22 05:33, Zhengchao Shao wrote:
> It causes NULL pointer dereference when testing as following:
> (a) use syscall(__NR_socket, 0x10ul, 3ul, 0) to create netlink socket.
> (b) use syscall(__NR_sendmsg, ...) to create bond link device and vxcan
>      link device, and bind vxcan device to bond device (can also use
>      ifenslave command to bind vxcan device to bond device).
> (c) use syscall(__NR_socket, 0x1dul, 3ul, 1) to create CAN socket.
> (d) use syscall(__NR_bind, ...) to bind the bond device to CAN socket.
> 
> The bond device invokes the can-raw protocol registration interface to
> receive CAN packets. However, ml_priv is not allocated to the dev,
> dev_rcv_lists is assigned to NULL in can_rx_register(). In this case,
> it will occur the NULL pointer dereference issue.

I can see the problem and see that the patch makes sense for 
can_rx_register().

But for me the problem seems to be located in the bonding device.

A CAN interface with dev->type == ARPHRD_CAN *always* has the 
dev->ml_priv and dev->ml_priv_type set correctly.

I'm not sure if a bonding device does the right thing by just 'claiming' 
to be a CAN device (by setting dev->type to ARPHRD_CAN) but not taking 
care of being a CAN device and taking care of ml_priv specifics.

This might also be the case in other ml_priv use cases.

Would it probably make sense to blacklist CAN devices in bonding devices?

Thanks & best regards,
Oliver

> 
> The following is the stack information:
> BUG: kernel NULL pointer dereference, address: 0000000000000008
> PGD 122a4067 P4D 122a4067 PUD 1223c067 PMD 0
> Oops: 0000 [#1] PREEMPT SMP
> RIP: 0010:can_rx_register+0x12d/0x1e0
> Call Trace:
> <TASK>
> raw_enable_filters+0x8d/0x120
> raw_enable_allfilters+0x3b/0x130
> raw_bind+0x118/0x4f0
> __sys_bind+0x163/0x1a0
> __x64_sys_bind+0x1e/0x30
> do_syscall_64+0x35/0x80
> entry_SYSCALL_64_after_hwframe+0x63/0xcd
> </TASK>
> 
> Fixes: 4e096a18867a ("net: introduce CAN specific pointer in the struct net_device")
> Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
> ---
>   net/can/af_can.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/net/can/af_can.c b/net/can/af_can.c
> index 9503ab10f9b8..ef2697f3ebcb 100644
> --- a/net/can/af_can.c
> +++ b/net/can/af_can.c
> @@ -450,7 +450,7 @@ int can_rx_register(struct net *net, struct net_device *dev, canid_t can_id,
>   
>   	/* insert new receiver  (dev,canid,mask) -> (func,data) */
>   
> -	if (dev && dev->type != ARPHRD_CAN)
> +	if (dev && (dev->type != ARPHRD_CAN || dev->ml_priv_type != ML_PRIV_CAN))
>   		return -ENODEV;
>   
>   	if (dev && !net_eq(net, dev_net(dev)))
Marc Kleine-Budde Oct. 28, 2022, 7:33 a.m. UTC | #2
On 28.10.2022 11:33:42, Zhengchao Shao wrote:
> It causes NULL pointer dereference when testing as following:
> (a) use syscall(__NR_socket, 0x10ul, 3ul, 0) to create netlink socket.
> (b) use syscall(__NR_sendmsg, ...) to create bond link device and vxcan
>     link device, and bind vxcan device to bond device (can also use
>     ifenslave command to bind vxcan device to bond device).
> (c) use syscall(__NR_socket, 0x1dul, 3ul, 1) to create CAN socket.
> (d) use syscall(__NR_bind, ...) to bind the bond device to CAN socket.
> 
> The bond device invokes the can-raw protocol registration interface to
> receive CAN packets. However, ml_priv is not allocated to the dev,
> dev_rcv_lists is assigned to NULL in can_rx_register(). In this case,
> it will occur the NULL pointer dereference issue.
> 
> The following is the stack information:
> BUG: kernel NULL pointer dereference, address: 0000000000000008
> PGD 122a4067 P4D 122a4067 PUD 1223c067 PMD 0
> Oops: 0000 [#1] PREEMPT SMP
> RIP: 0010:can_rx_register+0x12d/0x1e0
> Call Trace:
> <TASK>
> raw_enable_filters+0x8d/0x120
> raw_enable_allfilters+0x3b/0x130
> raw_bind+0x118/0x4f0
> __sys_bind+0x163/0x1a0
> __x64_sys_bind+0x1e/0x30
> do_syscall_64+0x35/0x80
> entry_SYSCALL_64_after_hwframe+0x63/0xcd
> </TASK>
> 
> Fixes: 4e096a18867a ("net: introduce CAN specific pointer in the struct net_device")
> Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
> ---
>  net/can/af_can.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/net/can/af_can.c b/net/can/af_can.c
> index 9503ab10f9b8..ef2697f3ebcb 100644
> --- a/net/can/af_can.c
> +++ b/net/can/af_can.c
> @@ -450,7 +450,7 @@ int can_rx_register(struct net *net, struct net_device *dev, canid_t can_id,
>  
>  	/* insert new receiver  (dev,canid,mask) -> (func,data) */
>  
> -	if (dev && dev->type != ARPHRD_CAN)
> +	if (dev && (dev->type != ARPHRD_CAN || dev->ml_priv_type != ML_PRIV_CAN))

You can use the helper function: can_get_ml_priv() instead of open
coding the check.

Reviewed-by: Marc Kleine-Budde <mkl@pengutronix.de>

regards,
Marc
Marc Kleine-Budde Oct. 28, 2022, 7:46 a.m. UTC | #3
On 28.10.2022 09:13:09, Oliver Hartkopp wrote:
> Hello,
> 
> On 28.10.22 05:33, Zhengchao Shao wrote:
> > It causes NULL pointer dereference when testing as following:
> > (a) use syscall(__NR_socket, 0x10ul, 3ul, 0) to create netlink socket.
> > (b) use syscall(__NR_sendmsg, ...) to create bond link device and vxcan
> >      link device, and bind vxcan device to bond device (can also use
> >      ifenslave command to bind vxcan device to bond device).
> > (c) use syscall(__NR_socket, 0x1dul, 3ul, 1) to create CAN socket.
> > (d) use syscall(__NR_bind, ...) to bind the bond device to CAN socket.
> > 
> > The bond device invokes the can-raw protocol registration interface to
> > receive CAN packets. However, ml_priv is not allocated to the dev,
> > dev_rcv_lists is assigned to NULL in can_rx_register(). In this case,
> > it will occur the NULL pointer dereference issue.
> 
> I can see the problem and see that the patch makes sense for
> can_rx_register().
> 
> But for me the problem seems to be located in the bonding device.
> 
> A CAN interface with dev->type == ARPHRD_CAN *always* has the dev->ml_priv
> and dev->ml_priv_type set correctly.
> 
> I'm not sure if a bonding device does the right thing by just 'claiming' to
> be a CAN device (by setting dev->type to ARPHRD_CAN) but not taking care of
> being a CAN device and taking care of ml_priv specifics.
> 
> This might also be the case in other ml_priv use cases.
> 
> Would it probably make sense to blacklist CAN devices in bonding devices?

NACK - We had this discussion 2.5 years ago:

| https://lore.kernel.org/all/00000000000030dddb059c562a3f@google.com
| https://lore.kernel.org/all/20200130133046.2047-1-socketcan@hartkopp.net

...and davem pointed out:

| https://lore.kernel.org/all/20200226.202326.295871777946911500.davem@davemloft.net

On 26.02.2020 20:23:26, David Miller wrote:
[...]
> What I don't get is why the PF_CAN is blindly dereferencing a device
> assuming what is behind bond_dev->ml_priv.
> 
> If it assumes a device it access is CAN then it should check the
> device by comparing the netdev_ops or via some other means.
> 
> This restriction seems arbitrary.

With the addition of struct net_device::ml_priv_type in 4e096a18867a
("net: introduce CAN specific pointer in the struct net_device"), what
davem requested is now possible.

Marc
Zhengchao Shao Oct. 28, 2022, 7:50 a.m. UTC | #4
On 2022/10/28 15:13, Oliver Hartkopp wrote:
> Hello,
> 
> On 28.10.22 05:33, Zhengchao Shao wrote:
>> It causes NULL pointer dereference when testing as following:
>> (a) use syscall(__NR_socket, 0x10ul, 3ul, 0) to create netlink socket.
>> (b) use syscall(__NR_sendmsg, ...) to create bond link device and vxcan
>>      link device, and bind vxcan device to bond device (can also use
>>      ifenslave command to bind vxcan device to bond device).
>> (c) use syscall(__NR_socket, 0x1dul, 3ul, 1) to create CAN socket.
>> (d) use syscall(__NR_bind, ...) to bind the bond device to CAN socket.
>>
>> The bond device invokes the can-raw protocol registration interface to
>> receive CAN packets. However, ml_priv is not allocated to the dev,
>> dev_rcv_lists is assigned to NULL in can_rx_register(). In this case,
>> it will occur the NULL pointer dereference issue.
> 
> I can see the problem and see that the patch makes sense for 
> can_rx_register().
> 
> But for me the problem seems to be located in the bonding device.
> 
> A CAN interface with dev->type == ARPHRD_CAN *always* has the 
> dev->ml_priv and dev->ml_priv_type set correctly.
> 
> I'm not sure if a bonding device does the right thing by just 'claiming' 
> to be a CAN device (by setting dev->type to ARPHRD_CAN) but not taking 
> care of being a CAN device and taking care of ml_priv specifics.
> 
> This might also be the case in other ml_priv use cases.
> 
> Would it probably make sense to blacklist CAN devices in bonding devices?
> 
> Thanks & best regards,
> Oliver
> 

Hi Oliver:
	Thank you for your review. The bond device inherits the type of
the slave device, but the bond device does not allocate ml_priv. I can
try to add ARPHRD_CAN to the blocklist of slave_dev. But I was
wondering, could there be other scenes with the same problem?

Zhengchao Shao
>>
>> The following is the stack information:
>> BUG: kernel NULL pointer dereference, address: 0000000000000008
>> PGD 122a4067 P4D 122a4067 PUD 1223c067 PMD 0
>> Oops: 0000 [#1] PREEMPT SMP
>> RIP: 0010:can_rx_register+0x12d/0x1e0
>> Call Trace:
>> <TASK>
>> raw_enable_filters+0x8d/0x120
>> raw_enable_allfilters+0x3b/0x130
>> raw_bind+0x118/0x4f0
>> __sys_bind+0x163/0x1a0
>> __x64_sys_bind+0x1e/0x30
>> do_syscall_64+0x35/0x80
>> entry_SYSCALL_64_after_hwframe+0x63/0xcd
>> </TASK>
>>
>> Fixes: 4e096a18867a ("net: introduce CAN specific pointer in the 
>> struct net_device")
>> Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
>> ---
>>   net/can/af_can.c | 2 +-
>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/net/can/af_can.c b/net/can/af_can.c
>> index 9503ab10f9b8..ef2697f3ebcb 100644
>> --- a/net/can/af_can.c
>> +++ b/net/can/af_can.c
>> @@ -450,7 +450,7 @@ int can_rx_register(struct net *net, struct 
>> net_device *dev, canid_t can_id,
>>       /* insert new receiver  (dev,canid,mask) -> (func,data) */
>> -    if (dev && dev->type != ARPHRD_CAN)
>> +    if (dev && (dev->type != ARPHRD_CAN || dev->ml_priv_type != 
>> ML_PRIV_CAN))
>>           return -ENODEV;
>>       if (dev && !net_eq(net, dev_net(dev)))
Zhengchao Shao Oct. 28, 2022, 8:44 a.m. UTC | #5
On 2022/10/28 15:33, Marc Kleine-Budde wrote:
> On 28.10.2022 11:33:42, Zhengchao Shao wrote:
>> It causes NULL pointer dereference when testing as following:
>> (a) use syscall(__NR_socket, 0x10ul, 3ul, 0) to create netlink socket.
>> (b) use syscall(__NR_sendmsg, ...) to create bond link device and vxcan
>>      link device, and bind vxcan device to bond device (can also use
>>      ifenslave command to bind vxcan device to bond device).
>> (c) use syscall(__NR_socket, 0x1dul, 3ul, 1) to create CAN socket.
>> (d) use syscall(__NR_bind, ...) to bind the bond device to CAN socket.
>>
>> The bond device invokes the can-raw protocol registration interface to
>> receive CAN packets. However, ml_priv is not allocated to the dev,
>> dev_rcv_lists is assigned to NULL in can_rx_register(). In this case,
>> it will occur the NULL pointer dereference issue.
>>
>> The following is the stack information:
>> BUG: kernel NULL pointer dereference, address: 0000000000000008
>> PGD 122a4067 P4D 122a4067 PUD 1223c067 PMD 0
>> Oops: 0000 [#1] PREEMPT SMP
>> RIP: 0010:can_rx_register+0x12d/0x1e0
>> Call Trace:
>> <TASK>
>> raw_enable_filters+0x8d/0x120
>> raw_enable_allfilters+0x3b/0x130
>> raw_bind+0x118/0x4f0
>> __sys_bind+0x163/0x1a0
>> __x64_sys_bind+0x1e/0x30
>> do_syscall_64+0x35/0x80
>> entry_SYSCALL_64_after_hwframe+0x63/0xcd
>> </TASK>
>>
>> Fixes: 4e096a18867a ("net: introduce CAN specific pointer in the struct net_device")
>> Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
>> ---
>>   net/can/af_can.c | 2 +-
>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/net/can/af_can.c b/net/can/af_can.c
>> index 9503ab10f9b8..ef2697f3ebcb 100644
>> --- a/net/can/af_can.c
>> +++ b/net/can/af_can.c
>> @@ -450,7 +450,7 @@ int can_rx_register(struct net *net, struct net_device *dev, canid_t can_id,
>>   
>>   	/* insert new receiver  (dev,canid,mask) -> (func,data) */
>>   
>> -	if (dev && dev->type != ARPHRD_CAN)
>> +	if (dev && (dev->type != ARPHRD_CAN || dev->ml_priv_type != ML_PRIV_CAN))
> 
> You can use the helper function: can_get_ml_priv() instead of open
> coding the check.
> 
> Reviewed-by: Marc Kleine-Budde <mkl@pengutronix.de>
> 
> regards,
> Marc
> 
Hi Marc:
	Thank you for your review. And I will fix it in V2.

Zhengchao Shao
Oliver Hartkopp Oct. 28, 2022, 11:24 a.m. UTC | #6
On 28.10.22 09:46, Marc Kleine-Budde wrote:
> On 28.10.2022 09:13:09, Oliver Hartkopp wrote:
>> Hello,
>>
>> On 28.10.22 05:33, Zhengchao Shao wrote:
>>> It causes NULL pointer dereference when testing as following:
>>> (a) use syscall(__NR_socket, 0x10ul, 3ul, 0) to create netlink socket.
>>> (b) use syscall(__NR_sendmsg, ...) to create bond link device and vxcan
>>>       link device, and bind vxcan device to bond device (can also use
>>>       ifenslave command to bind vxcan device to bond device).
>>> (c) use syscall(__NR_socket, 0x1dul, 3ul, 1) to create CAN socket.
>>> (d) use syscall(__NR_bind, ...) to bind the bond device to CAN socket.
>>>
>>> The bond device invokes the can-raw protocol registration interface to
>>> receive CAN packets. However, ml_priv is not allocated to the dev,
>>> dev_rcv_lists is assigned to NULL in can_rx_register(). In this case,
>>> it will occur the NULL pointer dereference issue.
>>
>> I can see the problem and see that the patch makes sense for
>> can_rx_register().
>>
>> But for me the problem seems to be located in the bonding device.
>>
>> A CAN interface with dev->type == ARPHRD_CAN *always* has the dev->ml_priv
>> and dev->ml_priv_type set correctly.
>>
>> I'm not sure if a bonding device does the right thing by just 'claiming' to
>> be a CAN device (by setting dev->type to ARPHRD_CAN) but not taking care of
>> being a CAN device and taking care of ml_priv specifics.
>>
>> This might also be the case in other ml_priv use cases.
>>
>> Would it probably make sense to blacklist CAN devices in bonding devices?
> 
> NACK - We had this discussion 2.5 years ago:
> 
> | https://lore.kernel.org/all/00000000000030dddb059c562a3f@google.com
> | https://lore.kernel.org/all/20200130133046.2047-1-socketcan@hartkopp.net
> 
> ...and davem pointed out:
> 
> | https://lore.kernel.org/all/20200226.202326.295871777946911500.davem@davemloft.net
> 
> On 26.02.2020 20:23:26, David Miller wrote:
> [...]
>> What I don't get is why the PF_CAN is blindly dereferencing a device
>> assuming what is behind bond_dev->ml_priv.
>>
>> If it assumes a device it access is CAN then it should check the
>> device by comparing the netdev_ops or via some other means.
>>
>> This restriction seems arbitrary.
> 
> With the addition of struct net_device::ml_priv_type in 4e096a18867a
> ("net: introduce CAN specific pointer in the struct net_device"), what
> davem requested is now possible.
> 
> Marc

Oh, thanks for the heads up!

Didn't have remembered that specific discussion.

Wouldn't we need this check in can_rx_unregister() and maybe 
can[|fd|xl]_rcv() then too?

As all these functions check for ARPHRD_CAN and later access ml_priv.

Best regards,
Oliver
Marc Kleine-Budde Oct. 28, 2022, 12:31 p.m. UTC | #7
On 28.10.2022 13:24:36, Oliver Hartkopp wrote:
> Didn't have remembered that specific discussion.
> 
> Wouldn't we need this check in can_rx_unregister() and maybe

The kernel should not call can_rx_unregister() if can_rx_register()
fails, but on the other hand we check for ARPHRD_CAN here, too.

> can[|fd|xl]_rcv() then too?
> 
> As all these functions check for ARPHRD_CAN and later access ml_priv.

Better safe then sorry.

Marc
diff mbox series

Patch

diff --git a/net/can/af_can.c b/net/can/af_can.c
index 9503ab10f9b8..ef2697f3ebcb 100644
--- a/net/can/af_can.c
+++ b/net/can/af_can.c
@@ -450,7 +450,7 @@  int can_rx_register(struct net *net, struct net_device *dev, canid_t can_id,
 
 	/* insert new receiver  (dev,canid,mask) -> (func,data) */
 
-	if (dev && dev->type != ARPHRD_CAN)
+	if (dev && (dev->type != ARPHRD_CAN || dev->ml_priv_type != ML_PRIV_CAN))
 		return -ENODEV;
 
 	if (dev && !net_eq(net, dev_net(dev)))