Message ID | 20220711013111.33183-1-duoming@zju.edu.cn (mailing list archive) |
---|---|
State | Changes Requested |
Delegated to: | Netdev Maintainers |
Series | [net,v6] net: rose: fix null-ptr-deref caused by rose_kill_by_neigh | expand |
Context | Check | Description |
---|---|---|
netdev/tree_selection | success | Clearly marked for net |
netdev/fixes_present | success | Fixes tag present in non-next series |
netdev/subject_prefix | success | Link |
netdev/cover_letter | success | Single patches do not need cover letters |
netdev/patch_count | success | Link |
netdev/header_inline | success | No static functions without inline keyword in header files |
netdev/build_32bit | success | Errors and warnings before: 1 this patch: 1 |
netdev/cc_maintainers | success | CCed 7 of 7 maintainers |
netdev/build_clang | success | Errors and warnings before: 0 this patch: 0 |
netdev/module_param | success | Was 0 now: 0 |
netdev/verify_signedoff | success | Signed-off-by tag matches author and committer |
netdev/check_selftest | success | No net selftest shell script |
netdev/verify_fixes | success | Fixes tag looks correct |
netdev/build_allmodconfig_warn | success | Errors and warnings before: 1 this patch: 1 |
netdev/checkpatch | warning | WARNING: line length of 89 exceeds 80 columns |
netdev/kdoc | success | Errors and warnings before: 0 this patch: 0 |
netdev/source_inline | success | Was 0 now: 0 |
On Mon, 2022-07-11 at 09:31 +0800, Duoming Zhou wrote:
> When the link layer connection is broken, the rose->neighbour is
> set to null. But rose->neighbour could be used by rose_connect()
> and rose_release() later, because there is no synchronization among
> them. As a result, the null-ptr-deref bugs will happen.
>
> One of the null-ptr-deref bugs is shown below:
>
>     (thread 1)                      | (thread 2)
>                                     | rose_connect
>     rose_kill_by_neigh              |   lock_sock(sk)
>       spin_lock_bh(&rose_list_lock) |   if (!rose->neighbour)
>       rose->neighbour = NULL;//(1)  |
>                                     |     rose->neighbour->use++;//(2)
>
> The rose->neighbour is set to null in position (1) and dereferenced
> in position (2).
>
> The KASAN report triggered by POC is shown below:
>
> KASAN: null-ptr-deref in range [0x0000000000000028-0x000000000000002f]
> ...
> RIP: 0010:rose_connect+0x6c2/0xf30
> RSP: 0018:ffff88800ab47d60 EFLAGS: 00000206
> RAX: 0000000000000005 RBX: 000000000000002a RCX: 0000000000000000
> RDX: ffff88800ab38000 RSI: ffff88800ab47e48 RDI: ffff88800ab38309
> RBP: dffffc0000000000 R08: 0000000000000000 R09: ffffed1001567062
> R10: dfffe91001567063 R11: 1ffff11001567061 R12: 1ffff11000d17cd0
> R13: ffff8880068be680 R14: 0000000000000002 R15: 1ffff11000d17cd0
> ...
> Call Trace:
>  <TASK>
>  ? __local_bh_enable_ip+0x54/0x80
>  ? selinux_netlbl_socket_connect+0x26/0x30
>  ? rose_bind+0x5b0/0x5b0
>  __sys_connect+0x216/0x280
>  __x64_sys_connect+0x71/0x80
>  do_syscall_64+0x43/0x90
>  entry_SYSCALL_64_after_hwframe+0x46/0xb0
>
> This patch adds lock_sock() in rose_kill_by_neigh() in order to
> synchronize with rose_connect() and rose_release(). Then, changing
> type of 'neighbour->use' from unsigned short to atomic_t in order to
> mitigate race conditions caused by holding different socket lock while
> updating 'neighbour->use'.
>
> Meanwhile, this patch adds sock_hold() protected by rose_list_lock
> that could synchronize with rose_remove_socket() in order to mitigate
> UAF bug caused by lock_sock() we add.
>
> What's more, there is no need using rose_neigh_list_lock to protect
> rose_kill_by_neigh(). Because we have already used rose_neigh_list_lock
> to protect the state change of rose_neigh in rose_link_failed(), which
> is well synchronized.
>
> Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
> Signed-off-by: Duoming Zhou <duoming@zju.edu.cn>
> ---
> Changes in v6:
>   - Change sk_for_each() to sk_for_each_safe().
>   - Change type of 'neighbour->use' from unsigned short to atomic_t.
>
>  include/net/rose.h    |  2 +-
>  net/rose/af_rose.c    | 19 +++++++++++++------
>  net/rose/rose_in.c    | 12 ++++++------
>  net/rose/rose_route.c | 24 ++++++++++++------------
>  net/rose/rose_timer.c |  2 +-
>  5 files changed, 33 insertions(+), 26 deletions(-)
>
> diff --git a/include/net/rose.h b/include/net/rose.h
> index 0f0a4ce0fee..d5ddebc556d 100644
> --- a/include/net/rose.h
> +++ b/include/net/rose.h
> @@ -95,7 +95,7 @@ struct rose_neigh {
>  	ax25_cb *ax25;
>  	struct net_device *dev;
>  	unsigned short count;
> -	unsigned short use;
> +	atomic_t use;
>  	unsigned int number;
>  	char restarted;
>  	char dce_mode;
> diff --git a/net/rose/af_rose.c b/net/rose/af_rose.c
> index bf2d986a6bc..54e7b76c4f3 100644
> --- a/net/rose/af_rose.c
> +++ b/net/rose/af_rose.c
> @@ -163,16 +163,23 @@ static void rose_remove_socket(struct sock *sk)
>  void rose_kill_by_neigh(struct rose_neigh *neigh)
>  {
>  	struct sock *s;
> +	struct hlist_node *tmp;
>
>  	spin_lock_bh(&rose_list_lock);
> -	sk_for_each(s, &rose_list) {
> +	sk_for_each_safe(s, tmp, &rose_list) {
>  		struct rose_sock *rose = rose_sk(s);
>
> +		sock_hold(s);
> +		spin_unlock_bh(&rose_list_lock);
> +		lock_sock(s);
>  		if (rose->neighbour == neigh) {
>  			rose_disconnect(s, ENETUNREACH, ROSE_OUT_OF_ORDER, 0);
> -			rose->neighbour->use--;
> +			atomic_dec(&rose->neighbour->use);
>  			rose->neighbour = NULL;
>  		}
> +		release_sock(s);
> +		sock_put(s);

I'm sorry, this does not work. At this point both 's' and 'tmp' sockets
can be freed and reused. Both iterators are not valid anymore when you
acquire the 'rose_list_lock' later.

I really think you should resort to something similar to the following
(completely untested, just to give an idea).

In any case it would be better to split this change in 2 separate
patches: the first patch replaces 'unsigned short use;' with an atomic_t
and the 2nd one addresses the race you describe above.

---
diff --git a/net/rose/af_rose.c b/net/rose/af_rose.c
index bf2d986a6bc3..27b1027aaedf 100644
--- a/net/rose/af_rose.c
+++ b/net/rose/af_rose.c
@@ -156,25 +156,45 @@ static void rose_remove_socket(struct sock *sk)
 	spin_unlock_bh(&rose_list_lock);
 }
 
+static DEFINE_MUTEX(kill_lock);
+
 /*
  *	Kill all bound sockets on a broken link layer connection to a
  *	particular neighbour.
  */
 void rose_kill_by_neigh(struct rose_neigh *neigh)
 {
-	struct sock *s;
+	HLIST_HEAD(rose_list_copy);
+	struct sock *s, *tmp;
+
+	mutex_lock(&kill_lock);
 
 	spin_lock_bh(&rose_list_lock);
 	sk_for_each(s, &rose_list) {
+		sock_hold(s);
+		/* sk_bind_node is apparently unused by rose. Alternatively
+		 * you can add another hlist_node to rose_sock and use it here
+		 */
+		sk_add_bind_node(s, &rose_list_copy);
+	}
+	spin_unlock_bh(&rose_list_lock);
+
+	hlist_for_each_entry_safe(s, tmp, &rose_list_copy, sk_bind_node) {
 		struct rose_sock *rose = rose_sk(s);
 
+		__sk_del_bind_node(s);
+		lock_sock(s);
 		if (rose->neighbour == neigh) {
 			rose_disconnect(s, ENETUNREACH, ROSE_OUT_OF_ORDER, 0);
-			rose->neighbour->use--;
+			atomic_dec(&rose->neighbour->use);
 			rose->neighbour = NULL;
 		}
+		release_sock(s);
+
+		sock_put(s);
 	}
-	spin_unlock_bh(&rose_list_lock);
+
+	mutex_unlock(&kill_lock);
 }
 
 /*
---

/P
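
The two-phase idea in the suggestion above (pin every socket and chain it onto a private list while the list lock is held, then handle each one under its own lock with the list lock dropped) can be sketched in user-space C. This is only an illustration under stated assumptions, not kernel code: `struct obj`, `obj_insert()`, `kill_by_neigh()` and the `snap_next` link are hypothetical stand-ins for `struct sock`, `rose_insert_socket()`, `rose_kill_by_neigh()` and `sk_bind_node`, pthread mutexes replace the spinlock and `lock_sock()`, and the reference count is a plain `int` guarded by the list lock rather than an atomic one.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for a socket on a shared list. */
struct obj {
	struct obj *next;       /* link on the shared list (list_lock) */
	struct obj *snap_next;  /* link on the private snapshot (kill_lock) */
	pthread_mutex_t lock;   /* per-object lock, plays the role of lock_sock() */
	int refcnt;             /* guarded by list_lock in this sketch */
	int neigh;              /* neighbour id the object points at, 0 = none */
};

static pthread_mutex_t list_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t kill_lock = PTHREAD_MUTEX_INITIALIZER;
static struct obj *shared_list;

void obj_insert(struct obj *o)
{
	pthread_mutex_lock(&list_lock);
	o->next = shared_list;
	shared_list = o;
	pthread_mutex_unlock(&list_lock);
}

/* Detach every object that references `neigh`; returns how many matched. */
int kill_by_neigh(int neigh)
{
	struct obj *snap = NULL, *o, *next;
	int killed = 0;

	/* snap_next is shared state, so only one walker may use it at a time. */
	pthread_mutex_lock(&kill_lock);

	/* Phase 1: under the list lock, pin each object and snapshot it. */
	pthread_mutex_lock(&list_lock);
	for (o = shared_list; o; o = o->next) {
		o->refcnt++;
		o->snap_next = snap;
		snap = o;
	}
	pthread_mutex_unlock(&list_lock);

	/* Phase 2: walk the private snapshot; the shared list may change freely. */
	for (o = snap; o; o = next) {
		next = o->snap_next;
		o->snap_next = NULL;

		pthread_mutex_lock(&o->lock);
		if (o->neigh == neigh) {
			o->neigh = 0;
			killed++;
		}
		pthread_mutex_unlock(&o->lock);

		/* Drop the pin taken in phase 1. */
		pthread_mutex_lock(&list_lock);
		o->refcnt--;
		pthread_mutex_unlock(&list_lock);
	}

	pthread_mutex_unlock(&kill_lock);
	return killed;
}

/* Single-threaded usage example; returns 1 when the outcome is as expected. */
int demo_kill_by_neigh(void)
{
	struct obj a = { .lock = PTHREAD_MUTEX_INITIALIZER, .refcnt = 1, .neigh = 7 };
	struct obj b = { .lock = PTHREAD_MUTEX_INITIALIZER, .refcnt = 1, .neigh = 3 };
	int killed, ok;

	obj_insert(&a);
	obj_insert(&b);
	killed = kill_by_neigh(7);
	ok = killed == 1 && a.neigh == 0 && b.neigh == 3 &&
	     a.refcnt == 1 && b.refcnt == 1;
	shared_list = NULL;     /* a and b go out of scope on return */
	return ok;
}
```

The point of the design is that the second loop never touches the shared list's links, so removals from that list cannot invalidate its iterator, and the pin taken in phase 1 keeps each object alive until it has been handled.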
Hello,

On Tue, 12 Jul 2022 13:00:49 +0200 Paolo Abeni wrote:

> On Mon, 2022-07-11 at 09:31 +0800, Duoming Zhou wrote:

[...]

> > diff --git a/net/rose/af_rose.c b/net/rose/af_rose.c
> > index bf2d986a6bc..54e7b76c4f3 100644
> > --- a/net/rose/af_rose.c
> > +++ b/net/rose/af_rose.c
> > @@ -163,16 +163,23 @@ static void rose_remove_socket(struct sock *sk)
> >  void rose_kill_by_neigh(struct rose_neigh *neigh)
> >  {
> >  	struct sock *s;
> > +	struct hlist_node *tmp;
> >
> >  	spin_lock_bh(&rose_list_lock);
> > -	sk_for_each(s, &rose_list) {
> > +	sk_for_each_safe(s, tmp, &rose_list) {
> >  		struct rose_sock *rose = rose_sk(s);
> >
> > +		sock_hold(s);
> > +		spin_unlock_bh(&rose_list_lock);
> > +		lock_sock(s);
> >  		if (rose->neighbour == neigh) {
> >  			rose_disconnect(s, ENETUNREACH, ROSE_OUT_OF_ORDER, 0);
> > -			rose->neighbour->use--;
> > +			atomic_dec(&rose->neighbour->use);
> >  			rose->neighbour = NULL;
> >  		}
> > +		release_sock(s);
> > +		sock_put(s);
>
> I'm sorry, this does not work. At this point both 's' and 'tmp' sockets
> can be freed and reused. Both iterators are not valid anymore when you
> acquire the 'rose_list_lock' later.

Thank you for your time and reply! But I think neither 's' nor 'tmp' can
be freed and reused in rose_kill_by_neigh(), because rose_remove_socket()
calls sk_del_node_init(), which is protected by rose_list_lock, to delete the
socket node from the hlist, and if sk->sk_refcnt equals 1, the socket will
be deallocated.

static void rose_remove_socket(struct sock *sk)
{
	spin_lock_bh(&rose_list_lock);
	sk_del_node_init(sk);
	spin_unlock_bh(&rose_list_lock);
}

https://elixir.bootlin.com/linux/v5.19-rc6/source/net/rose/af_rose.c#L152

Both 's' and 'tmp' in rose_kill_by_neigh() are also protected by rose_list_lock.
If the socket is deleted from the hlist, sk_for_each_safe() cannot find
the socket, so the UAF bug is prevented. If the socket can be found by
sk_for_each_safe(), we use sock_hold(s) to increase the refcount of the
socket. As a result, the UAF bugs are prevented.

Best regards,
Duoming Zhou
On Wed, 2022-07-13 at 15:50 +0800, duoming@zju.edu.cn wrote:
> Hello,
>
> On Tue, 12 Jul 2022 13:00:49 +0200 Paolo Abeni wrote:
>
> > On Mon, 2022-07-11 at 09:31 +0800, Duoming Zhou wrote:

[...]

> > > @@ -163,16 +163,23 @@ static void rose_remove_socket(struct sock *sk)
> > >  void rose_kill_by_neigh(struct rose_neigh *neigh)
> > >  {
> > >  	struct sock *s;
> > > +	struct hlist_node *tmp;
> > >
> > >  	spin_lock_bh(&rose_list_lock);
> > > -	sk_for_each(s, &rose_list) {
> > > +	sk_for_each_safe(s, tmp, &rose_list) {
> > >  		struct rose_sock *rose = rose_sk(s);
> > >
> > > +		sock_hold(s);
> > > +		spin_unlock_bh(&rose_list_lock);
> > > +		lock_sock(s);
> > >  		if (rose->neighbour == neigh) {
> > >  			rose_disconnect(s, ENETUNREACH, ROSE_OUT_OF_ORDER, 0);
> > > -			rose->neighbour->use--;
> > > +			atomic_dec(&rose->neighbour->use);
> > >  			rose->neighbour = NULL;
> > >  		}
> > > +		release_sock(s);
> > > +		sock_put(s);
> >
> > I'm sorry, this does not work. At this point both 's' and 'tmp' sockets
> > can be freed and reused. Both iterators are not valid anymore when you
> > acquire the 'rose_list_lock' later.
>
> Thank you for your time and reply! But I think both 's' and 'tmp' can not
> be freed and reused in rose_kill_by_neigh(). Because rose_remove_socket()
> calls sk_del_node_init() which is protected by rose_list_lock to delete the
> socket node from the hlist and if sk->sk_refcnt equals to 1, the socket will
> be deallocated.
>
> static void rose_remove_socket(struct sock *sk)
> {
> 	spin_lock_bh(&rose_list_lock);
> 	sk_del_node_init(sk);
> 	spin_unlock_bh(&rose_list_lock);
> }
>
> https://elixir.bootlin.com/linux/v5.19-rc6/source/net/rose/af_rose.c#L152
>
> Both 's' and 'tmp' in rose_kill_by_neigh() is also protected by rose_list_lock.

The above loop explicitly releases the rose_list_lock at each
iteration. Additionally, the reference to 's' is released before
re-acquiring that lock. By the time rose_list_lock is re-acquired, some
other process could have removed both 's' and 'tmp' from the list and
even deallocated them.

Moving the 'sock_put(s);' after re-acquiring the rose_list_lock could
protect 's' from being deallocated, but it can't protect 'tmp' from being
deallocated, nor 's' and 'tmp' from being removed from the list.

The above code is not safe.

/P
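
The hazard described in this reply can be reproduced without any threads. The sketch below is a deliberately simplified, single-threaded illustration, not kernel code: `struct item`, `unlink_item()` and `count_stale_visits()` are hypothetical names, and the "concurrent" removal is simulated by an explicit call placed where the list lock would have been dropped. It shows that a cached next pointer, which is all a `_safe` style iterator keeps, still leads the walker to a node that has already left the list. In real code that node could also have been freed, which this sketch does not attempt to model.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical list element; on_list records whether it is still linked. */
struct item {
	struct item *next;
	bool on_list;
};

/* Unlink `victim`, as another context could do once the list lock is dropped. */
void unlink_item(struct item **head, struct item *victim)
{
	struct item **pp;

	for (pp = head; *pp; pp = &(*pp)->next) {
		if (*pp == victim) {
			*pp = victim->next;
			victim->on_list = false;
			return;
		}
	}
}

/*
 * Walk the list with a cached next pointer. After the first element is
 * handled, simulate the unlocked window by removing the cached successor.
 * Returns how many visited elements were no longer on the list.
 */
int count_stale_visits(struct item **head)
{
	struct item *cur, *tmp;
	bool raced = false;
	int stale = 0;

	for (cur = *head; cur; cur = tmp) {
		tmp = cur->next;        /* the pointer a "safe" iterator caches */
		if (!cur->on_list)
			stale++;
		if (!raced && tmp) {
			unlink_item(head, tmp); /* happens "while the lock is dropped" */
			raced = true;
		}
	}
	return stale;
}

/* Usage example: three elements a -> b -> c, b is removed mid-walk. */
int demo_stale_iterator(void)
{
	struct item c = { NULL, true };
	struct item b = { &c, true };
	struct item a = { &b, true };
	struct item *head = &a;

	return count_stale_visits(&head);
}
```

The cached pointer protects only against the walker itself deleting the current element. It offers nothing once other contexts may modify the list between iterations, and holding a reference on the current element does not change that for its successor.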
Hello,

On Wed, 13 Jul 2022 10:33:54 +0200 Paolo Abeni wrote:

[...]

> > Both 's' and 'tmp' in rose_kill_by_neigh() is also protected by rose_list_lock.
>
> The above loop explicitly releases the rose_list_lock at each
> iteration. Additionally, the reference count to 's' is released before
> re-acquiring such lock. By the time rose_list_lock is re-acquired, some
> other process could have removed from the list both 's' and 'tmp' and
> even de-allocate them.
>
> Moving the 'sock_put(s);' after re-acquiring the rose_list_lock could
> protect from 's' being de-allocated, but can't protect from 'tmp' being
> deallocated, neither from 's' and 'tmp' being removed from the list.
>
> The above code is not safe.

I understand. I will improve the code, thank you!

Best regards,
Duoming Zhou
Hi,

I am an oldtimer FPAC / ROSE user and occasionally a debugger.

Let me take this opportunity to report a major issue present in the rose
module since kernel 5.4.83 (5.5.10).

The bug makes it impossible for a ROSE application to connect to a ROSE
socket. Connect requests were working until kernel 5.4.81.

Here is an illustration on a Raspberry Pi configured as ROSE / FPAC node
f6bvp-8, with the kernel downgraded to 5.4.79:

Linux F6BVP-8 5.4.79-v7+ #1373 SMP Mon Nov 23 13:22:33 GMT 2020 armv7l GNU/Linux

A connect request to the co-located node on the same machine does not use
the Ethernet network.

pi@F6BVP-8:~ $ sudo rose_call rose0 f6bvp f6bvp-8 2080175520
F6BVP-8 (Commands = ?) : uilt May 15 2022) for LINUX (help = h)

Connecting to a remote ROSE / FPAC node via the Internet (AX25 over UDP
frames) also succeeds:

pi@F6BVP-8:/etc/ax25 $ sudo rose_call rose0 f6bvp f6kkr-8 2080178520
F6KKR-8 (Commands = ?) : uilt Nov 17 2019) for LINUX (help = h)
F6KKR-8 (Commands = ?) :

Screen dump from the AX25 listen tool (pid=1(X.25) means ROSE protocol):

axudp: fm F6BVP-9 to F6KKR-9 ctl I11^ pid=1(X.25) len 60 15:25:04.162488
X.25: LCI 001 : CALL REQUEST - NbAlea: 7801
fm F6BVP-0 @2080,175520
to F6KKR-8 @2080,178520
axudp: fm F6KKR-9 to F6BVP-9 ctl I21^ pid=1(X.25) len 230 15:25:04.177346
X.25: LCI 001 : CALL ACCEPTED
axudp: fm F6KKR-9 to F6BVP-9 ctl I22+ pid=1(X.25) len 179 15:25:04.182222
X.25: LCI 001 : DATA R0 S0 len 176
0000 55 73 65 72 20 63 61 6C 6C 20 3A 20 46 36 42 56 | User call : F6BV
0010 50 2D 30 0D 57 65 6C 63 6F 6D 65 2F 42 69 65 6E | P-0MWelcome/Bien
0020 76 65 6E 75 65 0D 46 36 4B 4B 52 20 52 61 6D 62 | venueMF6KKR Ramb
0030 6F 75 69 6C 6C 65 74 2C 20 37 38 20 2C 20 46 72 | ouillet, 78 , Fr
0040 61 6E 63 65 0D 35 30 6B 6D 20 53 57 20 6F 66 20 | anceM50km SW of
0050 50 61 72 69 73 0D 0D 46 50 41 43 2D 4E 6F 64 65 | ParisMMFPAC-Node
0060 20 76 20 34 2E 31 2E 31 2D 62 65 74 61 20 28 62 |  v 4.1.1-beta (b
0070 75 69 6C 74 20 4E 6F 76 20 31 37 20 32 30 31 39 | uilt Nov 17 2019
0080 29 20 66 6F 72 20 4C 49 4E 55 58 20 28 68 65 6C | ) for LINUX (hel
0090 70 20 3D 20 68 29 0D 46 36 4B 4B 52 2D 38 20 28 | p = h)MF6KKR-8 (
00A0 43 6F 6D 6D 61 6E 64 73 20 3D 20 3F 29 20 3A 20 | Commands = ?) :
axudp: fm F6BVP-9 to F6KKR-9 ctl RR3- 15:25:04.184195

Now using a 5.18.11 kernel with up-to-date netdev ax25 and rose modules:

Linux ubuntu-f6bvp 5.18.11-F6BVP #1 SMP PREEMPT_DYNAMIC Tue Jul 12 22:13:30 CEST 2022 x86_64 x86_64 x86_64 GNU/Linux

and performing the same connection sequences.

First, a connect request to the co-located node:

bernard@ubuntu-f6bvp:/etc/ax25$ sudo rose_call rose0 f6bvp f6bvp-4 2080175524
Connecting to f6bvp-4 @ 2080175524 ...

infinite wait ...

Trying to connect to a local network node does not show any packet going
out when displaying ax25 activity with the "listen" application:

bernard@ubuntu-f6bvp:/etc/ax25$ sudo rose_call rose0 f6bvp f6bvp-8 2080175520
bernard@ubuntu-f6bvp:/etc/ax25$ 20

... No connection ... and no outgoing frames in the listen screen dump.

Again:

bernard@ubuntu-f6bvp:/etc/ax25$ sudo rose_call rose0 f6bvp f6kkr-8 2080178520
bernard@ubuntu-f6bvp:/etc/ax25$ 20

... No connection.

The issue seems to be in the rose socket connect ...

I understand that some ROSE headers have been changed ... recently (???)

I would be pleased to test any patch that repairs this nasty bug, so that I
can leave the 5.4.79 kernel and its AX25 bugs behind ...

Bernard
Hamradio f6bvp / ai7bg
http://f6bvp.org
On Thu, Jul 14, 2022 at 04:11:44PM +0200, Bernard f6bvp wrote:
> Hi,
>
> I am an oldtimer FPAC / ROSE user and occasionnally debugger.
>
> Let me take this opportunity to report a major issue present in rose module
> since kernel 5.4.83 (5.5.10).

Do you think you could do a git bisect to find the buggy patch?

https://wiki.gentoo.org/wiki/Kernel_git-bisect

We should probably start a new email thread for that issue.

regards,
dan carpenter
On Fri, 15 Jul 2022 17:59:06 +0200 Bernard f6bvp wrote:
> Here is the context.
>
> This patch adds dev_put(dev) in order to allow removal of rose module
> after use of AX25 and ROSE via rose0 device.
>
> Otherwise when trying to remove rose module via rmmod rose an infinite
> loop message was displayed on all consoles with xx being a random number.
>
> unregistered_netdevice: waiting for rose0 to become free. Usage count = xx
>
> unregistered_netdevice: waiting for rose0 to become free. Usage count = xx
>
> ...
>
> With the patch it is ok to rmmod rose.
>
> This bug appeared with kernel 4.10 and was tentatively repaired five
> years ago.

Please try resending with git send-email. Your current email contains
HTML so it won't make it to netdev@ and other vger lists.

> Subject: [BUG] unregistered netdevice: wainting for rose0 to become
> free. Usage count = xx
>   <https://marc.info/?t=148811830800001&r=1&w=2>
> From: f6bvp <f6bvp () free ! fr>
>   <https://marc.info/?a=128152583500001&r=1&w=2>
> Date: 2017-02-26 14:09:08
>   <https://marc.info/?l=linux-hams&r=1&w=2&b=201702>
> Message-ID: ce03a972-a3b0-ca24-5195-2fe2fd5c44d3 () free ! fr
>   <https://marc.info/?i=ce03a972-a3b0-ca24-5195-2fe2fd5c44d3%20()%20free%20!%20fr>
>
> Since then the bug reamains.

Is it possible to use a link to the lore.kernel.org archive? It's the
most common way of referring to past threads these days.

> Signed-off-by: Bernard f6bvp / ai7bg

Well formed s-o-b is required, "the name you'd use if you were signing
a legal document".

> diff --git a/a/net/rose/af_rose.c b/b/net/rose/af_rose.c
> index bf2d986..41e106a 100644
> --- a/a/net/rose/af_rose.c
> +++ b/b/net/rose/af_rose.c
> @@ -711,6 +711,7 @@ static int rose_bind(struct socket *sock, struct
> sockaddr *uaddr, int addr_len)
>          rose_insert_socket(sk);
>
>          sock_reset_flag(sk, SOCK_ZAPPED);
> +       dev_put(dev);
>
>          return 0;
>   }
Here is the context.
This patch adds dev_put(dev) in order to allow removal of rose module
after use of AX25 and ROSE via rose0 device.
Otherwise when trying to remove rose module via rmmod rose an infinite
loop message was displayed on all consoles with xx being a random number.
unregistered_netdevice: waiting for rose0 to become free. Usage count = xx
unregistered_netdevice: waiting for rose0 to become free. Usage count = xx
...
With the patch it is ok to rmmod rose.
This bug appeared with kernel 4.10 and has been only partially repaired
by adding two dev_put(dev).
Signed-off-by: Bernard Pidoux <f6bvp@free.fr>
---
net/rose/af_rose.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/net/rose/af_rose.c b/net/rose/af_rose.c
index bf2d986a6bc3..4163171ce3a6 100644
--- a/net/rose/af_rose.c
+++ b/net/rose/af_rose.c
@@ -711,6 +711,8 @@ static int rose_bind(struct socket *sock, struct
sockaddr *uaddr, int addr_len)
rose_insert_socket(sk);
sock_reset_flag(sk, SOCK_ZAPPED);
+
+ dev_put(dev);
return 0;
}
On Fri, Jul 22, 2022 at 6:41 PM Bernard f6bvp <f6bvp@free.fr> wrote:
>
> Here is the context.
>
> This patch adds dev_put(dev) in order to allow removal of rose module
> after use of AX25 and ROSE via rose0 device.
>
> Otherwise when trying to remove rose module via rmmod rose an infinite
> loop message was displayed on all consoles with xx being a random number.
>
> unregistered_netdevice: waiting for rose0 to become free. Usage count = xx
>
> unregistered_netdevice: waiting for rose0 to become free. Usage count = xx
>
> ...
>
> With the patch it is ok to rmmod rose.
>
> This bug appeared with kernel 4.10 and has been only partially repaired
> by adding two dev_put(dev).
>
> Signed-off-by: Bernard Pidoux <f6bvp@free.fr>
>
> ---
> net/rose/af_rose.c | 2 ++
> 1 file changed, 2 insertions(+)
>
> diff --git a/net/rose/af_rose.c b/net/rose/af_rose.c
> index bf2d986a6bc3..4163171ce3a6 100644
> --- a/net/rose/af_rose.c
> +++ b/net/rose/af_rose.c
> @@ -711,6 +711,8 @@ static int rose_bind(struct socket *sock, struct
> sockaddr *uaddr, int addr_len)
> rose_insert_socket(sk);
>
> sock_reset_flag(sk, SOCK_ZAPPED);
> +
> + dev_put(dev);

But we have, at line 698:

    rose->device = dev;

The socket keeps that pointer, and we can not keep a pointer to a device
without holding a reference on it.

As a bonus we could convert these dev_put() to the new infra added with
CONFIG_NET_DEV_REFCNT_TRACKER=y

>
> return 0;
> }
> --
> 2.34.1
>
> [master da21d19e920d] [PATCH] net: rose: fix unregistered netdevice:
> waiting for rose0 to become free
> Date: Mon Jul 18 16:23:54 2022 +0200
> 1 file changed, 2 insertions(+)
>
>
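
The rule stated in this reply, that a stored pointer to a refcounted object must be backed by a held reference, can be sketched as follows. This is a user-space illustration under assumptions, not the rose code: `struct device`, `struct sock_state` and every `*_sketch` function are hypothetical stand-ins, the counter is a plain `int`, and the lookup is assumed to hand its caller a reference, which is then transferred into the stored pointer and dropped only when that pointer is cleared.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for a network device and a socket that points at it. */
struct device {
	int refcnt;
};

struct sock_state {
	struct device *device;
};

void dev_hold_sketch(struct device *d) { d->refcnt++; }
void dev_put_sketch(struct device *d)  { d->refcnt--; }

/* Lookup that returns the device with one reference taken for the caller. */
struct device *dev_get_sketch(struct device *d)
{
	dev_hold_sketch(d);
	return d;
}

/*
 * Bind: the reference from the lookup is handed over to s->device.
 * Dropping it here would leave s->device pointing at an object that
 * nothing keeps alive.
 */
void bind_sketch(struct sock_state *s, struct device *d)
{
	s->device = dev_get_sketch(d);
}

/* Release: the reference is dropped at the moment the pointer is cleared. */
void release_sketch(struct sock_state *s)
{
	if (s->device) {
		dev_put_sketch(s->device);
		s->device = NULL;
	}
}

/* Usage example; returns 1 when hold and put are balanced. */
int demo_balanced_refs(void)
{
	struct device dev = { 1 };      /* one reference owned by "the system" */
	struct sock_state s = { NULL };
	int while_bound;

	bind_sketch(&s, &dev);
	while_bound = dev.refcnt;       /* pointer stored, reference held */
	release_sketch(&s);

	return while_bound == 2 && dev.refcnt == 1 && s.device == NULL;
}
```

In this model a leaked reference shows up as a count that never returns to its starting value, which matches the symptom in the report (the unregister path waiting for the usage count to drop). The fix then belongs on the path that clears the stored pointer, not on the path that sets it.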
On Fri, 22 Jul 2022 18:41:28 +0200 Bernard f6bvp wrote:
> User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101
> Thunderbird/91.11.0

Still whitespace damaged, can you try git send-email?
I modified .config to set CONFIG_NET_DEV_REFCNT_TRACKER=y, then compiled
the modules and ran my usual AX25 and ROSE applications.

Attached is (I hope) the relevant dmesg dump.

[ 0.000000] microcode: microcode updated early to revision 0x26, date = 2019-11-12
[ 0.000000] Linux version 5.18.11-F6BVP (root@ubuntu-f6bvp) (gcc (Ubuntu 11.2.0-19ubuntu1) 11.2.0, GNU ld (GNU Binutils for Ubuntu) 2.38) #3 SMP PREEMPT_DYNAMIC Sat Jul 23 01:14:23 CEST 2022
[ 0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-5.18.11-F6BVP root=UUID=3ba9ef9f-79fe-49dd-b301-9a509571f7a6 ro quiet splash vt.handoff=7
[ 0.000000] KERNEL supported cpus:
[ 0.000000] Intel GenuineIntel
[ 0.000000] AMD AuthenticAMD
[ 0.000000] Hygon HygonGenuine
[ 0.000000] Centaur CentaurHauls
[ 0.000000] zhaoxin Shanghai
...
[ 48.485509] NET: Registered PF_AX25 protocol family
[ 48.520677] NET: Registered PF_ROSE protocol family
[ 56.854635] NET: Unregistered PF_ROSE protocol family
[ 59.235217] NET: Unregistered PF_AX25 protocol family
[ 69.314892] NET: Registered PF_AX25 protocol family
[ 69.320617] mkiss: AX.25 Multikiss, Hans Albas PE1AYX
[ 69.321340] mkiss: ax0: crc mode is auto.
[ 69.321481] IPv6: ADDRCONF(NETDEV_CHANGE): ax0: link becomes ready
[ 71.363304] NET: Registered PF_NETROM protocol family
[ 73.477000] NET: Registered PF_ROSE protocol family
[ 79.487926] mkiss: ax0: Trying crc-smack
[ 79.488053] mkiss: ax0: Trying crc-flexnet
[ 205.798723] reference already released.
[ 205.798732] allocated in: [ 205.798734] ax25_bind+0x1a2/0x230 [ax25] [ 205.798747] __sys_bind+0xea/0x110 [ 205.798753] __x64_sys_bind+0x18/0x20 [ 205.798758] do_syscall_64+0x5c/0x80 [ 205.798763] entry_SYSCALL_64_after_hwframe+0x44/0xae [ 205.798768] freed in: [ 205.798770] ax25_release+0x115/0x370 [ax25] [ 205.798778] __sock_release+0x42/0xb0 [ 205.798782] sock_close+0x15/0x20 [ 205.798785] __fput+0x9f/0x260 [ 205.798789] ____fput+0xe/0x10 [ 205.798792] task_work_run+0x64/0xa0 [ 205.798798] exit_to_user_mode_prepare+0x18b/0x190 [ 205.798804] syscall_exit_to_user_mode+0x26/0x40 [ 205.798808] do_syscall_64+0x69/0x80 [ 205.798812] entry_SYSCALL_64_after_hwframe+0x44/0xae [ 205.798827] ------------[ cut here ]------------ [ 205.798829] WARNING: CPU: 2 PID: 2605 at lib/ref_tracker.c:136 ref_tracker_free.cold+0x60/0x81 [ 205.798837] Modules linked in: rose netrom mkiss ax25 rfcomm cmac algif_hash algif_skcipher af_alg bnep snd_hda_codec_hdmi nls_iso8859_1 i915 rtw88_8821ce rtw88_8821c x86_pkg_temp_thermal rtw88_pci intel_powerclamp rtw88_core snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio coretemp snd_hda_intel kvm_intel snd_intel_dspcfg mac80211 snd_hda_codec kvm i2c_algo_bit drm_buddy drm_dp_helper btusb drm_kms_helper snd_hwdep btrtl snd_hda_core btbcm joydev crct10dif_pclmul btintel crc32_pclmul ghash_clmulni_intel mei_hdcp btmtk intel_rapl_msr aesni_intel bluetooth input_leds snd_pcm crypto_simd syscopyarea processor_thermal_device_pci_legacy sysfillrect cryptd intel_soc_dts_iosf snd_seq sysimgblt ecdh_generic fb_sys_fops rapl libarc4 processor_thermal_device intel_cstate processor_thermal_rfim cec snd_timer ecc snd_seq_device cfg80211 processor_thermal_mbox mei_me processor_thermal_rapl mei rc_core at24 snd intel_pch_thermal intel_rapl_common ttm soundcore int340x_thermal_zone video [ 205.798948] mac_hid acpi_pad sch_fq_codel ipmi_devintf ipmi_msghandler drm msr parport_pc ppdev lp parport ramoops pstore_blk reed_solomon pstore_zone efi_pstore 
ip_tables x_tables autofs4 hid_generic usbhid hid i2c_i801 i2c_smbus r8169 xhci_pci ahci libahci realtek lpc_ich xhci_pci_renesas [last unloaded: ax25] [ 205.798992] CPU: 2 PID: 2605 Comm: ax25ipd Not tainted 5.18.11-F6BVP #3 [ 205.798996] Hardware name: To be filled by O.E.M. To be filled by O.E.M./CK3, BIOS 5.011 09/16/2020 [ 205.798999] RIP: 0010:ref_tracker_free.cold+0x60/0x81 [ 205.799005] Code: e8 d2 01 9b ff 83 7b 18 00 74 14 48 c7 c7 2f d7 ff 98 e8 10 6e fc ff 8b 7b 18 e8 b8 01 9b ff 4c 89 ee 4c 89 e7 e8 5d fd 07 00 <0f> 0b b8 ea ff ff ff e9 30 05 9b ff 41 0f b6 f7 48 c7 c7 a0 fa 4e [ 205.799008] RSP: 0018:ffffaf5281073958 EFLAGS: 00010286 [ 205.799011] RAX: 0000000080000000 RBX: ffff9a0bd687ebe0 RCX: 0000000000000000 [ 205.799014] RDX: 0000000000000001 RSI: 0000000000000282 RDI: 00000000ffffffff [ 205.799016] RBP: ffffaf5281073a10 R08: 0000000000000003 R09: fffffffffffd5618 [ 205.799019] R10: 0000000000ffff10 R11: 000000000000000f R12: ffff9a0bc53384d0 [ 205.799022] R13: 0000000000000282 R14: 00000000ae000001 R15: 0000000000000001 [ 205.799024] FS: 0000000000000000(0000) GS:ffff9a0d0f300000(0000) knlGS:0000000000000000 [ 205.799028] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 205.799031] CR2: 00007ff6b8311554 CR3: 000000001ac10004 CR4: 00000000001706e0 [ 205.799033] Call Trace: [ 205.799035] <TASK> [ 205.799038] ? ax25_dev_device_down+0xd9/0x1b0 [ax25] [ 205.799047] ? ax25_device_event+0x9f/0x270 [ax25] [ 205.799055] ? raw_notifier_call_chain+0x49/0x60 [ 205.799060] ? call_netdevice_notifiers_info+0x52/0xa0 [ 205.799065] ? dev_close_many+0xc8/0x120 [ 205.799070] ? unregister_netdevice_many+0x13d/0x890 [ 205.799073] ? unregister_netdevice_queue+0x90/0xe0 [ 205.799076] ? unregister_netdev+0x1d/0x30 [ 205.799080] ? mkiss_close+0x7c/0xc0 [mkiss] [ 205.799084] ? tty_ldisc_close+0x2e/0x40 [ 205.799089] ? tty_ldisc_hangup+0x137/0x210 [ 205.799092] ? __tty_hangup.part.0+0x208/0x350 [ 205.799098] ? tty_vhangup+0x15/0x20 [ 205.799103] ? 
pty_close+0x127/0x160 [ 205.799108] ? tty_release+0x139/0x5e0 [ 205.799112] ? __fput+0x9f/0x260 [ 205.799118] ax25_dev_device_down+0xd9/0x1b0 [ax25] [ 205.799126] ax25_device_event+0x9f/0x270 [ax25] [ 205.799135] raw_notifier_call_chain+0x49/0x60 [ 205.799140] call_netdevice_notifiers_info+0x52/0xa0 [ 205.799146] dev_close_many+0xc8/0x120 [ 205.799152] unregister_netdevice_many+0x13d/0x890 [ 205.799157] unregister_netdevice_queue+0x90/0xe0 [ 205.799161] unregister_netdev+0x1d/0x30 [ 205.799165] mkiss_close+0x7c/0xc0 [mkiss] [ 205.799170] tty_ldisc_close+0x2e/0x40 [ 205.799173] tty_ldisc_hangup+0x137/0x210 [ 205.799178] __tty_hangup.part.0+0x208/0x350 [ 205.799184] tty_vhangup+0x15/0x20 [ 205.799188] pty_close+0x127/0x160 [ 205.799193] tty_release+0x139/0x5e0 [ 205.799199] __fput+0x9f/0x260 [ 205.799203] ____fput+0xe/0x10 [ 205.799208] task_work_run+0x64/0xa0 [ 205.799213] do_exit+0x33b/0xab0 [ 205.799217] ? __handle_mm_fault+0xc4f/0x15f0 [ 205.799224] do_group_exit+0x35/0xa0 [ 205.799228] __x64_sys_exit_group+0x18/0x20 [ 205.799232] do_syscall_64+0x5c/0x80 [ 205.799238] ? handle_mm_fault+0xba/0x290 [ 205.799242] ? debug_smp_processor_id+0x17/0x20 [ 205.799246] ? fpregs_assert_state_consistent+0x26/0x50 [ 205.799251] ? exit_to_user_mode_prepare+0x49/0x190 [ 205.799256] ? irqentry_exit_to_user_mode+0x9/0x20 [ 205.799260] ? irqentry_exit+0x33/0x40 [ 205.799263] ? exc_page_fault+0x87/0x170 [ 205.799268] ? asm_exc_page_fault+0x8/0x30 [ 205.799273] entry_SYSCALL_64_after_hwframe+0x44/0xae [ 205.799277] RIP: 0033:0x7ff6b80eaca1 [ 205.799281] Code: Unable to access opcode bytes at RIP 0x7ff6b80eac77. 
[ 205.799283] RSP: 002b:00007fff6dfd4738 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7 [ 205.799287] RAX: ffffffffffffffda RBX: 00007ff6b8215a00 RCX: 00007ff6b80eaca1 [ 205.799290] RDX: 000000000000003c RSI: 00000000000000e7 RDI: 0000000000000001 [ 205.799293] RBP: 0000000000000001 R08: ffffffffffffff80 R09: 0000000000000028 [ 205.799295] R10: 0000000000000000 R11: 0000000000000246 R12: 00007ff6b8215a00 [ 205.799298] R13: 0000000000000000 R14: 00007ff6b821aee8 R15: 00007ff6b821af00 [ 205.799304] </TASK> [ 205.799306] ---[ end trace 0000000000000000 ]--- [ 205.823488] leaked reference. [ 205.823500] ax25_dev_device_up+0x6b/0x160 [ax25] [ 205.823514] ax25_device_event+0x1c6/0x270 [ax25] [ 205.823522] raw_notifier_call_chain+0x49/0x60 [ 205.823529] call_netdevice_notifiers_info+0x52/0xa0 [ 205.823535] __dev_notify_flags+0x58/0xe0 [ 205.823538] dev_change_flags+0x51/0x60 [ 205.823542] devinet_ioctl+0x614/0x810 [ 205.823546] inet_ioctl+0x165/0x190 [ 205.823548] sock_do_ioctl+0x45/0x100 [ 205.823552] sock_ioctl+0xef/0x310 [ 205.823555] __x64_sys_ioctl+0x91/0xc0 [ 205.823559] do_syscall_64+0x5c/0x80 [ 205.823565] entry_SYSCALL_64_after_hwframe+0x44/0xae [ 205.823586] ------------[ cut here ]------------ [ 205.823589] WARNING: CPU: 2 PID: 2605 at lib/ref_tracker.c:39 ref_tracker_dir_exit.cold+0x66/0x72 [ 205.823599] Modules linked in: rose netrom mkiss ax25 rfcomm cmac algif_hash algif_skcipher af_alg bnep snd_hda_codec_hdmi nls_iso8859_1 i915 rtw88_8821ce rtw88_8821c x86_pkg_temp_thermal rtw88_pci intel_powerclamp rtw88_core snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio coretemp snd_hda_intel kvm_intel snd_intel_dspcfg mac80211 snd_hda_codec kvm i2c_algo_bit drm_buddy drm_dp_helper btusb drm_kms_helper snd_hwdep btrtl snd_hda_core btbcm joydev crct10dif_pclmul btintel crc32_pclmul ghash_clmulni_intel mei_hdcp btmtk intel_rapl_msr aesni_intel bluetooth input_leds snd_pcm crypto_simd syscopyarea processor_thermal_device_pci_legacy sysfillrect cryptd 
intel_soc_dts_iosf snd_seq sysimgblt ecdh_generic fb_sys_fops rapl libarc4 processor_thermal_device intel_cstate processor_thermal_rfim cec snd_timer ecc snd_seq_device cfg80211 processor_thermal_mbox mei_me processor_thermal_rapl mei rc_core at24 snd intel_pch_thermal intel_rapl_common ttm soundcore int340x_thermal_zone video [ 205.823697] mac_hid acpi_pad sch_fq_codel ipmi_devintf ipmi_msghandler drm msr parport_pc ppdev lp parport ramoops pstore_blk reed_solomon pstore_zone efi_pstore ip_tables x_tables autofs4 hid_generic usbhid hid i2c_i801 i2c_smbus r8169 xhci_pci ahci libahci realtek lpc_ich xhci_pci_renesas [last unloaded: ax25] [ 205.823732] CPU: 2 PID: 2605 Comm: ax25ipd Tainted: G W 5.18.11-F6BVP #3 [ 205.823736] Hardware name: To be filled by O.E.M. To be filled by O.E.M./CK3, BIOS 5.011 09/16/2020 [ 205.823738] RIP: 0010:ref_tracker_dir_exit.cold+0x66/0x72 [ 205.823744] Code: 00 00 00 ad de 49 89 44 24 08 4d 89 2c 24 49 89 dc e8 7d d8 67 ff 48 8b 03 4c 39 fb 75 13 48 8b 75 d0 4c 89 f7 e8 c9 fd 07 00 <0f> 0b e9 5f 04 9b ff 48 89 c3 eb 98 41 0f b6 f5 48 c7 c7 c0 fa 4e [ 205.823747] RSP: 0018:ffffaf5281073ba0 EFLAGS: 00010286 [ 205.823750] RAX: 0000000080000000 RBX: ffff9a0bc53384e8 RCX: 0000000000000000 [ 205.823753] RDX: 0000000000000001 RSI: 0000000000000286 RDI: 00000000ffffffff [ 205.823755] RBP: ffffaf5281073bd0 R08: ffff9a0be6c54a00 R09: 000000000080005f [ 205.823757] R10: 0000000000ffff10 R11: 0000000000000000 R12: ffff9a0bc53384e8 [ 205.823759] R13: dead000000000100 R14: ffff9a0bc53384d0 R15: ffff9a0bc53384e8 [ 205.823762] FS: 0000000000000000(0000) GS:ffff9a0d0f300000(0000) knlGS:0000000000000000 [ 205.823764] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 205.823766] CR2: 00007fc64c0945f8 CR3: 000000001ac10002 CR4: 00000000001706e0 [ 205.823769] Call Trace: [ 205.823771] <TASK> [ 205.823775] free_netdev+0xf0/0x1d0 [ 205.823783] mkiss_close+0x9e/0xc0 [mkiss] [ 205.823788] tty_ldisc_close+0x2e/0x40 [ 205.823792] tty_ldisc_hangup+0x137/0x210 [ 
205.823795] __tty_hangup.part.0+0x208/0x350 [ 205.823802] tty_vhangup+0x15/0x20 [ 205.823807] pty_close+0x127/0x160 [ 205.823811] tty_release+0x139/0x5e0 [ 205.823817] __fput+0x9f/0x260 [ 205.823822] ____fput+0xe/0x10 [ 205.823826] task_work_run+0x64/0xa0 [ 205.823832] do_exit+0x33b/0xab0 [ 205.823836] ? __handle_mm_fault+0xc4f/0x15f0 [ 205.823843] do_group_exit+0x35/0xa0 [ 205.823847] __x64_sys_exit_group+0x18/0x20 [ 205.823850] do_syscall_64+0x5c/0x80 [ 205.823855] ? handle_mm_fault+0xba/0x290 [ 205.823859] ? debug_smp_processor_id+0x17/0x20 [ 205.823862] ? fpregs_assert_state_consistent+0x26/0x50 [ 205.823866] ? exit_to_user_mode_prepare+0x49/0x190 [ 205.823873] ? irqentry_exit_to_user_mode+0x9/0x20 [ 205.823876] ? irqentry_exit+0x33/0x40 [ 205.823878] ? exc_page_fault+0x87/0x170 [ 205.823884] ? asm_exc_page_fault+0x8/0x30 [ 205.823888] entry_SYSCALL_64_after_hwframe+0x44/0xae [ 205.823891] RIP: 0033:0x7ff6b80eaca1 [ 205.823894] Code: Unable to access opcode bytes at RIP 0x7ff6b80eac77. 
[ 205.823896] RSP: 002b:00007fff6dfd4738 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7 [ 205.823899] RAX: ffffffffffffffda RBX: 00007ff6b8215a00 RCX: 00007ff6b80eaca1 [ 205.823901] RDX: 000000000000003c RSI: 00000000000000e7 RDI: 0000000000000001 [ 205.823903] RBP: 0000000000000001 R08: ffffffffffffff80 R09: 0000000000000028 [ 205.823905] R10: 0000000000000000 R11: 0000000000000246 R12: 00007ff6b8215a00 [ 205.823907] R13: 0000000000000000 R14: 00007ff6b821aee8 R15: 00007ff6b821af00 [ 205.823912] </TASK> [ 205.823913] ---[ end trace 0000000000000000 ]--- [ 213.911503] NET: Unregistered PF_NETROM protocol family [ 216.051493] NET: Unregistered PF_ROSE protocol family [ 218.435464] NET: Unregistered PF_AX25 protocol family [ 254.591206] NET: Registered PF_AX25 protocol family [ 254.616523] NET: Registered PF_ROSE protocol family [ 262.939365] NET: Unregistered PF_ROSE protocol family [ 265.323355] NET: Unregistered PF_AX25 protocol family [ 275.399794] NET: Registered PF_AX25 protocol family [ 275.403834] mkiss: AX.25 Multikiss, Hans Albas PE1AYX [ 275.404600] mkiss: ax0: crc mode is auto. [ 275.404757] IPv6: ADDRCONF(NETDEV_CHANGE): ax0: link becomes ready [ 277.439314] NET: Registered PF_NETROM protocol family [ 279.545448] NET: Registered PF_ROSE protocol family [ 285.555364] mkiss: ax0: Trying crc-smack [ 285.555485] mkiss: ax0: Trying crc-flexnet [17750.514430] kauditd_printk_skb: 16 callbacks suppressed
On Sat, Jul 23, 2022 at 1:21 PM Bernard F6BVP <bernard.f6bvp@gmail.com> wrote:
>
>
> I modified .config according to
> CONFIG_NET_DEV_REFCNT_TRACKER=y
> then compiled moduled and ran my usual AX25 and ROSE applications.
>
> Attached is (I hope) relevant dmesg dump.

Thanks !

There are a lot of problems really...

First one being in ax25:

[ 205.798723] reference already released.
[ 205.798732] allocated in:
[ 205.798734] ax25_bind+0x1a2/0x230 [ax25]
[ 205.798747] __sys_bind+0xea/0x110
[ 205.798753] __x64_sys_bind+0x18/0x20
[ 205.798758] do_syscall_64+0x5c/0x80
[ 205.798763] entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 205.798768] freed in:
[ 205.798770] ax25_release+0x115/0x370 [ax25]
[ 205.798778] __sock_release+0x42/0xb0
[ 205.798782] sock_close+0x15/0x20
[ 205.798785] __fput+0x9f/0x260
[ 205.798789] ____fput+0xe/0x10
[ 205.798792] task_work_run+0x64/0xa0
[ 205.798798] exit_to_user_mode_prepare+0x18b/0x190
[ 205.798804] syscall_exit_to_user_mode+0x26/0x40
[ 205.798808] do_syscall_64+0x69/0x80
[ 205.798812] entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 205.798827] ------------[ cut here ]------------
[ 205.798829] WARNING: CPU: 2 PID: 2605 at lib/ref_tracker.c:136 ref_tracker_free.cold+0x60/0x81
[ 205.798837] Modules linked in: rose netrom mkiss ax25 rfcomm cmac algif_hash algif_skcipher af_alg bnep snd_hda_codec_hdmi nls_iso8859_1 i915 rtw88_8821ce rtw88_8821c x86_pkg_temp_thermal rtw88_pci intel_powerclamp rtw88_core snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio coretemp snd_hda_intel kvm_intel snd_intel_dspcfg mac80211 snd_hda_codec kvm i2c_algo_bit drm_buddy drm_dp_helper btusb drm_kms_helper snd_hwdep btrtl snd_hda_core btbcm joydev crct10dif_pclmul btintel crc32_pclmul ghash_clmulni_intel mei_hdcp btmtk intel_rapl_msr aesni_intel bluetooth input_leds snd_pcm crypto_simd syscopyarea processor_thermal_device_pci_legacy sysfillrect cryptd intel_soc_dts_iosf snd_seq sysimgblt ecdh_generic fb_sys_fops rapl libarc4 processor_thermal_device intel_cstate
processor_thermal_rfim cec snd_timer ecc snd_seq_device cfg80211 processor_thermal_mbox mei_me processor_thermal_rapl mei rc_core at24 snd intel_pch_thermal intel_rapl_common ttm soundcore int340x_thermal_zone video
[ 205.798948] mac_hid acpi_pad sch_fq_codel ipmi_devintf ipmi_msghandler drm msr parport_pc ppdev lp parport ramoops pstore_blk reed_solomon pstore_zone efi_pstore ip_tables x_tables autofs4 hid_generic usbhid hid i2c_i801 i2c_smbus r8169 xhci_pci ahci libahci realtek lpc_ich xhci_pci_renesas [last unloaded: ax25]
[ 205.798992] CPU: 2 PID: 2605 Comm: ax25ipd Not tainted 5.18.11-F6BVP #3
[ 205.798996] Hardware name: To be filled by O.E.M. To be filled by O.E.M./CK3, BIOS 5.011 09/16/2020
[ 205.798999] RIP: 0010:ref_tracker_free.cold+0x60/0x81
[ 205.799005] Code: e8 d2 01 9b ff 83 7b 18 00 74 14 48 c7 c7 2f d7 ff 98 e8 10 6e fc ff 8b 7b 18 e8 b8 01 9b ff 4c 89 ee 4c 89 e7 e8 5d fd 07 00 <0f> 0b b8 ea ff ff ff e9 30 05 9b ff 41 0f b6 f7 48 c7 c7 a0 fa 4e
[ 205.799008] RSP: 0018:ffffaf5281073958 EFLAGS: 00010286
[ 205.799011] RAX: 0000000080000000 RBX: ffff9a0bd687ebe0 RCX: 0000000000000000
[ 205.799014] RDX: 0000000000000001 RSI: 0000000000000282 RDI: 00000000ffffffff
[ 205.799016] RBP: ffffaf5281073a10 R08: 0000000000000003 R09: fffffffffffd5618
[ 205.799019] R10: 0000000000ffff10 R11: 000000000000000f R12: ffff9a0bc53384d0
[ 205.799022] R13: 0000000000000282 R14: 00000000ae000001 R15: 0000000000000001
[ 205.799024] FS: 0000000000000000(0000) GS:ffff9a0d0f300000(0000) knlGS:0000000000000000
[ 205.799028] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 205.799031] CR2: 00007ff6b8311554 CR3: 000000001ac10004 CR4: 00000000001706e0
[ 205.799033] Call Trace:
[ 205.799035] <TASK>
[ 205.799038] ? ax25_dev_device_down+0xd9/0x1b0 [ax25]
[ 205.799047] ? ax25_device_event+0x9f/0x270 [ax25]
[ 205.799055] ? raw_notifier_call_chain+0x49/0x60
[ 205.799060] ? call_netdevice_notifiers_info+0x52/0xa0
[ 205.799065] ? dev_close_many+0xc8/0x120
[ 205.799070] ? unregister_netdevice_many+0x13d/0x890
[ 205.799073] ? unregister_netdevice_queue+0x90/0xe0
[ 205.799076] ? unregister_netdev+0x1d/0x30
[ 205.799080] ? mkiss_close+0x7c/0xc0 [mkiss]
[ 205.799084] ? tty_ldisc_close+0x2e/0x40
[ 205.799089] ? tty_ldisc_hangup+0x137/0x210
[ 205.799092] ? __tty_hangup.part.0+0x208/0x350
[ 205.799098] ? tty_vhangup+0x15/0x20
[ 205.799103] ? pty_close+0x127/0x160
[ 205.799108] ? tty_release+0x139/0x5e0
[ 205.799112] ? __fput+0x9f/0x260
[ 205.799118] ax25_dev_device_down+0xd9/0x1b0 [ax25]
[ 205.799126] ax25_device_event+0x9f/0x270 [ax25]
[ 205.799135] raw_notifier_call_chain+0x49/0x60
[ 205.799140] call_netdevice_notifiers_info+0x52/0xa0
[ 205.799146] dev_close_many+0xc8/0x120
[ 205.799152] unregister_netdevice_many+0x13d/0x890
[ 205.799157] unregister_netdevice_queue+0x90/0xe0
[ 205.799161] unregister_netdev+0x1d/0x30
[ 205.799165] mkiss_close+0x7c/0xc0 [mkiss]
[ 205.799170] tty_ldisc_close+0x2e/0x40
[ 205.799173] tty_ldisc_hangup+0x137/0x210
[ 205.799178] __tty_hangup.part.0+0x208/0x350
[ 205.799184] tty_vhangup+0x15/0x20
[ 205.799188] pty_close+0x127/0x160
[ 205.799193] tty_release+0x139/0x5e0
[ 205.799199] __fput+0x9f/0x260
[ 205.799203] ____fput+0xe/0x10
[ 205.799208] task_work_run+0x64/0xa0
[ 205.799213] do_exit+0x33b/0xab0
[ 205.799217] ? __handle_mm_fault+0xc4f/0x15f0
[ 205.799224] do_group_exit+0x35/0xa0
[ 205.799228] __x64_sys_exit_group+0x18/0x20
[ 205.799232] do_syscall_64+0x5c/0x80
[ 205.799238] ? handle_mm_fault+0xba/0x290
[ 205.799242] ? debug_smp_processor_id+0x17/0x20
[ 205.799246] ? fpregs_assert_state_consistent+0x26/0x50
[ 205.799251] ? exit_to_user_mode_prepare+0x49/0x190
[ 205.799256] ? irqentry_exit_to_user_mode+0x9/0x20
[ 205.799260] ? irqentry_exit+0x33/0x40
[ 205.799263] ? exc_page_fault+0x87/0x170
[ 205.799268] ? asm_exc_page_fault+0x8/0x30
[ 205.799273] entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 205.799277] RIP: 0033:0x7ff6b80eaca1
[ 205.799281] Code: Unable to access opcode bytes at RIP 0x7ff6b80eac77.
[ 205.799283] RSP: 002b:00007fff6dfd4738 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7
[ 205.799287] RAX: ffffffffffffffda RBX: 00007ff6b8215a00 RCX: 00007ff6b80eaca1
[ 205.799290] RDX: 000000000000003c RSI: 00000000000000e7 RDI: 0000000000000001
[ 205.799293] RBP: 0000000000000001 R08: ffffffffffffff80 R09: 0000000000000028
[ 205.799295] R10: 0000000000000000 R11: 0000000000000246 R12: 00007ff6b8215a00
[ 205.799298] R13: 0000000000000000 R14: 00007ff6b821aee8 R15: 00007ff6b821af00
[ 205.799304] </TASK>
diff --git a/include/net/rose.h b/include/net/rose.h
index 0f0a4ce0fee..d5ddebc556d 100644
--- a/include/net/rose.h
+++ b/include/net/rose.h
@@ -95,7 +95,7 @@ struct rose_neigh {
 	ax25_cb *ax25;
 	struct net_device *dev;
 	unsigned short count;
-	unsigned short use;
+	atomic_t use;
 	unsigned int number;
 	char restarted;
 	char dce_mode;
diff --git a/net/rose/af_rose.c b/net/rose/af_rose.c
index bf2d986a6bc..54e7b76c4f3 100644
--- a/net/rose/af_rose.c
+++ b/net/rose/af_rose.c
@@ -163,16 +163,23 @@ static void rose_remove_socket(struct sock *sk)
 void rose_kill_by_neigh(struct rose_neigh *neigh)
 {
 	struct sock *s;
+	struct hlist_node *tmp;
 
 	spin_lock_bh(&rose_list_lock);
-	sk_for_each(s, &rose_list) {
+	sk_for_each_safe(s, tmp, &rose_list) {
 		struct rose_sock *rose = rose_sk(s);
 
+		sock_hold(s);
+		spin_unlock_bh(&rose_list_lock);
+		lock_sock(s);
 		if (rose->neighbour == neigh) {
 			rose_disconnect(s, ENETUNREACH, ROSE_OUT_OF_ORDER, 0);
-			rose->neighbour->use--;
+			atomic_dec(&rose->neighbour->use);
 			rose->neighbour = NULL;
 		}
+		release_sock(s);
+		sock_put(s);
+		spin_lock_bh(&rose_list_lock);
 	}
 	spin_unlock_bh(&rose_list_lock);
 }
@@ -191,7 +198,7 @@ static void rose_kill_by_device(struct net_device *dev)
 		if (rose->device == dev) {
 			rose_disconnect(s, ENETUNREACH, ROSE_OUT_OF_ORDER, 0);
 			if (rose->neighbour)
-				rose->neighbour->use--;
+				atomic_dec(&rose->neighbour->use);
 			rose->device = NULL;
 		}
 	}
@@ -618,7 +625,7 @@ static int rose_release(struct socket *sock)
 		break;
 
 	case ROSE_STATE_2:
-		rose->neighbour->use--;
+		atomic_dec(&rose->neighbour->use);
 		release_sock(sk);
 		rose_disconnect(sk, 0, -1, -1);
 		lock_sock(sk);
@@ -819,7 +826,7 @@ static int rose_connect(struct socket *sock, struct sockaddr *uaddr, int addr_le
 
 	rose->state = ROSE_STATE_1;
 
-	rose->neighbour->use++;
+	atomic_inc(&rose->neighbour->use);
 
 	rose_write_internal(sk, ROSE_CALL_REQUEST);
 	rose_start_heartbeat(sk);
@@ -1019,7 +1026,7 @@ int rose_rx_call_request(struct sk_buff *skb, struct net_device *dev, struct ros
 	make_rose->device = dev;
 	make_rose->facilities = facilities;
 
-	make_rose->neighbour->use++;
+	atomic_inc(&make_rose->neighbour->use);
 
 	if (rose_sk(sk)->defer) {
 		make_rose->state = ROSE_STATE_5;
diff --git a/net/rose/rose_in.c b/net/rose/rose_in.c
index 4d67f36dce1..86168f29943 100644
--- a/net/rose/rose_in.c
+++ b/net/rose/rose_in.c
@@ -56,7 +56,7 @@ static int rose_state1_machine(struct sock *sk, struct sk_buff *skb, int framety
 	case ROSE_CLEAR_REQUEST:
 		rose_write_internal(sk, ROSE_CLEAR_CONFIRMATION);
 		rose_disconnect(sk, ECONNREFUSED, skb->data[3], skb->data[4]);
-		rose->neighbour->use--;
+		atomic_dec(&rose->neighbour->use);
 		break;
 
 	default:
@@ -79,12 +79,12 @@ static int rose_state2_machine(struct sock *sk, struct sk_buff *skb, int framety
 	case ROSE_CLEAR_REQUEST:
 		rose_write_internal(sk, ROSE_CLEAR_CONFIRMATION);
 		rose_disconnect(sk, 0, skb->data[3], skb->data[4]);
-		rose->neighbour->use--;
+		atomic_dec(&rose->neighbour->use);
 		break;
 
 	case ROSE_CLEAR_CONFIRMATION:
 		rose_disconnect(sk, 0, -1, -1);
-		rose->neighbour->use--;
+		atomic_dec(&rose->neighbour->use);
 		break;
 
 	default:
@@ -120,7 +120,7 @@ static int rose_state3_machine(struct sock *sk, struct sk_buff *skb, int framety
 	case ROSE_CLEAR_REQUEST:
 		rose_write_internal(sk, ROSE_CLEAR_CONFIRMATION);
 		rose_disconnect(sk, 0, skb->data[3], skb->data[4]);
-		rose->neighbour->use--;
+		atomic_dec(&rose->neighbour->use);
 		break;
 
 	case ROSE_RR:
@@ -233,7 +233,7 @@ static int rose_state4_machine(struct sock *sk, struct sk_buff *skb, int framety
 	case ROSE_CLEAR_REQUEST:
 		rose_write_internal(sk, ROSE_CLEAR_CONFIRMATION);
 		rose_disconnect(sk, 0, skb->data[3], skb->data[4]);
-		rose->neighbour->use--;
+		atomic_dec(&rose->neighbour->use);
 		break;
 
 	default:
@@ -253,7 +253,7 @@ static int rose_state5_machine(struct sock *sk, struct sk_buff *skb, int framety
 	if (frametype == ROSE_CLEAR_REQUEST) {
 		rose_write_internal(sk, ROSE_CLEAR_CONFIRMATION);
 		rose_disconnect(sk, 0, skb->data[3], skb->data[4]);
-		rose_sk(sk)->neighbour->use--;
+		atomic_dec(&rose_sk(sk)->neighbour->use);
 	}
 
 	return 0;
diff --git a/net/rose/rose_route.c b/net/rose/rose_route.c
index eb0b8197ac8..8be00a44540 100644
--- a/net/rose/rose_route.c
+++ b/net/rose/rose_route.c
@@ -93,7 +93,7 @@ static int __must_check rose_add_node(struct rose_route_struct *rose_route,
 		rose_neigh->ax25 = NULL;
 		rose_neigh->dev = dev;
 		rose_neigh->count = 0;
-		rose_neigh->use = 0;
+		atomic_set(&rose_neigh->use, 0);
 		rose_neigh->dce_mode = 0;
 		rose_neigh->loopback = 0;
 		rose_neigh->number = rose_neigh_no++;
@@ -263,10 +263,10 @@ static void rose_remove_route(struct rose_route *rose_route)
 	struct rose_route *s;
 
 	if (rose_route->neigh1 != NULL)
-		rose_route->neigh1->use--;
+		atomic_dec(&rose_route->neigh1->use);
 
 	if (rose_route->neigh2 != NULL)
-		rose_route->neigh2->use--;
+		atomic_dec(&rose_route->neigh2->use);
 
 	if ((s = rose_route_list) == rose_route) {
 		rose_route_list = rose_route->next;
@@ -331,7 +331,7 @@ static int rose_del_node(struct rose_route_struct *rose_route,
 		if (rose_node->neighbour[i] == rose_neigh) {
 			rose_neigh->count--;
 
-			if (rose_neigh->count == 0 && rose_neigh->use == 0)
+			if (rose_neigh->count == 0 && atomic_read(&rose_neigh->use) == 0)
 				rose_remove_neigh(rose_neigh);
 
 			rose_node->count--;
@@ -381,7 +381,7 @@ void rose_add_loopback_neigh(void)
 	sn->ax25 = NULL;
 	sn->dev = NULL;
 	sn->count = 0;
-	sn->use = 0;
+	atomic_set(&sn->use, 0);
 	sn->dce_mode = 1;
 	sn->loopback = 1;
 	sn->number = rose_neigh_no++;
@@ -573,7 +573,7 @@ static int rose_clear_routes(void)
 		s = rose_neigh;
 		rose_neigh = rose_neigh->next;
 
-		if (s->use == 0 && !s->loopback) {
+		if (atomic_read(&s->use) == 0 && !s->loopback) {
 			s->count = 0;
 			rose_remove_neigh(s);
 		}
@@ -789,13 +789,13 @@ static void rose_del_route_by_neigh(struct rose_neigh *rose_neigh)
 		}
 
 		if (rose_route->neigh1 == rose_neigh) {
-			rose_route->neigh1->use--;
+			atomic_dec(&rose_route->neigh1->use);
 			rose_route->neigh1 = NULL;
 			rose_transmit_clear_request(rose_route->neigh2, rose_route->lci2, ROSE_OUT_OF_ORDER, 0);
 		}
 
 		if (rose_route->neigh2 == rose_neigh) {
-			rose_route->neigh2->use--;
+			atomic_dec(&rose_route->neigh2->use);
 			rose_route->neigh2 = NULL;
 			rose_transmit_clear_request(rose_route->neigh1, rose_route->lci1, ROSE_OUT_OF_ORDER, 0);
 		}
@@ -924,7 +924,7 @@ int rose_route_frame(struct sk_buff *skb, ax25_cb *ax25)
 				rose_clear_queues(sk);
 				rose->cause = ROSE_NETWORK_CONGESTION;
 				rose->diagnostic = 0;
-				rose->neighbour->use--;
+				atomic_dec(&rose->neighbour->use);
 				rose->neighbour = NULL;
 				rose->lci = 0;
 				rose->state = ROSE_STATE_0;
@@ -1067,8 +1067,8 @@ int rose_route_frame(struct sk_buff *skb, ax25_cb *ax25)
 	rose_route->lci2 = new_lci;
 	rose_route->neigh2 = new_neigh;
 
-	rose_route->neigh1->use++;
-	rose_route->neigh2->use++;
+	atomic_inc(&rose_route->neigh1->use);
+	atomic_inc(&rose_route->neigh2->use);
 
 	rose_route->next = rose_route_list;
 	rose_route_list = rose_route;
@@ -1195,7 +1195,7 @@ static int rose_neigh_show(struct seq_file *seq, void *v)
 			   (rose_neigh->loopback) ? "RSLOOP-0" : ax2asc(buf, &rose_neigh->callsign),
 			   rose_neigh->dev ? rose_neigh->dev->name : "???",
 			   rose_neigh->count,
-			   rose_neigh->use,
+			   atomic_read(&rose_neigh->use),
 			   (rose_neigh->dce_mode) ? "DCE" : "DTE",
 			   (rose_neigh->restarted) ? "yes" : "no",
 			   ax25_display_timer(&rose_neigh->t0timer) / HZ,
diff --git a/net/rose/rose_timer.c b/net/rose/rose_timer.c
index f06ddbed3fe..9dfd4eae5d5 100644
--- a/net/rose/rose_timer.c
+++ b/net/rose/rose_timer.c
@@ -171,7 +171,7 @@ static void rose_timer_expiry(struct timer_list *t)
 		break;
 
 	case ROSE_STATE_2: /* T3 */
-		rose->neighbour->use--;
+		atomic_dec(&rose->neighbour->use);
 		rose_disconnect(sk, ETIMEDOUT, -1, -1);
 		break;
 
When the link layer connection is broken, rose->neighbour is set to
null. But rose->neighbour can still be used later by rose_connect()
and rose_release(), because there is no synchronization among them.
As a result, null-ptr-deref bugs can happen.

One of the null-ptr-deref bugs is shown below:

    (thread 1)                    |        (thread 2)
                                  | rose_connect
rose_kill_by_neigh                |   lock_sock(sk)
  spin_lock_bh(&rose_list_lock)   |   if (!rose->neighbour)
  rose->neighbour = NULL;//(1)    |
                                  |   rose->neighbour->use++;//(2)

rose->neighbour is set to null at position (1) and dereferenced
at position (2).

The KASAN report triggered by the POC is shown below:

KASAN: null-ptr-deref in range [0x0000000000000028-0x000000000000002f]
...
RIP: 0010:rose_connect+0x6c2/0xf30
RSP: 0018:ffff88800ab47d60 EFLAGS: 00000206
RAX: 0000000000000005 RBX: 000000000000002a RCX: 0000000000000000
RDX: ffff88800ab38000 RSI: ffff88800ab47e48 RDI: ffff88800ab38309
RBP: dffffc0000000000 R08: 0000000000000000 R09: ffffed1001567062
R10: dfffe91001567063 R11: 1ffff11001567061 R12: 1ffff11000d17cd0
R13: ffff8880068be680 R14: 0000000000000002 R15: 1ffff11000d17cd0
...
Call Trace:
  <TASK>
  ? __local_bh_enable_ip+0x54/0x80
  ? selinux_netlbl_socket_connect+0x26/0x30
  ? rose_bind+0x5b0/0x5b0
  __sys_connect+0x216/0x280
  __x64_sys_connect+0x71/0x80
  do_syscall_64+0x43/0x90
  entry_SYSCALL_64_after_hwframe+0x46/0xb0

This patch adds lock_sock() in rose_kill_by_neigh() in order to
synchronize with rose_connect() and rose_release(). It also changes
the type of 'neighbour->use' from unsigned short to atomic_t in order
to mitigate race conditions caused by holding different socket locks
while updating 'neighbour->use'.

Meanwhile, this patch adds sock_hold(), protected by rose_list_lock,
which synchronizes with rose_remove_socket() in order to mitigate the
UAF bug that the added lock_sock() could otherwise cause.

What's more, there is no need to use rose_neigh_list_lock to protect
rose_kill_by_neigh(), because rose_neigh_list_lock already protects
the state change of rose_neigh in rose_link_failed(), which is well
synchronized.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Duoming Zhou <duoming@zju.edu.cn>
---
Changes in v6:
  - Change sk_for_each() to sk_for_each_safe().
  - Change type of 'neighbour->use' from unsigned short to atomic_t.

 include/net/rose.h    |  2 +-
 net/rose/af_rose.c    | 19 +++++++++++++------
 net/rose/rose_in.c    | 12 ++++++------
 net/rose/rose_route.c | 24 ++++++++++++------------
 net/rose/rose_timer.c |  2 +-
 5 files changed, 33 insertions(+), 26 deletions(-)
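
For readers less familiar with why 'neighbour->use' becomes atomic_t, here is a minimal userspace sketch of the idea, not kernel code. It assumes C11 atomics and POSIX threads as stand-ins for atomic_inc()/atomic_dec() and the per-socket lock; struct neigh_sketch, worker() and run_use_counter_demo() are hypothetical names made up for this illustration. Each thread holds its own lock, so the locks do not serialize updates to the shared counter, and only the atomic operations keep the count consistent.

```c
#include <pthread.h>
#include <stdatomic.h>

#define NTHREADS 4
#define NITER 100000

/* Hypothetical stand-in for struct rose_neigh; only the counter is modelled. */
struct neigh_sketch {
	atomic_int use;
};

static struct neigh_sketch neigh;

/*
 * Each worker takes only its own lock, mimicking callers that hold
 * different socket locks while touching the same neighbour.
 */
static void *worker(void *arg)
{
	pthread_mutex_t *own_lock = arg;

	for (int i = 0; i < NITER; i++) {
		pthread_mutex_lock(own_lock);
		atomic_fetch_add(&neigh.use, 1);	/* analogous to atomic_inc() */
		atomic_fetch_sub(&neigh.use, 1);	/* analogous to atomic_dec() */
		pthread_mutex_unlock(own_lock);
	}
	return NULL;
}

/* Runs all workers to completion and returns the final counter value. */
int run_use_counter_demo(void)
{
	pthread_t tids[NTHREADS];
	pthread_mutex_t locks[NTHREADS];

	atomic_store(&neigh.use, 0);
	for (int i = 0; i < NTHREADS; i++) {
		pthread_mutex_init(&locks[i], NULL);
		pthread_create(&tids[i], NULL, worker, &locks[i]);
	}
	for (int i = 0; i < NTHREADS; i++) {
		pthread_join(tids[i], NULL);
		pthread_mutex_destroy(&locks[i]);
	}
	return atomic_load(&neigh.use);
}
```

Calling run_use_counter_demo() from a small main() and building with -pthread should give a balanced count. With a plain unsigned short and "use++" / "use--" in place of the atomic operations, the same program would contain a data race, since the per-thread locks do not exclude each other.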