diff mbox series

[net,v3] net/smc: fix wrong list_del in smc_lgr_cleanup_early

Message ID 20211201030230.8896-1-dust.li@linux.alibaba.com (mailing list archive)
State Accepted
Commit 789b6cc2a5f9123b9c549b886fdc47c865cfe0ba
Delegated to: Netdev Maintainers
Headers show
Series [net,v3] net/smc: fix wrong list_del in smc_lgr_cleanup_early | expand

Checks

Context Check Description
netdev/tree_selection success Clearly marked for net
netdev/fixes_present success Fixes tag present in non-next series
netdev/subject_prefix success Link
netdev/cover_letter success Single patches do not need cover letters
netdev/patch_count success Link
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 0 this patch: 0
netdev/cc_maintainers success CCed 6 of 6 maintainers
netdev/build_clang success Errors and warnings before: 0 this patch: 0
netdev/module_param success Was 0 now: 0
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/verify_fixes success Fixes tag looks correct
netdev/build_allmodconfig_warn success Errors and warnings before: 0 this patch: 0
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 21 lines checked
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0

Commit Message

Dust Li Dec. 1, 2021, 3:02 a.m. UTC
smc_lgr_cleanup_early() meant to delete the link
group from the link group list, but it deleted
the list head by mistake.

This may cause memory corruption since we didn't
remove the real link group from the list and later
memseted the link group structure.
We got a list corruption panic when testing:

[  231.277259] list_del corruption. prev->next should be ffff8881398a8000, but was 0000000000000000
[  231.278222] ------------[ cut here ]------------
[  231.278726] kernel BUG at lib/list_debug.c:53!
[  231.279326] invalid opcode: 0000 [#1] SMP NOPTI
[  231.279803] CPU: 0 PID: 5 Comm: kworker/0:0 Not tainted 5.10.46+ #435
[  231.280466] Hardware name: Alibaba Cloud ECS, BIOS 8c24b4c 04/01/2014
[  231.281248] Workqueue: events smc_link_down_work
[  231.281732] RIP: 0010:__list_del_entry_valid+0x70/0x90
[  231.282258] Code: 4c 60 82 e8 7d cc 6a 00 0f 0b 48 89 fe 48 c7 c7 88 4c
60 82 e8 6c cc 6a 00 0f 0b 48 89 fe 48 c7 c7 c0 4c 60 82 e8 5b cc 6a 00 <0f>
0b 48 89 fe 48 c7 c7 00 4d 60 82 e8 4a cc 6a 00 0f 0b cc cc cc
[  231.284146] RSP: 0018:ffffc90000033d58 EFLAGS: 00010292
[  231.284685] RAX: 0000000000000054 RBX: ffff8881398a8000 RCX: 0000000000000000
[  231.285415] RDX: 0000000000000001 RSI: ffff88813bc18040 RDI: ffff88813bc18040
[  231.286141] RBP: ffffffff8305ad40 R08: 0000000000000003 R09: 0000000000000001
[  231.286873] R10: ffffffff82803da0 R11: ffffc90000033b90 R12: 0000000000000001
[  231.287606] R13: 0000000000000000 R14: ffff8881398a8000 R15: 0000000000000003
[  231.288337] FS:  0000000000000000(0000) GS:ffff88813bc00000(0000) knlGS:0000000000000000
[  231.289160] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  231.289754] CR2: 0000000000e72058 CR3: 000000010fa96006 CR4: 00000000003706f0
[  231.290485] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  231.291211] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  231.291940] Call Trace:
[  231.292211]  smc_lgr_terminate_sched+0x53/0xa0
[  231.292677]  smc_switch_conns+0x75/0x6b0
[  231.293085]  ? update_load_avg+0x1a6/0x590
[  231.293517]  ? ttwu_do_wakeup+0x17/0x150
[  231.293907]  ? update_load_avg+0x1a6/0x590
[  231.294317]  ? newidle_balance+0xca/0x3d0
[  231.294716]  smcr_link_down+0x50/0x1a0
[  231.295090]  ? __wake_up_common_lock+0x77/0x90
[  231.295534]  smc_link_down_work+0x46/0x60
[  231.295933]  process_one_work+0x18b/0x350

Fixes: a0a62ee15a829 ("net/smc: separate locks for SMCD and SMCR link group lists")
Signed-off-by: Dust Li <dust.li@linux.alibaba.com>
Acked-by: Karsten Graul <kgraul@linux.ibm.com>
---
v2: Remove unused lgr_list
v2->v3: Fix uninitialized lgr_lock, thanks Jakub Kicinski !
---
 net/smc/smc_core.c | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

Comments

Tony Lu Dec. 1, 2021, 6:34 a.m. UTC | #1
On Wed, Dec 01, 2021 at 11:02:30AM +0800, Dust Li wrote:
> smc_lgr_cleanup_early() meant to delete the link
> group from the link group list, but it deleted
> the list head by mistake.
> 
> This may cause memory corruption since we didn't
> remove the real link group from the list and later
> memseted the link group structure.
> We got a list corruption panic when testing:
> 
> [  231.277259] list_del corruption. prev->next should be ffff8881398a8000, but was 0000000000000000
> [  231.278222] ------------[ cut here ]------------
> [  231.278726] kernel BUG at lib/list_debug.c:53!
> [  231.279326] invalid opcode: 0000 [#1] SMP NOPTI
> [  231.279803] CPU: 0 PID: 5 Comm: kworker/0:0 Not tainted 5.10.46+ #435
> [  231.280466] Hardware name: Alibaba Cloud ECS, BIOS 8c24b4c 04/01/2014
> [  231.281248] Workqueue: events smc_link_down_work
> [  231.281732] RIP: 0010:__list_del_entry_valid+0x70/0x90
> [  231.282258] Code: 4c 60 82 e8 7d cc 6a 00 0f 0b 48 89 fe 48 c7 c7 88 4c
> 60 82 e8 6c cc 6a 00 0f 0b 48 89 fe 48 c7 c7 c0 4c 60 82 e8 5b cc 6a 00 <0f>
> 0b 48 89 fe 48 c7 c7 00 4d 60 82 e8 4a cc 6a 00 0f 0b cc cc cc
> [  231.284146] RSP: 0018:ffffc90000033d58 EFLAGS: 00010292
> [  231.284685] RAX: 0000000000000054 RBX: ffff8881398a8000 RCX: 0000000000000000
> [  231.285415] RDX: 0000000000000001 RSI: ffff88813bc18040 RDI: ffff88813bc18040
> [  231.286141] RBP: ffffffff8305ad40 R08: 0000000000000003 R09: 0000000000000001
> [  231.286873] R10: ffffffff82803da0 R11: ffffc90000033b90 R12: 0000000000000001
> [  231.287606] R13: 0000000000000000 R14: ffff8881398a8000 R15: 0000000000000003
> [  231.288337] FS:  0000000000000000(0000) GS:ffff88813bc00000(0000) knlGS:0000000000000000
> [  231.289160] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [  231.289754] CR2: 0000000000e72058 CR3: 000000010fa96006 CR4: 00000000003706f0
> [  231.290485] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [  231.291211] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [  231.291940] Call Trace:
> [  231.292211]  smc_lgr_terminate_sched+0x53/0xa0
> [  231.292677]  smc_switch_conns+0x75/0x6b0
> [  231.293085]  ? update_load_avg+0x1a6/0x590
> [  231.293517]  ? ttwu_do_wakeup+0x17/0x150
> [  231.293907]  ? update_load_avg+0x1a6/0x590
> [  231.294317]  ? newidle_balance+0xca/0x3d0
> [  231.294716]  smcr_link_down+0x50/0x1a0
> [  231.295090]  ? __wake_up_common_lock+0x77/0x90
> [  231.295534]  smc_link_down_work+0x46/0x60
> [  231.295933]  process_one_work+0x18b/0x350
> 
> Fixes: a0a62ee15a829 ("net/smc: separate locks for SMCD and SMCR link group lists")
> Signed-off-by: Dust Li <dust.li@linux.alibaba.com>
> Acked-by: Karsten Graul <kgraul@linux.ibm.com>

This patch looks good to me, thank you.

Reviewed-by: Tony Lu <tonylu@linux.alibaba.com>

> ---
> v2: Remove unused lgr_list
> v2->v3: Fix uninitialized lgr_lock, thanks Jakub Kicinski !
> ---
>  net/smc/smc_core.c | 7 +++----
>  1 file changed, 3 insertions(+), 4 deletions(-)
> 
> diff --git a/net/smc/smc_core.c b/net/smc/smc_core.c
> index bb52c8b5f148..387d28b2f8dd 100644
> --- a/net/smc/smc_core.c
> +++ b/net/smc/smc_core.c
> @@ -625,18 +625,17 @@ int smcd_nl_get_lgr(struct sk_buff *skb, struct netlink_callback *cb)
>  void smc_lgr_cleanup_early(struct smc_connection *conn)
>  {
>  	struct smc_link_group *lgr = conn->lgr;
> -	struct list_head *lgr_list;
>  	spinlock_t *lgr_lock;
>  
>  	if (!lgr)
>  		return;
>  
>  	smc_conn_free(conn);
> -	lgr_list = smc_lgr_list_head(lgr, &lgr_lock);
> +	smc_lgr_list_head(lgr, &lgr_lock);
>  	spin_lock_bh(lgr_lock);
>  	/* do not use this link group for new connections */
> -	if (!list_empty(lgr_list))
> -		list_del_init(lgr_list);
> +	if (!list_empty(&lgr->list))
> +		list_del_init(&lgr->list);
>  	spin_unlock_bh(lgr_lock);
>  	__smc_lgr_terminate(lgr, true);
>  }
> -- 
> 2.19.1.3.ge56e4f7
patchwork-bot+netdevbpf@kernel.org Dec. 2, 2021, 12:20 p.m. UTC | #2
Hello:

This patch was applied to netdev/net.git (master)
by David S. Miller <davem@davemloft.net>:

On Wed,  1 Dec 2021 11:02:30 +0800 you wrote:
> smc_lgr_cleanup_early() meant to delete the link
> group from the link group list, but it deleted
> the list head by mistake.
> 
> This may cause memory corruption since we didn't
> remove the real link group from the list and later
> memseted the link group structure.
> We got a list corruption panic when testing:
> 
> [...]

Here is the summary with links:
  - [net,v3] net/smc: fix wrong list_del in smc_lgr_cleanup_early
    https://git.kernel.org/netdev/net/c/789b6cc2a5f9

You are awesome, thank you!
diff mbox series

Patch

diff --git a/net/smc/smc_core.c b/net/smc/smc_core.c
index bb52c8b5f148..387d28b2f8dd 100644
--- a/net/smc/smc_core.c
+++ b/net/smc/smc_core.c
@@ -625,18 +625,17 @@  int smcd_nl_get_lgr(struct sk_buff *skb, struct netlink_callback *cb)
 void smc_lgr_cleanup_early(struct smc_connection *conn)
 {
 	struct smc_link_group *lgr = conn->lgr;
-	struct list_head *lgr_list;
 	spinlock_t *lgr_lock;
 
 	if (!lgr)
 		return;
 
 	smc_conn_free(conn);
-	lgr_list = smc_lgr_list_head(lgr, &lgr_lock);
+	smc_lgr_list_head(lgr, &lgr_lock);
 	spin_lock_bh(lgr_lock);
 	/* do not use this link group for new connections */
-	if (!list_empty(lgr_list))
-		list_del_init(lgr_list);
+	if (!list_empty(&lgr->list))
+		list_del_init(&lgr->list);
 	spin_unlock_bh(lgr_lock);
 	__smc_lgr_terminate(lgr, true);
 }