
[v2] net/sched: Set the flushing flags to false to prevent an infinite loop and add one test to tdc

Message ID 20230609094659.4004660-1-renmingshuai@huawei.com (mailing list archive)
State Changes Requested
Delegated to: Netdev Maintainers

Checks

Context Check Description
netdev/tree_selection success Not a local patch

Commit Message

renmingshuai June 9, 2023, 9:46 a.m. UTC
When a new chain is added by using tc, a soft lockup alarm will be
 generated after deleting the prio 0 filter of the chain. To reproduce
 the problem, perform the following steps:
(1) tc qdisc add dev eth0 root handle 1: htb default 1
(2) tc chain add dev eth0
(3) tc filter del dev eth0 chain 0 parent 1: prio 0
(4) tc filter add dev eth0 chain 0 parent 1:


The refcnt of the chain added by step 2 is equal to 1. After step 3,
 the flushing flag of the chain is set to true in tcf_chain_flush(),
 called by tc_del_tfilter(), because the prio is 0. In this case, if
 we add a new filter to this chain, it will never succeed; it retries
 again and again because the flushing flag is always true and refcnt is 1.
 A soft lockup alarm is generated 20 seconds later.
The stack is shown below:

Kernel panic - not syncing: softlockup: hung tasks
CPU: 2 PID: 3321861 Comm: tc Kdump: loaded Tainted: G
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996)
Call Trace:
 <IRQ>
 dump_stack+0x57/0x6e
 panic+0x196/0x3ec
 watchdog_timer_fn.cold+0x16/0x5c
 __run_hrtimer+0x5e/0x190
 __hrtimer_run_queues+0x8a/0xe0
 hrtimer_interrupt+0x110/0x2c0
 ? irqtime_account_irq+0x49/0xf0
 __sysvec_apic_timer_interrupt+0x5f/0xe0
 asm_call_irq_on_stack+0x12/0x20
 </IRQ>
 sysvec_apic_timer_interrupt+0x72/0x80
 asm_sysvec_apic_timer_interrupt+0x12/0x20
RIP: 0010:mutex_lock+0x24/0x70
RSP: 0018:ffffa836004ab9a8 EFLAGS: 00000246
RAX: 0000000000000000 RBX: ffff95bb02d76700 RCX: 0000000000000000
RDX: ffff95bb27462100 RSI: 0000000000000000 RDI: ffff95ba5b527000
RBP: ffff95ba5b527000 R08: 0000000000000001 R09: ffffa836004abbb8
R10: 000000000000000f R11: 0000000000000000 R12: 0000000000000000
R13: ffff95ba5b527000 R14: ffffa836004abbb8 R15: 0000000000000001
 __tcf_chain_put+0x27/0x200
 tc_new_tfilter+0x5e8/0x810
 ? tc_setup_cb_add+0x210/0x210
 rtnetlink_rcv_msg+0x2e3/0x380
 ? rtnl_calcit.isra.0+0x120/0x120
 netlink_rcv_skb+0x50/0x100
 netlink_unicast+0x12d/0x1d0
 netlink_sendmsg+0x286/0x490
 sock_sendmsg+0x62/0x70
 ____sys_sendmsg+0x24c/0x2c0
 ? import_iovec+0x17/0x20
 ? sendmsg_copy_msghdr+0x80/0xa0
 ___sys_sendmsg+0x75/0xc0
 ? do_fault_around+0x118/0x160
 ? do_read_fault+0x68/0xf0
 ? __handle_mm_fault+0x3f9/0x6f0
 __sys_sendmsg+0x59/0xa0
 do_syscall_64+0x33/0x40
 entry_SYSCALL_64_after_hwframe+0x61/0xc6
RIP: 0033:0x7f96705b8247
RSP: 002b:00007ffe552e9dc8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f96705b8247
RDX: 0000000000000000 RSI: 00007ffe552e9e40 RDI: 0000000000000003
RBP: 0000000000000001 R08: 0000000000000001 R09: 0000558113f678b0
R10: 00007f967069ab00 R11: 0000000000000246 R12: 00000000647ea089
R13: 00007ffe552e9f30 R14: 0000000000000001 R15: 0000558113175f00
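The retry mechanism described above can be sketched as a minimal user-space model. This is a hypothetical simplification for illustration only: the real struct tcf_chain, tcf_chain_flush() and tc_new_tfilter() in net/sched/cls_api.c carry far more state and locking, and the names model_chain_flush()/model_try_insert() are invented here.

```c
#include <stdbool.h>

/* Hypothetical model of the two tcf_chain fields involved in the
 * lockup; the real struct has many more members. */
struct chain_model {
	unsigned int refcnt;
	bool flushing;
};

/* Models tcf_chain_flush(): the flag is raised before the filters
 * are walked and freed, and nothing on this path lowers it again. */
static void model_chain_flush(struct chain_model *c)
{
	c->flushing = true;
}

/* Models the insertion-side check: while the chain is flushing, a
 * new filter cannot attach, so tc_new_tfilter() drops its reference
 * and retries.  With flushing stuck at true the retry never
 * terminates, which trips the soft lockup watchdog. */
static bool model_try_insert(const struct chain_model *c)
{
	return !c->flushing;
}
```

After model_chain_flush(), model_try_insert() returns false forever, mirroring steps (3) and (4) above.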

To avoid this case, set chain->flushing to false if chain->refcnt
 is 2 after flushing the chain when the prio is 0. Also add one test to tdc.

Fixes: 726d061286ce ("net: sched: prevent insertion of new classifiers during chain flush")
Signed-off-by: Mingshuai Ren <renmingshuai@huawei.com>
Reviewed-by: Aichun Li <Liaichun@huawei.com>
---
V1 -> V2:
  * Correct the check on the chain->refcnt value and add one test to tdc
---
 net/sched/cls_api.c                           |  7 ++++++
 .../tc-testing/tc-tests/infra/filter.json     | 25 +++++++++++++++++++
 2 files changed, 32 insertions(+)
 create mode 100644 tools/testing/selftests/tc-testing/tc-tests/infra/filter.json

Comments

Vlad Buslov June 9, 2023, 2:35 p.m. UTC | #1
On Fri 09 Jun 2023 at 17:46, renmingshuai <renmingshuai@huawei.com> wrote:
> When a new chain is added by using tc, a soft lockup alarm will be
>  generated after deleting the prio 0 filter of the chain. To reproduce
>  the problem, perform the following steps:
> (1) tc qdisc add dev eth0 root handle 1: htb default 1
> (2) tc chain add dev eth0
> (3) tc filter del dev eth0 chain 0 parent 1: prio 0
> (4) tc filter add dev eth0 chain 0 parent 1:
>
>
> The refcnt of the chain added by step 2 is equal to 1. After step 3,
>  the flushing flag of the chain is set to true in tcf_chain_flush(),
>  called by tc_del_tfilter(), because the prio is 0. In this case, if
>  we add a new filter to this chain, it will never succeed; it retries
>  again and again because the flushing flag is always true and refcnt is 1.
>  A soft lockup alarm is generated 20 seconds later.

Hi Mingshuai,

Thanks for investigating and reporting this!

> [...]
>
> diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
> index 2621550bfddc..3ea054e03fbf 100644
> --- a/net/sched/cls_api.c
> +++ b/net/sched/cls_api.c
> @@ -2442,6 +2442,13 @@ static int tc_del_tfilter(struct sk_buff *skb, struct nlmsghdr *n,
>  		tfilter_notify_chain(net, skb, block, q, parent, n,
>  				     chain, RTM_DELTFILTER, extack);
>  		tcf_chain_flush(chain, rtnl_held);
> +		/* Set the flushing flags to false to prevent an infinite loop
> +		 * when a new filter is added.
> +		 */
> +		mutex_lock(&chain->filter_chain_lock);
> +		if (chain->refcnt == 2)
> +			chain->flushing = false;
> +		mutex_unlock(&chain->filter_chain_lock);

I don't think this check is enough since there can be concurrent filter
ops holding the reference to the chain. This just makes the issue harder
to reproduce.

I'll try to formulate a fix that takes any potential concurrent users
into account and verify it with your test.

>  		err = 0;
>  		goto errout;
>  	}
> diff --git a/tools/testing/selftests/tc-testing/tc-tests/infra/filter.json b/tools/testing/selftests/tc-testing/tc-tests/infra/filter.json
> new file mode 100644
> index 000000000000..db3b42aaa4fa
> --- /dev/null
> +++ b/tools/testing/selftests/tc-testing/tc-tests/infra/filter.json
> @@ -0,0 +1,25 @@
> +[
> +    {
> +        "id": "c2b4",
> +        "name": "Adding a new filter after flushing empty chain doesn't cause an infinite loop",
> +        "category": [
> +            "filter",
> +            "chain"
> +        ],
> +        "setup": [
> +            "$IP link add dev $DUMMY type dummy || /bin/true",
> +            "$TC qdisc add dev $DUMMY root handle 1: htb default 1",
> +            "$TC chain add dev $DUMMY",
> +            "$TC filter del dev $DUMMY chain 0 parent 1: prio 0"
> +        ],
> +        "cmdUnderTest": "$TC filter add dev $DUMMY chain 0 parent 1:",
> +        "expExitCode": "2",
> +        "verifyCmd": "$TC chain ls dev $DUMMY",
> +        "matchPattern": "chain parent 1: chain 0",
> +        "matchCount": "1",
> +        "teardown": [
> +            "$TC qdisc del dev $DUMMY root handle 1: htb default 1",
> +            "$IP link del dev $DUMMY type dummy"
> +        ]
> +    }
> +]
renmingshuai June 12, 2023, 2:29 p.m. UTC | #2
>On Fri 09 Jun 2023 at 17:46, renmingshuai <renmingshuai@huawei.com> wrote:
>> [...]
>
>I don't think this check is enough since there can be concurrent filter
>ops holding the reference to the chain. This just makes the issue harder
>to reproduce.
>
>I'll try to formulate a fix that takes any potential concurrent users
>into account and verify it with your test.

Thanks for your reply.
I didn't find any concurrent scenario that would "escape" this check.
That's my understanding:
During the flush operation, after filter_chain_lock is obtained, no new chain reference
 can be added. After unlock, chain->flushing is set to true. The chain->refcnt and
 chain->action_refcnt are always different. Therefore, chain->flushing will never be set to false.
 There seems to be no concurrency problem until flushing is set to true.
Would you mind telling me the concurrent scenario you're talking about?
Vlad Buslov June 12, 2023, 2:48 p.m. UTC | #3
On Mon 12 Jun 2023 at 22:29, renmingshuai <renmingshuai@huawei.com> wrote:
>> [...]
>
> Thanks for your reply.
> I didn't find any concurrent scenarios that would "escape" this check.
> That's my understanding.
> During the flush operation, after filter_chain_lock is obtained, no new chain reference
>  can be added. After unlock, chain->flushing is set to true. The chain->refcnt and
>  chain->action_refcnt are always different. Therefore, chain->flushing will never be set to false.

I agree with your analysis.

>  There seems to be no concurrency problem until flushing is set to true.
> Would you mind telling me the concurrent scenario you're talking about?

What if, for example, chain->refcnt==3 after flush because there is
another task inserting filters to the same chain concurrently? In such
case for chains explicitly created with chains API, chain->flushing flag
will never be reset afterwards and user will get the same lockup when
trying to install any following filters. Also, refcnt==2 only means that
there is no concurrent users when flushing explicitly created chains
(because chains API holds a "passive" reference to such chains). For
regular chains implicitly created by filters API refcnt==2 here means
that there are active concurrent users.
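The counting argument above can be made concrete with a small sketch. The helper below is hypothetical and only models the bookkeeping described in this reply: one reference held by the deleting operation itself, one "passive" reference kept by the chains API on explicitly created chains, and one reference per concurrent filter operation.

```c
#include <stdbool.h>

/* Hypothetical model of the refcnt tc_del_tfilter() observes right
 * after flushing a chain. */
static unsigned int refcnt_after_flush(bool explicit_chain,
				       unsigned int concurrent_ops)
{
	/* 1 ref for the deleting operation, plus the passive ref the
	 * chains API keeps on explicit chains, plus one ref per
	 * concurrent filter operation. */
	return 1 + (explicit_chain ? 1 : 0) + concurrent_ops;
}

/* The patch's proposed check, applied to that count. */
static bool fix_clears_flushing(unsigned int refcnt)
{
	return refcnt == 2;
}
```

refcnt_after_flush(true, 0) == 2, so the quiescent explicit-chain case is handled. But refcnt_after_flush(true, 1) == 3 leaves flushing set forever, and refcnt_after_flush(false, 1) == 2 clears the flag while a concurrent user is still active: the two failure modes described above.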
renmingshuai June 13, 2023, 7:02 a.m. UTC | #4
>On Mon 12 Jun 2023 at 22:29, renmingshuai <renmingshuai@huawei.com> wrote:
>> [...]
>
>I agree with your analysis.
>
>>  There seems to be no concurrency problem until flushing is set to true.
>> Would you mind telling me the concurrent scenario you're talking about?
>
>What if, for example, chain->refcnt==3 after flush because there is
>another task inserting filters to the same chain concurrently? In such
>case for chains explicitly created with chains API, chain->flushing flag
>will never be reset afterwards and user will get the same lockup when
>trying to install any following filters. Also, refcnt==2 only means that
>there is no concurrent users when flushing explicitly created chains
>(because chains API holds a "passive" reference to such chains). For
>regular chains implicitly created by filters API refcnt==2 here means
>that there are active concurrent users.

Thanks, you are right.
My previous assumption that chain->refcnt would only be increased after a filter was
 successfully inserted into the chain was clearly wrong.

Patch

diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index 2621550bfddc..3ea054e03fbf 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -2442,6 +2442,13 @@  static int tc_del_tfilter(struct sk_buff *skb, struct nlmsghdr *n,
 		tfilter_notify_chain(net, skb, block, q, parent, n,
 				     chain, RTM_DELTFILTER, extack);
 		tcf_chain_flush(chain, rtnl_held);
+		/* Set the flushing flags to false to prevent an infinite loop
+		 * when a new filter is added.
+		 */
+		mutex_lock(&chain->filter_chain_lock);
+		if (chain->refcnt == 2)
+			chain->flushing = false;
+		mutex_unlock(&chain->filter_chain_lock);
 		err = 0;
 		goto errout;
 	}
diff --git a/tools/testing/selftests/tc-testing/tc-tests/infra/filter.json b/tools/testing/selftests/tc-testing/tc-tests/infra/filter.json
new file mode 100644
index 000000000000..db3b42aaa4fa
--- /dev/null
+++ b/tools/testing/selftests/tc-testing/tc-tests/infra/filter.json
@@ -0,0 +1,25 @@ 
+[
+    {
+        "id": "c2b4",
+        "name": "Adding a new filter after flushing empty chain doesn't cause an infinite loop",
+        "category": [
+            "filter",
+            "chain"
+        ],
+        "setup": [
+            "$IP link add dev $DUMMY type dummy || /bin/true",
+            "$TC qdisc add dev $DUMMY root handle 1: htb default 1",
+            "$TC chain add dev $DUMMY",
+            "$TC filter del dev $DUMMY chain 0 parent 1: prio 0"
+        ],
+        "cmdUnderTest": "$TC filter add dev $DUMMY chain 0 parent 1:",
+        "expExitCode": "2",
+        "verifyCmd": "$TC chain ls dev $DUMMY",
+        "matchPattern": "chain parent 1: chain 0",
+        "matchCount": "1",
+        "teardown": [
+            "$TC qdisc del dev $DUMMY root handle 1: htb default 1",
+            "$IP link del dev $DUMMY type dummy"
+        ]
+    }
+]