
[net,v2] virtio-net: fix possible dim status unrecoverable

Message ID: 1711434338-64848-1-git-send-email-hengqi@linux.alibaba.com (mailing list archive)
State: Changes Requested
Delegated to: Netdev Maintainers
Series: [net,v2] virtio-net: fix possible dim status unrecoverable

Checks

Context Check Description
netdev/series_format success Single patches do not need cover letters
netdev/tree_selection success Clearly marked for net
netdev/ynl success Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present success Fixes tag present in non-next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 944 this patch: 944
netdev/build_tools success No tools touched, skip
netdev/cc_maintainers success CCed 9 of 9 maintainers
netdev/build_clang success Errors and warnings before: 955 this patch: 955
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success Fixes tag looks correct
netdev/build_allmodconfig_warn success Errors and warnings before: 955 this patch: 955
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 11 lines checked
netdev/build_clang_rust success No Rust files in patch. Skipping build
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0
netdev/contest success net-next-2024-03-27--15-00 (tests: 952)

Commit Message

Heng Qi March 26, 2024, 6:25 a.m. UTC
When the dim worker is scheduled, if it fails to acquire the rtnl
lock, dim may never return to the working state.

Consider the following single-queue scenario:
  1. The dim worker of rxq0 is scheduled, and the dim state is
     changed to DIM_APPLY_NEW_PROFILE;
  2. An ethtool command is holding the rtnl lock;
  3. Since the rtnl lock is already held, virtnet_rx_dim_work fails
     to acquire it and exits without restoring the dim state.

After that, even if net_dim is invoked again, it cannot make
progress because the state is never restored to DIM_START_MEASURE.
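
For reference, an abridged sketch of the net_dim() state machine
(see lib/dim/net_dim.c for the real code) shows why the state sticks
once the worker bails out:

void net_dim(struct dim *dim, struct dim_sample end_sample)
{
	switch (dim->state) {
	case DIM_MEASURE_IN_PROGRESS:
		/* ... compare samples; if a new profile is chosen: */
		dim->state = DIM_APPLY_NEW_PROFILE;
		schedule_work(&dim->work);
		break;
	case DIM_START_MEASURE:
		/* ... take a fresh start sample ... */
		dim->state = DIM_MEASURE_IN_PROGRESS;
		break;
	case DIM_APPLY_NEW_PROFILE:
		/* Nothing happens here: only the driver's worker resets
		 * the state to DIM_START_MEASURE. If the worker exits
		 * early, dim is stuck in this state forever.
		 */
		break;
	}
}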

The patch has been tested on a VM with 16 NICs, 128 queues per NIC
(2048 queues in total):
With dim enabled on all queues, there are many opportunities for
rtnl lock contention, and this patch introduces no visible hotspots.
The dim performance also remains stable.

Fixes: 6208799553a8 ("virtio-net: support rx netdim")
Signed-off-by: Heng Qi <hengqi@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
---
v1->v2:
  - Update commit log. No functional changes.

 drivers/net/virtio_net.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

Comments

Paolo Abeni March 28, 2024, 10:34 a.m. UTC | #1
On Tue, 2024-03-26 at 14:25 +0800, Heng Qi wrote:
> When the dim worker is scheduled, if it fails to acquire the rtnl
> lock, dim may never return to the working state.
> 
> Consider the following single-queue scenario:
>   1. The dim worker of rxq0 is scheduled, and the dim state is
>      changed to DIM_APPLY_NEW_PROFILE;
>   2. An ethtool command is holding the rtnl lock;
>   3. Since the rtnl lock is already held, virtnet_rx_dim_work fails
>      to acquire it and exits without restoring the dim state.
> 
> After that, even if net_dim is invoked again, it cannot make
> progress because the state is never restored to DIM_START_MEASURE.
> 
> The patch has been tested on a VM with 16 NICs, 128 queues per NIC
> (2048 queues in total):
> With dim enabled on all queues, there are many opportunities for
> rtnl lock contention, and this patch introduces no visible hotspots.
> The dim performance also remains stable.
> 
> Fixes: 6208799553a8 ("virtio-net: support rx netdim")
> Signed-off-by: Heng Qi <hengqi@linux.alibaba.com>
> Acked-by: Jason Wang <jasowang@redhat.com>
> ---
> v1->v2:
>   - Update commit log. No functional changes.
> 
>  drivers/net/virtio_net.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index c22d111..0ebe322 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -3563,8 +3563,10 @@ static void virtnet_rx_dim_work(struct work_struct *work)
>  	struct dim_cq_moder update_moder;
>  	int i, qnum, err;
>  
> -	if (!rtnl_trylock())
> +	if (!rtnl_trylock()) {
> +		schedule_work(&dim->work);
>  		return;

I'm really scared by this change. VMs are (increasingly) used to run
container orchestration, which in turn puts a lot of pressure on the
RTNL lock. Any rtnl_trylock() + reschedule may hang for a very long
time. Addressing this kind of issue later becomes _extremely_ painful, see:

https://lore.kernel.org/netdev/20231018154804.420823-1-atenart@kernel.org/

I really think a different solution is needed. What about moving
virtnet_send_command() under protection of a new mutex?

I understand it will complicate future hardening work around cvq, but
really rtnl_trylock()/<spin/retry> is bad for the whole system.
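
A minimal sketch of that alternative (the cvq_lock field and the
virtnet_send_command_locked() helper are hypothetical names, not
existing driver code):

/* Dedicated mutex serializing control virtqueue commands, so callers
 * such as virtnet_rx_dim_work() would no longer need the rtnl lock.
 */
struct virtnet_info {
	/* ... existing fields ... */
	struct mutex cvq_lock;	/* protects the control virtqueue */
};

static bool virtnet_send_command(struct virtnet_info *vi, u8 class,
				 u8 cmd, struct scatterlist *out)
{
	bool ok;

	mutex_lock(&vi->cvq_lock);
	/* the existing submit-and-poll body, factored out unchanged */
	ok = virtnet_send_command_locked(vi, class, cmd, out);
	mutex_unlock(&vi->cvq_lock);
	return ok;
}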

Cheers,

Paolo
Heng Qi March 29, 2024, 2:19 a.m. UTC | #2
On 2024/3/28 6:34 PM, Paolo Abeni wrote:
> On Tue, 2024-03-26 at 14:25 +0800, Heng Qi wrote:
>> When the dim worker is scheduled, if it fails to acquire the rtnl
>> lock, dim may never return to the working state.
>>
>> Consider the following single-queue scenario:
>>    1. The dim worker of rxq0 is scheduled, and the dim state is
>>       changed to DIM_APPLY_NEW_PROFILE;
>>    2. An ethtool command is holding the rtnl lock;
>>    3. Since the rtnl lock is already held, virtnet_rx_dim_work fails
>>       to acquire it and exits without restoring the dim state.
>>
>> After that, even if net_dim is invoked again, it cannot make
>> progress because the state is never restored to DIM_START_MEASURE.
>>
>> The patch has been tested on a VM with 16 NICs, 128 queues per NIC
>> (2048 queues in total):
>> With dim enabled on all queues, there are many opportunities for
>> rtnl lock contention, and this patch introduces no visible hotspots.
>> The dim performance also remains stable.
>>
>> Fixes: 6208799553a8 ("virtio-net: support rx netdim")
>> Signed-off-by: Heng Qi <hengqi@linux.alibaba.com>
>> Acked-by: Jason Wang <jasowang@redhat.com>
>> ---
>> v1->v2:
>>    - Update commit log. No functional changes.
>>
>>   drivers/net/virtio_net.c | 4 +++-
>>   1 file changed, 3 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
>> index c22d111..0ebe322 100644
>> --- a/drivers/net/virtio_net.c
>> +++ b/drivers/net/virtio_net.c
>> @@ -3563,8 +3563,10 @@ static void virtnet_rx_dim_work(struct work_struct *work)
>>   	struct dim_cq_moder update_moder;
>>   	int i, qnum, err;
>>   
>> -	if (!rtnl_trylock())
>> +	if (!rtnl_trylock()) {
>> +		schedule_work(&dim->work);
>>   		return;
> I'm really scared by this change. VMs are (increasingly) used to run
> container orchestration, which in turn puts a lot of pressure on the
> RTNL lock. Any rtnl_trylock() + reschedule may hang for a very long
> time. Addressing this kind of issue later becomes _extremely_ painful, see:
>
> https://lore.kernel.org/netdev/20231018154804.420823-1-atenart@kernel.org/
>
> I really think a different solution is needed. What about moving
> virtnet_send_command() under protection of a new mutex?

Daniel did additional work:

https://lore.kernel.org/all/20240328044715.266641-1-danielj@nvidia.com/

It uses a spinlock to protect ctrlq access, so the rtnl lock can be
removed from rx_dim_work entirely, which makes this problem go away.
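
Roughly, that direction lets the worker drop the rtnl lock entirely.
A sketch of where it leads (not Daniel's actual patch; helper names
approximate the existing driver and error handling is omitted):

static void virtnet_rx_dim_work(struct work_struct *work)
{
	struct dim *dim = container_of(work, struct dim, work);
	struct receive_queue *rq = container_of(dim,
			struct receive_queue, dim);
	struct virtnet_info *vi = rq->vq->vdev->priv;
	struct dim_cq_moder update_moder;
	int qnum = rq - vi->rq;

	/* ctrlq access is serialized by its own lock now, so there is
	 * no rtnl_trylock() and no early-exit path that leaves the
	 * dim state unrestored.
	 */
	update_moder = net_dim_get_rx_moderation(dim->mode, dim->profile_ix);
	virtnet_send_ctrl_coal_vq_cmd(vi, rxq2vq(qnum),
				      update_moder.usec, update_moder.pkts);

	/* Always restart the dim state machine. */
	dim->state = DIM_START_MEASURE;
}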

Thanks,
Heng

>
> I understand it will complicate future hardening work around cvq, but
> really rtnl_trylock()/<spin/retry> is bad for the whole system.
>
> Cheers,
>
> Paolo

Patch

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index c22d111..0ebe322 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -3563,8 +3563,10 @@ static void virtnet_rx_dim_work(struct work_struct *work)
 	struct dim_cq_moder update_moder;
 	int i, qnum, err;
 
-	if (!rtnl_trylock())
+	if (!rtnl_trylock()) {
+		schedule_work(&dim->work);
 		return;
+	}
 
 	/* Each rxq's work is queued by "net_dim()->schedule_work()"
 	 * in response to NAPI traffic changes. Note that dim->profile_ix