diff mbox series

[net-next,v6,4/5] virtio-net: add spin lock for ctrl cmd access

Message ID 245ea32fe5de5eb81b1ed8ec9782023af074e137.1701762688.git.hengqi@linux.alibaba.com (mailing list archive)
State Superseded
Delegated to: Netdev Maintainers
Headers show
Series virtio-net: support dynamic coalescing moderation | expand

Checks

Context Check Description
netdev/series_format success Posting correctly formatted
netdev/tree_selection success Clearly marked for net-next
netdev/ynl fail Tree is dirty after regen; no warnings/errors; no diff in generated;
netdev/fixes_present success Fixes tag not required for -next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 1115 this patch: 1115
netdev/cc_maintainers warning 1 maintainers not CCed: virtualization@lists.linux.dev
netdev/build_clang success Errors and warnings before: 1142 this patch: 1142
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 1144 this patch: 1144
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 283 lines checked
netdev/build_clang_rust success No Rust files in patch. Skipping build
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0

Commit Message

Heng Qi Dec. 5, 2023, 8:02 a.m. UTC
Currently, access to the ctrl cmd is globally protected via rtnl_lock and
works fine. But if the dim work's access to the ctrl cmd also holds
rtnl_lock, a deadlock may occur because of cancel_work_sync for the dim
work. Therefore, giving the ctrl cmd its own lock is the solution and the
basis for the next patch.

Since ndo_set_rx_mode runs in an atomic context (netif_addr_lock_bh),
a mutex is ruled out.

I also tried putting the spin lock inside virtnet_send_command, but the
shared variables accessed by virtnet_rx_dim_work and
virtnet_set_per_queue_coalesce prevent this.

cancel_work_sync and virtnet_rx_dim_work are from the next patch.

Please review. Thanks a lot!

Signed-off-by: Heng Qi <hengqi@linux.alibaba.com>
---
 drivers/net/virtio_net.c | 70 ++++++++++++++++++++++++++++++++++++----
 1 file changed, 64 insertions(+), 6 deletions(-)

Comments

Jason Wang Dec. 5, 2023, 8:35 a.m. UTC | #1
On Tue, Dec 5, 2023 at 4:02 PM Heng Qi <hengqi@linux.alibaba.com> wrote:
>
> Currently access to ctrl cmd is globally protected via rtnl_lock and works
> fine. But if dim work's access to ctrl cmd also holds rtnl_lock, deadlock
> may occur due to cancel_work_sync for dim work.

Can you explain why?

> Therefore, treating
> ctrl cmd as a separate protection object of the lock is the solution and
> the basis for the next patch.

Let's not do that. Reasons are:

1) virtnet_send_command() may wait for cvq commands for an indefinite time
2) holding locks may complicate future hardening work around cvq

Thanks
Heng Qi Dec. 5, 2023, 11:05 a.m. UTC | #2
On 2023/12/5 4:35 PM, Jason Wang wrote:
> On Tue, Dec 5, 2023 at 4:02 PM Heng Qi <hengqi@linux.alibaba.com> wrote:
>> Currently access to ctrl cmd is globally protected via rtnl_lock and works
>> fine. But if dim work's access to ctrl cmd also holds rtnl_lock, deadlock
>> may occur due to cancel_work_sync for dim work.
> Can you explain why?

For example, during the bus unbind operation, the following call stack 
occurs:
virtnet_remove -> unregister_netdev -> rtnl_lock[1] -> virtnet_close -> 
cancel_work_sync -> virtnet_rx_dim_work -> rtnl_lock[2] (deadlock occurs).
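
To make the ordering concrete, here is a minimal sketch of the pattern
(illustrative names only, not the actual virtio-net code):

/* Illustrative sketch only -- not the actual virtio-net code. */
#include <linux/rtnetlink.h>
#include <linux/workqueue.h>

struct my_dev {
	struct work_struct dim_work;
};

static void dim_work_fn(struct work_struct *work)
{
	rtnl_lock();		/* [2] blocks forever: the remove path holds it */
	/* ... send cvq commands under rtnl_lock ... */
	rtnl_unlock();
}

static void remove_path(struct my_dev *dev)
{
	rtnl_lock();				/* [1] e.g. inside unregister_netdev() */
	cancel_work_sync(&dev->dim_work);	/* waits for dim_work_fn(), which
						 * in turn waits for rtnl_lock */
	rtnl_unlock();
}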

>> Therefore, treating
>> ctrl cmd as a separate protection object of the lock is the solution and
>> the basis for the next patch.
> Let's don't do that. Reasons are:
>
> 1) virtnet_send_command() may wait for cvq commands for an indefinite time

Yes, I took that into consideration. But ndo_set_rx_mode's need for an
atomic environment rules out the mutex lock.

> 2) hold locks may complicate the future hardening works around cvq

Agreed, but I haven't thought of a better way besides passing the lock.
Do you have any other ideas or suggestions?

Thanks!

>
> Thanks
Jason Wang Dec. 6, 2023, 9:11 a.m. UTC | #3
On Tue, Dec 5, 2023 at 7:06 PM Heng Qi <hengqi@linux.alibaba.com> wrote:
>
>
>
> On 2023/12/5 4:35 PM, Jason Wang wrote:
> > On Tue, Dec 5, 2023 at 4:02 PM Heng Qi <hengqi@linux.alibaba.com> wrote:
> >> Currently access to ctrl cmd is globally protected via rtnl_lock and works
> >> fine. But if dim work's access to ctrl cmd also holds rtnl_lock, deadlock
> >> may occur due to cancel_work_sync for dim work.
> > Can you explain why?
>
> For example, during the bus unbind operation, the following call stack
> occurs:
> virtnet_remove -> unregister_netdev -> rtnl_lock[1] -> virtnet_close ->
> cancel_work_sync -> virtnet_rx_dim_work -> rtnl_lock[2] (deadlock occurs).

Can we use rtnl_trylock() and reschedule the work?
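
For reference, a minimal sketch of the rtnl_trylock()-and-reschedule idiom
being suggested (illustrative only; whether the reschedule is really needed
for dim is discussed below):

/* Illustrative sketch of the rtnl_trylock()-and-reschedule idiom. */
#include <linux/rtnetlink.h>
#include <linux/workqueue.h>

static void dim_work_fn(struct work_struct *work)
{
	if (!rtnl_trylock()) {
		/* rtnl is busy (e.g. held by the remove path); requeue so
		 * the pending coalescing update is not silently dropped.
		 */
		schedule_work(work);
		return;
	}

	/* ... send cvq coalescing commands under rtnl_lock ... */

	rtnl_unlock();
}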

>
> >> Therefore, treating
> >> ctrl cmd as a separate protection object of the lock is the solution and
> >> the basis for the next patch.
> > Let's don't do that. Reasons are:
> >
> > 1) virtnet_send_command() may wait for cvq commands for an indefinite time
>
> Yes, I took that into consideration. But ndo_set_rx_mode's need for an
> atomic
> environment rules out the mutex lock.

It is a "bug" that we need to fix.

>
> > 2) hold locks may complicate the future hardening works around cvq
>
> Agree, but I don't seem to have thought of a better way besides passing
> the lock.
> Do you have any other better ideas or suggestions?

So far no; you can refer to the past discussions. It might require
collaboration from the uAPI and the stack.

Thanks

>
> Thanks!
>
> >
> > Thanks
>
Paolo Abeni Dec. 6, 2023, 12:27 p.m. UTC | #4
On Tue, 2023-12-05 at 19:05 +0800, Heng Qi wrote:
> 
> On 2023/12/5 4:35 PM, Jason Wang wrote:
> > On Tue, Dec 5, 2023 at 4:02 PM Heng Qi <hengqi@linux.alibaba.com> wrote:
> > > Currently access to ctrl cmd is globally protected via rtnl_lock and works
> > > fine. But if dim work's access to ctrl cmd also holds rtnl_lock, deadlock
> > > may occur due to cancel_work_sync for dim work.
> > Can you explain why?
> 
> For example, during the bus unbind operation, the following call stack 
> occurs:
> virtnet_remove -> unregister_netdev -> rtnl_lock[1] -> virtnet_close -> 
> cancel_work_sync -> virtnet_rx_dim_work -> rtnl_lock[2] (deadlock occurs).
> 
> > > Therefore, treating
> > > ctrl cmd as a separate protection object of the lock is the solution and
> > > the basis for the next patch.
> > Let's don't do that. Reasons are:
> > 
> > 1) virtnet_send_command() may wait for cvq commands for an indefinite time
> 
> Yes, I took that into consideration. But ndo_set_rx_mode's need for an 
> atomic
> environment rules out the mutex lock.
> 
> > 2) hold locks may complicate the future hardening works around cvq
> 
> Agree, but I don't seem to have thought of a better way besides passing 
> the lock.
> Do you have any other better ideas or suggestions?

What about:

- using the rtnl lock only
- virtnet_close() invokes cancel_work(), without flushing the work
- virtnet_remove() calls flush_work() after unregister_netdev(),
outside the rtnl lock

Should prevent both the deadlock and the UaF.
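
A rough sketch of that split as it could look inside drivers/net/virtio_net.c
(hypothetical "_sketch" helpers, assuming the per-queue dim work added later
in this series; not a complete patch):

static int virtnet_close_sketch(struct net_device *dev)
{
	struct virtnet_info *vi = netdev_priv(dev);
	int i;

	for (i = 0; i < vi->max_queue_pairs; i++)
		cancel_work(&vi->rq[i].dim.work);	/* no flush, so we never
							 * wait under rtnl_lock */
	/* ... rest of close ... */
	return 0;
}

static void virtnet_remove_sketch(struct virtio_device *vdev)
{
	struct virtnet_info *vi = vdev->priv;
	int i;

	unregister_netdev(vi->dev);	/* takes and drops rtnl_lock itself */

	/* Outside rtnl_lock now: flushing cannot deadlock, and the work
	 * cannot run against a freed device afterwards (no UaF).
	 */
	for (i = 0; i < vi->max_queue_pairs; i++)
		flush_work(&vi->rq[i].dim.work);
	/* ... rest of remove ... */
}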

Side note: for this specific case any functional test with a
CONFIG_LOCKDEP enabled build should suffice to catch the deadlock
scenario above.

Cheers,

Paolo
Heng Qi Dec. 6, 2023, 1:03 p.m. UTC | #5
On 2023/12/6 8:27 PM, Paolo Abeni wrote:
> On Tue, 2023-12-05 at 19:05 +0800, Heng Qi wrote:
>> On 2023/12/5 4:35 PM, Jason Wang wrote:
>>> On Tue, Dec 5, 2023 at 4:02 PM Heng Qi <hengqi@linux.alibaba.com> wrote:
>>>> Currently access to ctrl cmd is globally protected via rtnl_lock and works
>>>> fine. But if dim work's access to ctrl cmd also holds rtnl_lock, deadlock
>>>> may occur due to cancel_work_sync for dim work.
>>> Can you explain why?
>> For example, during the bus unbind operation, the following call stack
>> occurs:
>> virtnet_remove -> unregister_netdev -> rtnl_lock[1] -> virtnet_close ->
>> cancel_work_sync -> virtnet_rx_dim_work -> rtnl_lock[2] (deadlock occurs).
>>
>>>> Therefore, treating
>>>> ctrl cmd as a separate protection object of the lock is the solution and
>>>> the basis for the next patch.
>>> Let's don't do that. Reasons are:
>>>
>>> 1) virtnet_send_command() may wait for cvq commands for an indefinite time
>> Yes, I took that into consideration. But ndo_set_rx_mode's need for an
>> atomic
>> environment rules out the mutex lock.
>>
>>> 2) hold locks may complicate the future hardening works around cvq
>> Agree, but I don't seem to have thought of a better way besides passing
>> the lock.
>> Do you have any other better ideas or suggestions?
> What about:
>
> - using the rtnl lock only
> - virtionet_close() invokes cancel_work(), without flushing the work
> - virtnet_remove() calls flush_work() after unregister_netdev(),
> outside the rtnl lock
>
> Should prevent both the deadlock and the UaF.


Hi, Paolo and Jason!

Thank you very much for your effective suggestions, but I found another
solution [1], based on the ideas of rtnl_trylock and refill_work, which
works very well:

[1]
+static void virtnet_rx_dim_work(struct work_struct *work)
+{
+    struct dim *dim = container_of(work, struct dim, work);
+    struct receive_queue *rq = container_of(dim,
+            struct receive_queue, dim);
+    struct virtnet_info *vi = rq->vq->vdev->priv;
+    struct net_device *dev = vi->dev;
+    struct dim_cq_moder update_moder;
+    int i, qnum, err;
+
+    if (!rtnl_trylock())
+        return;
+
+    for (i = 0; i < vi->curr_queue_pairs; i++) {
+        rq = &vi->rq[i];
+        dim = &rq->dim;
+        qnum = rq - vi->rq;
+
+        if (!rq->dim_enabled)
+            continue;
+
+        update_moder = net_dim_get_rx_moderation(dim->mode, dim->profile_ix);
+        if (update_moder.usec != rq->intr_coal.max_usecs ||
+            update_moder.pkts != rq->intr_coal.max_packets) {
+            err = virtnet_send_rx_ctrl_coal_vq_cmd(vi, qnum,
+                                   update_moder.usec,
+                                   update_moder.pkts);
+            if (err)
+                pr_debug("%s: Failed to send dim parameters on rxq%d\n",
+                     dev->name, qnum);
+            dim->state = DIM_START_MEASURE;
+        }
+    }
+
+    rtnl_unlock();
+}


In addition, another optimization [2] was tried, but perhaps because the
work is scheduled sparsely, the retry condition is always satisfied, which
affects performance, so [1] is the final solution:

[2]

+static void virtnet_rx_dim_work(struct work_struct *work)
+{
+    struct dim *dim = container_of(work, struct dim, work);
+    struct receive_queue *rq = container_of(dim,
+            struct receive_queue, dim);
+    struct virtnet_info *vi = rq->vq->vdev->priv;
+    struct net_device *dev = vi->dev;
+    struct dim_cq_moder update_moder;
+    int i, qnum, err, count;
+
+    if (!rtnl_trylock())
+        return;
+retry:
+    count = vi->curr_queue_pairs;
+    for (i = 0; i < vi->curr_queue_pairs; i++) {
+        rq = &vi->rq[i];
+        dim = &rq->dim;
+        qnum = rq - vi->rq;
+        update_moder = net_dim_get_rx_moderation(dim->mode, dim->profile_ix);
+        if (update_moder.usec != rq->intr_coal.max_usecs ||
+            update_moder.pkts != rq->intr_coal.max_packets) {
+            --count;
+            err = virtnet_send_rx_ctrl_coal_vq_cmd(vi, qnum,
+                                   update_moder.usec,
+                                   update_moder.pkts);
+            if (err)
+                pr_debug("%s: Failed to send dim parameters on rxq%d\n",
+                     dev->name, qnum);
+            dim->state = DIM_START_MEASURE;
+        }
+    }
+
+    if (need_resched()) {
+        rtnl_unlock();
+        schedule();
+    }
+
+    if (count)
+        goto retry;
+
+    rtnl_unlock();
+}

Thanks a lot!

>
> Side note: for this specific case any functional test with a
> CONFIG_LOCKDEP enabled build should suffice to catch the deadlock
> scenario above.
>
> Cheers,
>
> Paolo
Jason Wang Dec. 7, 2023, 4:19 a.m. UTC | #6
On Wed, Dec 6, 2023 at 9:03 PM Heng Qi <hengqi@linux.alibaba.com> wrote:
>
>
>
> On 2023/12/6 8:27 PM, Paolo Abeni wrote:
> > On Tue, 2023-12-05 at 19:05 +0800, Heng Qi wrote:
> >> On 2023/12/5 4:35 PM, Jason Wang wrote:
> >>> On Tue, Dec 5, 2023 at 4:02 PM Heng Qi <hengqi@linux.alibaba.com> wrote:
> >>>> Currently access to ctrl cmd is globally protected via rtnl_lock and works
> >>>> fine. But if dim work's access to ctrl cmd also holds rtnl_lock, deadlock
> >>>> may occur due to cancel_work_sync for dim work.
> >>> Can you explain why?
> >> For example, during the bus unbind operation, the following call stack
> >> occurs:
> >> virtnet_remove -> unregister_netdev -> rtnl_lock[1] -> virtnet_close ->
> >> cancel_work_sync -> virtnet_rx_dim_work -> rtnl_lock[2] (deadlock occurs).
> >>
> >>>> Therefore, treating
> >>>> ctrl cmd as a separate protection object of the lock is the solution and
> >>>> the basis for the next patch.
> >>> Let's don't do that. Reasons are:
> >>>
> >>> 1) virtnet_send_command() may wait for cvq commands for an indefinite time
> >> Yes, I took that into consideration. But ndo_set_rx_mode's need for an
> >> atomic
> >> environment rules out the mutex lock.
> >>
> >>> 2) hold locks may complicate the future hardening works around cvq
> >> Agree, but I don't seem to have thought of a better way besides passing
> >> the lock.
> >> Do you have any other better ideas or suggestions?
> > What about:
> >
> > - using the rtnl lock only
> > - virtionet_close() invokes cancel_work(), without flushing the work
> > - virtnet_remove() calls flush_work() after unregister_netdev(),
> > outside the rtnl lock
> >
> > Should prevent both the deadlock and the UaF.
>
>
> Hi, Paolo and Jason!
>
> Thank you very much for your effective suggestions, but I found another
> solution[1],
> based on the ideas of rtnl_trylock and refill_work, which works very well:
>
> [1]
> +static void virtnet_rx_dim_work(struct work_struct *work)
> +{
> +    struct dim *dim = container_of(work, struct dim, work);
> +    struct receive_queue *rq = container_of(dim,
> +            struct receive_queue, dim);
> +    struct virtnet_info *vi = rq->vq->vdev->priv;
> +    struct net_device *dev = vi->dev;
> +    struct dim_cq_moder update_moder;
> +    int i, qnum, err;
> +
> +    if (!rtnl_trylock())
> +        return;

Don't we need to reschedule here?

like

if (rq->dim_enabled)
       schedule_work()

?

Thanks

> +
> +    for (i = 0; i < vi->curr_queue_pairs; i++) {
> +        rq = &vi->rq[i];
> +        dim = &rq->dim;
> +        qnum = rq - vi->rq;
> +
> +        if (!rq->dim_enabled)
> +            continue;
> +
> +        update_moder = net_dim_get_rx_moderation(dim->mode,
> dim->profile_ix);
> +        if (update_moder.usec != rq->intr_coal.max_usecs ||
> +            update_moder.pkts != rq->intr_coal.max_packets) {
> +            err = virtnet_send_rx_ctrl_coal_vq_cmd(vi, qnum,
> +                                   update_moder.usec,
> +                                   update_moder.pkts);
> +            if (err)
> +                pr_debug("%s: Failed to send dim parameters on rxq%d\n",
> +                     dev->name, qnum);
> +            dim->state = DIM_START_MEASURE;
> +        }
> +    }
> +
> +    rtnl_unlock();
> +}
>
>
> In addition, other optimizations[2] have been tried, but it may be due
> to the sparsely
> scheduled work that the retry condition is always satisfied, affecting
> performance,
> so [1] is the final solution:
>
> [2]
>
> +static void virtnet_rx_dim_work(struct work_struct *work)
> +{
> +    struct dim *dim = container_of(work, struct dim, work);
> +    struct receive_queue *rq = container_of(dim,
> +            struct receive_queue, dim);
> +    struct virtnet_info *vi = rq->vq->vdev->priv;
> +    struct net_device *dev = vi->dev;
> +    struct dim_cq_moder update_moder;
> +    int i, qnum, err, count;
> +
> +    if (!rtnl_trylock())
> +        return;
> +retry:
> +    count = vi->curr_queue_pairs;
> +    for (i = 0; i < vi->curr_queue_pairs; i++) {
> +        rq = &vi->rq[i];
> +        dim = &rq->dim;
> +        qnum = rq - vi->rq;
> +        update_moder = net_dim_get_rx_moderation(dim->mode,
> dim->profile_ix);
> +        if (update_moder.usec != rq->intr_coal.max_usecs ||
> +            update_moder.pkts != rq->intr_coal.max_packets) {
> +            --count;
> +            err = virtnet_send_rx_ctrl_coal_vq_cmd(vi, qnum,
> +                                   update_moder.usec,
> +                                   update_moder.pkts);
> +            if (err)
> +                pr_debug("%s: Failed to send dim parameters on rxq%d\n",
> +                     dev->name, qnum);
> +            dim->state = DIM_START_MEASURE;
> +        }
> +    }
> +
> +    if (need_resched()) {
> +        rtnl_unlock();
> +        schedule();
> +    }
> +
> +    if (count)
> +        goto retry;
> +
> +    rtnl_unlock();
> +}
>
> Thanks a lot!
>
> >
> > Side note: for this specific case any functional test with a
> > CONFIG_LOCKDEP enabled build should suffice to catch the deadlock
> > scenario above.
> >
> > Cheers,
> >
> > Paolo
>
Heng Qi Dec. 7, 2023, 4:47 a.m. UTC | #7
On 2023/12/7 12:19 PM, Jason Wang wrote:
> On Wed, Dec 6, 2023 at 9:03 PM Heng Qi <hengqi@linux.alibaba.com> wrote:
>>
>>
>> On 2023/12/6 8:27 PM, Paolo Abeni wrote:
>>> On Tue, 2023-12-05 at 19:05 +0800, Heng Qi wrote:
>>>> On 2023/12/5 4:35 PM, Jason Wang wrote:
>>>>> On Tue, Dec 5, 2023 at 4:02 PM Heng Qi <hengqi@linux.alibaba.com> wrote:
>>>>>> Currently access to ctrl cmd is globally protected via rtnl_lock and works
>>>>>> fine. But if dim work's access to ctrl cmd also holds rtnl_lock, deadlock
>>>>>> may occur due to cancel_work_sync for dim work.
>>>>> Can you explain why?
>>>> For example, during the bus unbind operation, the following call stack
>>>> occurs:
>>>> virtnet_remove -> unregister_netdev -> rtnl_lock[1] -> virtnet_close ->
>>>> cancel_work_sync -> virtnet_rx_dim_work -> rtnl_lock[2] (deadlock occurs).
>>>>
>>>>>> Therefore, treating
>>>>>> ctrl cmd as a separate protection object of the lock is the solution and
>>>>>> the basis for the next patch.
>>>>> Let's don't do that. Reasons are:
>>>>>
>>>>> 1) virtnet_send_command() may wait for cvq commands for an indefinite time
>>>> Yes, I took that into consideration. But ndo_set_rx_mode's need for an
>>>> atomic
>>>> environment rules out the mutex lock.
>>>>
>>>>> 2) hold locks may complicate the future hardening works around cvq
>>>> Agree, but I don't seem to have thought of a better way besides passing
>>>> the lock.
>>>> Do you have any other better ideas or suggestions?
>>> What about:
>>>
>>> - using the rtnl lock only
>>> - virtionet_close() invokes cancel_work(), without flushing the work
>>> - virtnet_remove() calls flush_work() after unregister_netdev(),
>>> outside the rtnl lock
>>>
>>> Should prevent both the deadlock and the UaF.
>>
>> Hi, Paolo and Jason!
>>
>> Thank you very much for your effective suggestions, but I found another
>> solution[1],
>> based on the ideas of rtnl_trylock and refill_work, which works very well:
>>
>> [1]
>> +static void virtnet_rx_dim_work(struct work_struct *work)
>> +{
>> +    struct dim *dim = container_of(work, struct dim, work);
>> +    struct receive_queue *rq = container_of(dim,
>> +            struct receive_queue, dim);
>> +    struct virtnet_info *vi = rq->vq->vdev->priv;
>> +    struct net_device *dev = vi->dev;
>> +    struct dim_cq_moder update_moder;
>> +    int i, qnum, err;
>> +
>> +    if (!rtnl_trylock())
>> +        return;
> Don't we need to reschedule here?
>
> like
>
> if (rq->dim_enabled)
>         sechedule_work()
>
> ?

I think not; we don't need this.

The work of each queue will be scheduled by "net_dim()->schedule_work()"
when NAPI traffic changes (before schedule_work(), the dim->profile_ix
of the corresponding rxq has already been updated).
So we only need to traverse and update the profiles of all rxqs in the
work instance that currently holds the rtnl_lock.
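
As a rough illustration of that flow (assuming the per-queue dim struct added
in this series; the event-counter field is hypothetical):

#include <linux/dim.h>
#include <linux/types.h>

struct rx_queue_sketch {
	struct dim dim;
	u16 dim_calls;		/* hypothetical per-queue NAPI event counter */
};

/* Called at the end of a NAPI poll with the queue's rx totals. */
static void rx_poll_update_dim(struct rx_queue_sketch *rq,
			       u64 packets, u64 bytes)
{
	struct dim_sample sample = {};

	dim_update_sample(rq->dim_calls++, packets, bytes, &sample);
	/* net_dim() updates dim->profile_ix and queues dim->work itself
	 * when the moderation profile should change.
	 */
	net_dim(&rq->dim, sample);
}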

Thanks!

>
> Thanks
>
>> +
>> +    for (i = 0; i < vi->curr_queue_pairs; i++) {
>> +        rq = &vi->rq[i];
>> +        dim = &rq->dim;
>> +        qnum = rq - vi->rq;
>> +
>> +        if (!rq->dim_enabled)
>> +            continue;
>> +
>> +        update_moder = net_dim_get_rx_moderation(dim->mode,
>> dim->profile_ix);
>> +        if (update_moder.usec != rq->intr_coal.max_usecs ||
>> +            update_moder.pkts != rq->intr_coal.max_packets) {
>> +            err = virtnet_send_rx_ctrl_coal_vq_cmd(vi, qnum,
>> +                                   update_moder.usec,
>> +                                   update_moder.pkts);
>> +            if (err)
>> +                pr_debug("%s: Failed to send dim parameters on rxq%d\n",
>> +                     dev->name, qnum);
>> +            dim->state = DIM_START_MEASURE;
>> +        }
>> +    }
>> +
>> +    rtnl_unlock();
>> +}
>>
>>
>> In addition, other optimizations[2] have been tried, but it may be due
>> to the sparsely
>> scheduled work that the retry condition is always satisfied, affecting
>> performance,
>> so [1] is the final solution:
>>
>> [2]
>>
>> +static void virtnet_rx_dim_work(struct work_struct *work)
>> +{
>> +    struct dim *dim = container_of(work, struct dim, work);
>> +    struct receive_queue *rq = container_of(dim,
>> +            struct receive_queue, dim);
>> +    struct virtnet_info *vi = rq->vq->vdev->priv;
>> +    struct net_device *dev = vi->dev;
>> +    struct dim_cq_moder update_moder;
>> +    int i, qnum, err, count;
>> +
>> +    if (!rtnl_trylock())
>> +        return;
>> +retry:
>> +    count = vi->curr_queue_pairs;
>> +    for (i = 0; i < vi->curr_queue_pairs; i++) {
>> +        rq = &vi->rq[i];
>> +        dim = &rq->dim;
>> +        qnum = rq - vi->rq;
>> +        update_moder = net_dim_get_rx_moderation(dim->mode,
>> dim->profile_ix);
>> +        if (update_moder.usec != rq->intr_coal.max_usecs ||
>> +            update_moder.pkts != rq->intr_coal.max_packets) {
>> +            --count;
>> +            err = virtnet_send_rx_ctrl_coal_vq_cmd(vi, qnum,
>> +                                   update_moder.usec,
>> +                                   update_moder.pkts);
>> +            if (err)
>> +                pr_debug("%s: Failed to send dim parameters on rxq%d\n",
>> +                     dev->name, qnum);
>> +            dim->state = DIM_START_MEASURE;
>> +        }
>> +    }
>> +
>> +    if (need_resched()) {
>> +        rtnl_unlock();
>> +        schedule();
>> +    }
>> +
>> +    if (count)
>> +        goto retry;
>> +
>> +    rtnl_unlock();
>> +}
>>
>> Thanks a lot!
>>
>>> Side note: for this specific case any functional test with a
>>> CONFIG_LOCKDEP enabled build should suffice to catch the deadlock
>>> scenario above.
>>>
>>> Cheers,
>>>
>>> Paolo
Jason Wang Dec. 7, 2023, 5:31 a.m. UTC | #8
On Thu, Dec 7, 2023 at 12:47 PM Heng Qi <hengqi@linux.alibaba.com> wrote:
>
>
>
> On 2023/12/7 12:19 PM, Jason Wang wrote:
> > On Wed, Dec 6, 2023 at 9:03 PM Heng Qi <hengqi@linux.alibaba.com> wrote:
> >>
> >>
> >> On 2023/12/6 8:27 PM, Paolo Abeni wrote:
> >>> On Tue, 2023-12-05 at 19:05 +0800, Heng Qi wrote:
> >>>> On 2023/12/5 4:35 PM, Jason Wang wrote:
> >>>>> On Tue, Dec 5, 2023 at 4:02 PM Heng Qi <hengqi@linux.alibaba.com> wrote:
> >>>>>> Currently access to ctrl cmd is globally protected via rtnl_lock and works
> >>>>>> fine. But if dim work's access to ctrl cmd also holds rtnl_lock, deadlock
> >>>>>> may occur due to cancel_work_sync for dim work.
> >>>>> Can you explain why?
> >>>> For example, during the bus unbind operation, the following call stack
> >>>> occurs:
> >>>> virtnet_remove -> unregister_netdev -> rtnl_lock[1] -> virtnet_close ->
> >>>> cancel_work_sync -> virtnet_rx_dim_work -> rtnl_lock[2] (deadlock occurs).
> >>>>
> >>>>>> Therefore, treating
> >>>>>> ctrl cmd as a separate protection object of the lock is the solution and
> >>>>>> the basis for the next patch.
> >>>>> Let's don't do that. Reasons are:
> >>>>>
> >>>>> 1) virtnet_send_command() may wait for cvq commands for an indefinite time
> >>>> Yes, I took that into consideration. But ndo_set_rx_mode's need for an
> >>>> atomic
> >>>> environment rules out the mutex lock.
> >>>>
> >>>>> 2) hold locks may complicate the future hardening works around cvq
> >>>> Agree, but I don't seem to have thought of a better way besides passing
> >>>> the lock.
> >>>> Do you have any other better ideas or suggestions?
> >>> What about:
> >>>
> >>> - using the rtnl lock only
> >>> - virtionet_close() invokes cancel_work(), without flushing the work
> >>> - virtnet_remove() calls flush_work() after unregister_netdev(),
> >>> outside the rtnl lock
> >>>
> >>> Should prevent both the deadlock and the UaF.
> >>
> >> Hi, Paolo and Jason!
> >>
> >> Thank you very much for your effective suggestions, but I found another
> >> solution[1],
> >> based on the ideas of rtnl_trylock and refill_work, which works very well:
> >>
> >> [1]
> >> +static void virtnet_rx_dim_work(struct work_struct *work)
> >> +{
> >> +    struct dim *dim = container_of(work, struct dim, work);
> >> +    struct receive_queue *rq = container_of(dim,
> >> +            struct receive_queue, dim);
> >> +    struct virtnet_info *vi = rq->vq->vdev->priv;
> >> +    struct net_device *dev = vi->dev;
> >> +    struct dim_cq_moder update_moder;
> >> +    int i, qnum, err;
> >> +
> >> +    if (!rtnl_trylock())
> >> +        return;
> > Don't we need to reschedule here?
> >
> > like
> >
> > if (rq->dim_enabled)
> >         sechedule_work()
> >
> > ?
>
> I think no, we don't need this.
>
> The work of each queue will be called by "net_dim()->schedule_work()"
> when napi traffic changes (before schedule_work(), the dim->profile_ix
> of the corresponding rxq has been updated).
> So we only need to traverse and update the profiles of all rxqs in the
> work which is obtaining the rtnl_lock.

Ok, let's have a comment here then.

Thanks

>
> Thanks!
>
> >
> > Thanks
> >
> >> +
> >> +    for (i = 0; i < vi->curr_queue_pairs; i++) {
> >> +        rq = &vi->rq[i];
> >> +        dim = &rq->dim;
> >> +        qnum = rq - vi->rq;
> >> +
> >> +        if (!rq->dim_enabled)
> >> +            continue;
> >> +
> >> +        update_moder = net_dim_get_rx_moderation(dim->mode,
> >> dim->profile_ix);
> >> +        if (update_moder.usec != rq->intr_coal.max_usecs ||
> >> +            update_moder.pkts != rq->intr_coal.max_packets) {
> >> +            err = virtnet_send_rx_ctrl_coal_vq_cmd(vi, qnum,
> >> +                                   update_moder.usec,
> >> +                                   update_moder.pkts);
> >> +            if (err)
> >> +                pr_debug("%s: Failed to send dim parameters on rxq%d\n",
> >> +                     dev->name, qnum);
> >> +            dim->state = DIM_START_MEASURE;
> >> +        }
> >> +    }
> >> +
> >> +    rtnl_unlock();
> >> +}
> >>
> >>
> >> In addition, other optimizations[2] have been tried, but it may be due
> >> to the sparsely
> >> scheduled work that the retry condition is always satisfied, affecting
> >> performance,
> >> so [1] is the final solution:
> >>
> >> [2]
> >>
> >> +static void virtnet_rx_dim_work(struct work_struct *work)
> >> +{
> >> +    struct dim *dim = container_of(work, struct dim, work);
> >> +    struct receive_queue *rq = container_of(dim,
> >> +            struct receive_queue, dim);
> >> +    struct virtnet_info *vi = rq->vq->vdev->priv;
> >> +    struct net_device *dev = vi->dev;
> >> +    struct dim_cq_moder update_moder;
> >> +    int i, qnum, err, count;
> >> +
> >> +    if (!rtnl_trylock())
> >> +        return;
> >> +retry:
> >> +    count = vi->curr_queue_pairs;
> >> +    for (i = 0; i < vi->curr_queue_pairs; i++) {
> >> +        rq = &vi->rq[i];
> >> +        dim = &rq->dim;
> >> +        qnum = rq - vi->rq;
> >> +        update_moder = net_dim_get_rx_moderation(dim->mode,
> >> dim->profile_ix);
> >> +        if (update_moder.usec != rq->intr_coal.max_usecs ||
> >> +            update_moder.pkts != rq->intr_coal.max_packets) {
> >> +            --count;
> >> +            err = virtnet_send_rx_ctrl_coal_vq_cmd(vi, qnum,
> >> +                                   update_moder.usec,
> >> +                                   update_moder.pkts);
> >> +            if (err)
> >> +                pr_debug("%s: Failed to send dim parameters on rxq%d\n",
> >> +                     dev->name, qnum);
> >> +            dim->state = DIM_START_MEASURE;
> >> +        }
> >> +    }
> >> +
> >> +    if (need_resched()) {
> >> +        rtnl_unlock();
> >> +        schedule();
> >> +    }
> >> +
> >> +    if (count)
> >> +        goto retry;
> >> +
> >> +    rtnl_unlock();
> >> +}
> >>
> >> Thanks a lot!
> >>
> >>> Side note: for this specific case any functional test with a
> >>> CONFIG_LOCKDEP enabled build should suffice to catch the deadlock
> >>> scenario above.
> >>>
> >>> Cheers,
> >>>
> >>> Paolo
>
diff mbox series

Patch

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 69fe09e99b3c..0dd09c4f8d89 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -301,6 +301,9 @@  struct virtnet_info {
 
 	struct control_buf *ctrl;
 
+	/* The lock to synchronize access to the ctrl cmd */
+	spinlock_t ctrl_lock;
+
 	/* Ethtool settings */
 	u8 duplex;
 	u32 speed;
@@ -2520,13 +2523,16 @@  static int virtnet_set_mac_address(struct net_device *dev, void *p)
 
 	if (virtio_has_feature(vdev, VIRTIO_NET_F_CTRL_MAC_ADDR)) {
 		sg_init_one(&sg, addr->sa_data, dev->addr_len);
+		spin_lock(&vi->ctrl_lock);
 		if (!virtnet_send_command(vi, VIRTIO_NET_CTRL_MAC,
 					  VIRTIO_NET_CTRL_MAC_ADDR_SET, &sg)) {
 			dev_warn(&vdev->dev,
 				 "Failed to set mac address by vq command.\n");
 			ret = -EINVAL;
+			spin_unlock(&vi->ctrl_lock);
 			goto out;
 		}
+		spin_unlock(&vi->ctrl_lock);
 	} else if (virtio_has_feature(vdev, VIRTIO_NET_F_MAC) &&
 		   !virtio_has_feature(vdev, VIRTIO_F_VERSION_1)) {
 		unsigned int i;
@@ -2589,9 +2595,11 @@  static void virtnet_stats(struct net_device *dev,
 static void virtnet_ack_link_announce(struct virtnet_info *vi)
 {
 	rtnl_lock();
+	spin_lock(&vi->ctrl_lock);
 	if (!virtnet_send_command(vi, VIRTIO_NET_CTRL_ANNOUNCE,
 				  VIRTIO_NET_CTRL_ANNOUNCE_ACK, NULL))
 		dev_warn(&vi->dev->dev, "Failed to ack link announce.\n");
+	spin_unlock(&vi->ctrl_lock);
 	rtnl_unlock();
 }
 
@@ -2603,6 +2611,7 @@  static int _virtnet_set_queues(struct virtnet_info *vi, u16 queue_pairs)
 	if (!vi->has_cvq || !virtio_has_feature(vi->vdev, VIRTIO_NET_F_MQ))
 		return 0;
 
+	spin_lock(&vi->ctrl_lock);
 	vi->ctrl->mq.virtqueue_pairs = cpu_to_virtio16(vi->vdev, queue_pairs);
 	sg_init_one(&sg, &vi->ctrl->mq, sizeof(vi->ctrl->mq));
 
@@ -2610,6 +2619,7 @@  static int _virtnet_set_queues(struct virtnet_info *vi, u16 queue_pairs)
 				  VIRTIO_NET_CTRL_MQ_VQ_PAIRS_SET, &sg)) {
 		dev_warn(&dev->dev, "Fail to set num of queue pairs to %d\n",
 			 queue_pairs);
+		spin_unlock(&vi->ctrl_lock);
 		return -EINVAL;
 	} else {
 		vi->curr_queue_pairs = queue_pairs;
@@ -2618,6 +2628,7 @@  static int _virtnet_set_queues(struct virtnet_info *vi, u16 queue_pairs)
 			schedule_delayed_work(&vi->refill, 0);
 	}
 
+	spin_unlock(&vi->ctrl_lock);
 	return 0;
 }
 
@@ -2662,6 +2673,7 @@  static void virtnet_set_rx_mode(struct net_device *dev)
 	if (!virtio_has_feature(vi->vdev, VIRTIO_NET_F_CTRL_RX))
 		return;
 
+	spin_lock(&vi->ctrl_lock);
 	vi->ctrl->promisc = ((dev->flags & IFF_PROMISC) != 0);
 	vi->ctrl->allmulti = ((dev->flags & IFF_ALLMULTI) != 0);
 
@@ -2679,6 +2691,7 @@  static void virtnet_set_rx_mode(struct net_device *dev)
 		dev_warn(&dev->dev, "Failed to %sable allmulti mode.\n",
 			 vi->ctrl->allmulti ? "en" : "dis");
 
+	spin_unlock(&vi->ctrl_lock);
 	uc_count = netdev_uc_count(dev);
 	mc_count = netdev_mc_count(dev);
 	/* MAC filter - use one buffer for both lists */
@@ -2710,10 +2723,12 @@  static void virtnet_set_rx_mode(struct net_device *dev)
 	sg_set_buf(&sg[1], mac_data,
 		   sizeof(mac_data->entries) + (mc_count * ETH_ALEN));
 
+	spin_lock(&vi->ctrl_lock);
 	if (!virtnet_send_command(vi, VIRTIO_NET_CTRL_MAC,
 				  VIRTIO_NET_CTRL_MAC_TABLE_SET, sg))
 		dev_warn(&dev->dev, "Failed to set MAC filter table.\n");
 
+	spin_unlock(&vi->ctrl_lock);
 	kfree(buf);
 }
 
@@ -2723,12 +2738,15 @@  static int virtnet_vlan_rx_add_vid(struct net_device *dev,
 	struct virtnet_info *vi = netdev_priv(dev);
 	struct scatterlist sg;
 
+	spin_lock(&vi->ctrl_lock);
 	vi->ctrl->vid = cpu_to_virtio16(vi->vdev, vid);
 	sg_init_one(&sg, &vi->ctrl->vid, sizeof(vi->ctrl->vid));
 
 	if (!virtnet_send_command(vi, VIRTIO_NET_CTRL_VLAN,
 				  VIRTIO_NET_CTRL_VLAN_ADD, &sg))
 		dev_warn(&dev->dev, "Failed to add VLAN ID %d.\n", vid);
+	spin_unlock(&vi->ctrl_lock);
+
 	return 0;
 }
 
@@ -2738,12 +2756,15 @@  static int virtnet_vlan_rx_kill_vid(struct net_device *dev,
 	struct virtnet_info *vi = netdev_priv(dev);
 	struct scatterlist sg;
 
+	spin_lock(&vi->ctrl_lock);
 	vi->ctrl->vid = cpu_to_virtio16(vi->vdev, vid);
 	sg_init_one(&sg, &vi->ctrl->vid, sizeof(vi->ctrl->vid));
 
 	if (!virtnet_send_command(vi, VIRTIO_NET_CTRL_VLAN,
 				  VIRTIO_NET_CTRL_VLAN_DEL, &sg))
 		dev_warn(&dev->dev, "Failed to kill VLAN ID %d.\n", vid);
+	spin_unlock(&vi->ctrl_lock);
+
 	return 0;
 }
 
@@ -2958,11 +2979,15 @@  static int virtnet_set_ringparam(struct net_device *dev,
 			 * through the VIRTIO_NET_CTRL_NOTF_COAL_TX_SET command, or, if the driver
 			 * did not set any TX coalescing parameters, to 0.
 			 */
+			spin_lock(&vi->ctrl_lock);
 			err = virtnet_send_tx_ctrl_coal_vq_cmd(vi, i,
 							       vi->intr_coal_tx.max_usecs,
 							       vi->intr_coal_tx.max_packets);
-			if (err)
+			if (err) {
+				spin_unlock(&vi->ctrl_lock);
 				return err;
+			}
+			spin_unlock(&vi->ctrl_lock);
 		}
 
 		if (ring->rx_pending != rx_pending) {
@@ -2971,11 +2996,15 @@  static int virtnet_set_ringparam(struct net_device *dev,
 				return err;
 
 			/* The reason is same as the transmit virtqueue reset */
+			spin_lock(&vi->ctrl_lock);
 			err = virtnet_send_rx_ctrl_coal_vq_cmd(vi, i,
 							       vi->intr_coal_rx.max_usecs,
 							       vi->intr_coal_rx.max_packets);
-			if (err)
+			if (err) {
+				spin_unlock(&vi->ctrl_lock);
 				return err;
+			}
+			spin_unlock(&vi->ctrl_lock);
 		}
 	}
 
@@ -2991,6 +3020,7 @@  static bool virtnet_commit_rss_command(struct virtnet_info *vi)
 	/* prepare sgs */
 	sg_init_table(sgs, 4);
 
+	spin_lock(&vi->ctrl_lock);
 	sg_buf_size = offsetof(struct virtio_net_ctrl_rss, indirection_table);
 	sg_set_buf(&sgs[0], &vi->ctrl->rss, sg_buf_size);
 
@@ -3008,8 +3038,12 @@  static bool virtnet_commit_rss_command(struct virtnet_info *vi)
 				  vi->has_rss ? VIRTIO_NET_CTRL_MQ_RSS_CONFIG
 				  : VIRTIO_NET_CTRL_MQ_HASH_CONFIG, sgs)) {
 		dev_warn(&dev->dev, "VIRTIONET issue with committing RSS sgs\n");
+		spin_unlock(&vi->ctrl_lock);
 		return false;
 	}
+
+	spin_unlock(&vi->ctrl_lock);
+
 	return true;
 }
 
@@ -3318,14 +3352,17 @@  static int virtnet_send_tx_notf_coal_cmds(struct virtnet_info *vi,
 	struct scatterlist sgs_tx;
 	int i;
 
+	spin_lock(&vi->ctrl_lock);
 	vi->ctrl->coal_tx.tx_usecs = cpu_to_le32(ec->tx_coalesce_usecs);
 	vi->ctrl->coal_tx.tx_max_packets = cpu_to_le32(ec->tx_max_coalesced_frames);
 	sg_init_one(&sgs_tx, &vi->ctrl->coal_tx, sizeof(vi->ctrl->coal_tx));
 
 	if (!virtnet_send_command(vi, VIRTIO_NET_CTRL_NOTF_COAL,
 				  VIRTIO_NET_CTRL_NOTF_COAL_TX_SET,
-				  &sgs_tx))
+				  &sgs_tx)) {
+		spin_unlock(&vi->ctrl_lock);
 		return -EINVAL;
+	}
 
 	/* Save parameters */
 	vi->intr_coal_tx.max_usecs = ec->tx_coalesce_usecs;
@@ -3334,6 +3371,7 @@  static int virtnet_send_tx_notf_coal_cmds(struct virtnet_info *vi,
 		vi->sq[i].intr_coal.max_usecs = ec->tx_coalesce_usecs;
 		vi->sq[i].intr_coal.max_packets = ec->tx_max_coalesced_frames;
 	}
+	spin_unlock(&vi->ctrl_lock);
 
 	return 0;
 }
@@ -3344,14 +3382,17 @@  static int virtnet_send_rx_notf_coal_cmds(struct virtnet_info *vi,
 	struct scatterlist sgs_rx;
 	int i;
 
+	spin_lock(&vi->ctrl_lock);
 	vi->ctrl->coal_rx.rx_usecs = cpu_to_le32(ec->rx_coalesce_usecs);
 	vi->ctrl->coal_rx.rx_max_packets = cpu_to_le32(ec->rx_max_coalesced_frames);
 	sg_init_one(&sgs_rx, &vi->ctrl->coal_rx, sizeof(vi->ctrl->coal_rx));
 
 	if (!virtnet_send_command(vi, VIRTIO_NET_CTRL_NOTF_COAL,
 				  VIRTIO_NET_CTRL_NOTF_COAL_RX_SET,
-				  &sgs_rx))
+				  &sgs_rx)) {
+		spin_unlock(&vi->ctrl_lock);
 		return -EINVAL;
+	}
 
 	/* Save parameters */
 	vi->intr_coal_rx.max_usecs = ec->rx_coalesce_usecs;
@@ -3361,6 +3402,8 @@  static int virtnet_send_rx_notf_coal_cmds(struct virtnet_info *vi,
 		vi->rq[i].intr_coal.max_packets = ec->rx_max_coalesced_frames;
 	}
 
+	spin_unlock(&vi->ctrl_lock);
+
 	return 0;
 }
 
@@ -3386,17 +3429,24 @@  static int virtnet_send_notf_coal_vq_cmds(struct virtnet_info *vi,
 {
 	int err;
 
+	spin_lock(&vi->ctrl_lock);
 	err = virtnet_send_rx_ctrl_coal_vq_cmd(vi, queue,
 					       ec->rx_coalesce_usecs,
 					       ec->rx_max_coalesced_frames);
-	if (err)
+	if (err) {
+		spin_unlock(&vi->ctrl_lock);
 		return err;
+	}
 
 	err = virtnet_send_tx_ctrl_coal_vq_cmd(vi, queue,
 					       ec->tx_coalesce_usecs,
 					       ec->tx_max_coalesced_frames);
-	if (err)
+	if (err) {
+		spin_unlock(&vi->ctrl_lock);
 		return err;
+	}
+
+	spin_unlock(&vi->ctrl_lock);
 
 	return 0;
 }
@@ -3733,6 +3783,8 @@  static int virtnet_restore_up(struct virtio_device *vdev)
 static int virtnet_set_guest_offloads(struct virtnet_info *vi, u64 offloads)
 {
 	struct scatterlist sg;
+
+	spin_lock(&vi->ctrl_lock);
 	vi->ctrl->offloads = cpu_to_virtio64(vi->vdev, offloads);
 
 	sg_init_one(&sg, &vi->ctrl->offloads, sizeof(vi->ctrl->offloads));
@@ -3740,9 +3792,11 @@  static int virtnet_set_guest_offloads(struct virtnet_info *vi, u64 offloads)
 	if (!virtnet_send_command(vi, VIRTIO_NET_CTRL_GUEST_OFFLOADS,
 				  VIRTIO_NET_CTRL_GUEST_OFFLOADS_SET, &sg)) {
 		dev_warn(&vi->dev->dev, "Fail to set guest offload.\n");
+		spin_unlock(&vi->ctrl_lock);
 		return -EINVAL;
 	}
 
+	spin_unlock(&vi->ctrl_lock);
 	return 0;
 }
 
@@ -4525,6 +4579,7 @@  static int virtnet_probe(struct virtio_device *vdev)
 
 	INIT_WORK(&vi->config_work, virtnet_config_changed_work);
 	spin_lock_init(&vi->refill_lock);
+	spin_lock_init(&vi->ctrl_lock);
 
 	if (virtio_has_feature(vdev, VIRTIO_NET_F_MRG_RXBUF)) {
 		vi->mergeable_rx_bufs = true;
@@ -4669,13 +4724,16 @@  static int virtnet_probe(struct virtio_device *vdev)
 		struct scatterlist sg;
 
 		sg_init_one(&sg, dev->dev_addr, dev->addr_len);
+		spin_lock(&vi->ctrl_lock);
 		if (!virtnet_send_command(vi, VIRTIO_NET_CTRL_MAC,
 					  VIRTIO_NET_CTRL_MAC_ADDR_SET, &sg)) {
 			pr_debug("virtio_net: setting MAC address failed\n");
+			spin_unlock(&vi->ctrl_lock);
 			rtnl_unlock();
 			err = -EINVAL;
 			goto free_unregister_netdev;
 		}
+		spin_unlock(&vi->ctrl_lock);
 	}
 
 	rtnl_unlock();