Message ID | 1711021557-58116-2-git-send-email-hengqi@linux.alibaba.com |
---|---|
State | Changes Requested |
Delegated to: | Netdev Maintainers |
Series | virtio-net: a fix and some updates for virtio dim |
Context | Check | Description |
---|---|---|
netdev/tree_selection | success | Guessing tree name failed - patch did not apply |
On Thu, Mar 21, 2024 at 7:46 PM Heng Qi <hengqi@linux.alibaba.com> wrote:
>
> When the dim worker is scheduled, if it fails to acquire the lock,
> dim may not be able to return to the working state later.
[...]
> @@ -3563,8 +3563,10 @@ static void virtnet_rx_dim_work(struct work_struct *work)
>         struct dim_cq_moder update_moder;
>         int i, qnum, err;
>
> -       if (!rtnl_trylock())
> +       if (!rtnl_trylock()) {
> +               schedule_work(&dim->work);
>                 return;
> +       }

Patch looks fine but I wonder if a delayed schedule is better.

Thanks
On 2024/3/22 1:17 PM, Jason Wang wrote:
> On Thu, Mar 21, 2024 at 7:46 PM Heng Qi <hengqi@linux.alibaba.com> wrote:
[...]
> Patch looks fine but I wonder if a delayed schedule is better.

The work in net_dim() core layer uses non-delayed-work, and the two
cannot be mixed.

Thanks,
Heng
On Mon, Mar 25, 2024 at 10:11 AM Heng Qi <hengqi@linux.alibaba.com> wrote:
>
> On 2024/3/22 1:17 PM, Jason Wang wrote:
> > On Thu, Mar 21, 2024 at 7:46 PM Heng Qi <hengqi@linux.alibaba.com> wrote:
[...]
> > Patch looks fine but I wonder if a delayed schedule is better.
>
> The work in net_dim() core layer uses non-delayed-work, and the two
> cannot be mixed.

Well, I think we need first to figure out if delayed work is better here.

Switching to use delayed work for dim seems not hard anyhow.

Thanks
On 2024/3/25 2:29 PM, Jason Wang wrote:
> On Mon, Mar 25, 2024 at 10:11 AM Heng Qi <hengqi@linux.alibaba.com> wrote:
>>
>> On 2024/3/22 1:17 PM, Jason Wang wrote:
>>> On Thu, Mar 21, 2024 at 7:46 PM Heng Qi <hengqi@linux.alibaba.com> wrote:
[...]
>>> Patch looks fine but I wonder if a delayed schedule is better.
>> The work in net_dim() core layer uses non-delayed-work, and the two
>> cannot be mixed.
> Well, I think we need first to figure out if delayed work is better here.

I tested a VM with 16 NICs, 128 queues per NIC (2kq total). With dim
enabled on all queues, there are many opportunities for contention for
rtnl lock, and this patch introduces no visible hotspots. The dim
performance is also stable.

So I think there doesn't seem to be a strong motivation right now.

Thanks,
Heng
On Mon, Mar 25, 2024 at 2:58 PM Heng Qi <hengqi@linux.alibaba.com> wrote:
>
> On 2024/3/25 2:29 PM, Jason Wang wrote:
[...]
> > Well, I think we need first to figure out if delayed work is better here.
>
> I tested a VM with 16 NICs, 128 queues per NIC (2kq total). With dim
> enabled on all queues,
> there are many opportunities for contention for rtnl lock, and this
> patch introduces no visible hotspots.
> The dim performance is also stable. So I think there doesn't seem to be
> a strong motivation right now.

That's fine, let's add them to the changelog.

Acked-by: Jason Wang <jasowang@redhat.com>

Thanks
```diff
diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index c22d111..0ebe322 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -3563,8 +3563,10 @@ static void virtnet_rx_dim_work(struct work_struct *work)
 	struct dim_cq_moder update_moder;
 	int i, qnum, err;
 
-	if (!rtnl_trylock())
+	if (!rtnl_trylock()) {
+		schedule_work(&dim->work);
 		return;
+	}
 
 	/* Each rxq's work is queued by "net_dim()->schedule_work()"
 	 * in response to NAPI traffic changes. Note that dim->profile_ix
```
When the dim worker is scheduled, if it fails to acquire the lock, dim
may not be able to return to the working state later.

For example, consider the following single-queue scenario:
1. The dim worker of rxq0 is scheduled, and the dim state is
   changed to DIM_APPLY_NEW_PROFILE;
2. The ethtool command is holding the rtnl lock;
3. Since the rtnl lock is already held, virtnet_rx_dim_work fails
   to acquire the lock and exits.

Then, even if net_dim is invoked again, it cannot work because the
state is not restored to DIM_START_MEASURE.

Fixes: 6208799553a8 ("virtio-net: support rx netdim")
Signed-off-by: Heng Qi <hengqi@linux.alibaba.com>

---
 drivers/net/virtio_net.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)
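The stuck state described in the changelog can be illustrated with a small userspace model of the dim state machine. This is a hypothetical sketch, not kernel code: `struct dim_model`, `model_net_dim()`, `model_dim_work()` and `run_scenario()` are illustrative names. The three states mirror the cases handled by `net_dim()`, but the model assumes that every completed measurement picks a new profile, that a queued work item runs right after the poll that queued it, and that a `lock_free` flag stands in for the result of `rtnl_trylock()`.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model of the dim state machine (not kernel code). */
enum dim_state {
	DIM_START_MEASURE,
	DIM_MEASURE_IN_PROGRESS,
	DIM_APPLY_NEW_PROFILE,
};

struct dim_model {
	enum dim_state state;
	bool work_pending; /* stands in for a queued dim->work */
	int applied;       /* number of profiles the worker applied */
};

/* Simplified stand-in for net_dim(), called once per NAPI poll.
 * Assumption: every completed measurement decides on a new profile. */
static void model_net_dim(struct dim_model *dim)
{
	switch (dim->state) {
	case DIM_START_MEASURE:
		dim->state = DIM_MEASURE_IN_PROGRESS;
		break;
	case DIM_MEASURE_IN_PROGRESS:
		dim->state = DIM_APPLY_NEW_PROFILE;
		dim->work_pending = true; /* schedule_work(&dim->work) */
		break;
	case DIM_APPLY_NEW_PROFILE:
		/* Waiting for the worker; nothing here changes the state. */
		break;
	}
}

/* Stand-in for virtnet_rx_dim_work(); lock_free models rtnl_trylock(). */
static void model_dim_work(struct dim_model *dim, bool lock_free, bool requeue)
{
	assert(dim->work_pending); /* the worker only runs when queued */
	dim->work_pending = false;

	if (!lock_free) {
		if (requeue)
			dim->work_pending = true; /* the fix: schedule_work() again */
		return; /* without the fix, state stays DIM_APPLY_NEW_PROFILE */
	}

	dim->applied++;
	dim->state = DIM_START_MEASURE; /* re-arm the next measurement */
}

/* Runs `polls` NAPI polls; the lock is busy (e.g. held by ethtool)
 * for the first `busy_runs` worker runs. Returns profiles applied. */
static int run_scenario(bool requeue, int busy_runs, int polls)
{
	struct dim_model dim = { .state = DIM_START_MEASURE };
	int worker_runs = 0;

	for (int i = 0; i < polls; i++) {
		model_net_dim(&dim);
		if (dim.work_pending) {
			bool lock_free = worker_runs >= busy_runs;

			worker_runs++;
			model_dim_work(&dim, lock_free, requeue);
		}
	}
	return dim.applied;
}
```

In this model, `run_scenario(false, 1, 10)` returns 0: after the single failed worker run, nothing moves the state out of DIM_APPLY_NEW_PROFILE, no matter how often `model_net_dim()` is called. With the requeue, `run_scenario(true, 1, 10)` returns 4, one fewer than the uncontended `run_scenario(false, 0, 10)`, because the contended lock only delays the worker by one pass.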