
[v4,0/8] vdpa: Send all CVQ state load commands in parallel

Message ID cover.1693287885.git.yin31149@gmail.com (mailing list archive)
State New, archived

Commit Message

Hawkins Jiawei Aug. 29, 2023, 5:54 a.m. UTC
This patchset allows QEMU to delay polling and checking the device
used buffers until either the SVQ is full or the control command shadow
buffers are full, instead of polling and checking immediately after
sending each SVQ control command. This lets QEMU send all the SVQ
control commands in parallel, which improves performance.

I use a vp_vdpa device to simulate a vdpa device, and create 4094 VLANs
in the guest to build a test environment for sending multiple CVQ state
load commands. This patch series reduces the total load time from 20455 us
to 13732 us for about 4099 CVQ state load commands, an improvement of
about 1.64 us per command.

Note that this patch series is based on the patch
"Vhost-vdpa Shadow Virtqueue VLAN support" at [1].

[1]. https://lore.kernel.org/all/cover.1690100802.git.yin31149@gmail.com/
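
To illustrate the idea, below is a standalone, simplified C sketch of the
"queue in parallel, poll late" pattern used by this series; all types and
helper names in it are illustrative stand-ins, not the actual QEMU code:

```c
/*
 * Simplified sketch: queue control commands without waiting, and only
 * poll for used buffers when there is no room left (or at the end).
 */
#include <stdio.h>

#define SVQ_SLOTS 64            /* pretend the shadow virtqueue has 64 slots */

typedef struct {
    int in_flight;              /* commands queued but not yet polled */
} FakeCvq;

/* Poll once per in-flight command and check the device's ack. */
static int flush_and_check(FakeCvq *cvq)
{
    printf("polling %d used buffers\n", cvq->in_flight);
    cvq->in_flight = 0;         /* all acks assumed OK in this sketch */
    return 0;
}

/* Queue one command; flush first only if the queue is already full. */
static int queue_cmd(FakeCvq *cvq, int cmd)
{
    if (cvq->in_flight == SVQ_SLOTS) {
        int r = flush_and_check(cvq);
        if (r < 0) {
            return r;
        }
    }
    printf("queued command %d\n", cmd);
    cvq->in_flight++;
    return 0;
}

int main(void)
{
    FakeCvq cvq = { 0 };

    for (int cmd = 0; cmd < 100; cmd++) {
        if (queue_cmd(&cvq, cmd) < 0) {
            return 1;
        }
    }
    /* Poll for whatever is still in flight before declaring success. */
    return flush_and_check(&cvq) < 0 ? 1 : 0;
}
```

The real series flushes both when the SVQ runs out of descriptors and when
the control command shadow buffers run out of space; the sketch folds these
into a single counter for brevity.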

TestStep
========
1. regression testing using vp-vdpa device
  - For L0 guest, boot QEMU with two virtio-net-pci net devices with
the `ctrl_vq`, `ctrl_rx`, `ctrl_rx_extra` features on, with a command line like:
      -device virtio-net-pci,disable-legacy=on,disable-modern=off,
iommu_platform=on,mq=on,ctrl_vq=on,guest_announce=off,
indirect_desc=off,queue_reset=off,ctrl_rx=on,ctrl_rx_extra=on,...

  - For L1 guest, apply the patch series and compile the source code,
then start QEMU with two vdpa devices with SVQ mode on and the `ctrl_vq`,
`ctrl_rx`, `ctrl_rx_extra` features enabled, with a command line like:
      -netdev type=vhost-vdpa,x-svq=true,...
      -device virtio-net-pci,mq=on,guest_announce=off,ctrl_vq=on,
ctrl_rx=on,ctrl_rx_extra=on...

  - For L2 source guest, run the following bash script:
```bash
#!/bin/bash

for idx1 in {0..9}
do
  for idx2 in {0..9}
  do
    for idx3 in {0..6}
    do
      ip link add macvlan$idx1$idx2$idx3 link eth0 \
        address 4a:30:10:19:$idx1$idx2:1$idx3 type macvlan mode bridge
      ip link set macvlan$idx1$idx2$idx3 up
    done
  done
done
```
  - Execute the live migration in L2 source monitor

  - Result
    * with this series, QEMU should not trigger any error or warning.



2. performance testing using vp-vdpa device
  - For L0 guest, boot QEMU with two virtio-net-pci net devices with
the `ctrl_vq`, `ctrl_vlan` features on, with a command line like:
      -device virtio-net-pci,disable-legacy=on,disable-modern=off,
iommu_platform=on,mq=on,ctrl_vq=on,guest_announce=off,
indirect_desc=off,queue_reset=off,ctrl_vlan=on,...

  - For L1 guest, apply the patch series, then apply an additional
patch to record the load time in microseconds, as follows:
```diff
diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
index 6b958d6363..501b510fd2 100644
--- a/hw/net/vhost_net.c
+++ b/hw/net/vhost_net.c
@@ -295,7 +295,10 @@ static int vhost_net_start_one(struct vhost_net *net,
     }
 
     if (net->nc->info->load) {
+        int64_t start_us = g_get_monotonic_time();
         r = net->nc->info->load(net->nc);
+        error_report("vhost_vdpa_net_load() = %ld us",
+                     g_get_monotonic_time() - start_us);
         if (r < 0) {
             goto fail;
         }
```

  - For L1 guest, compile the code, and start QEMU with two vdpa devices
with SVQ mode on and the `ctrl_vq`, `ctrl_vlan` features enabled,
with a command line like:
      -netdev type=vhost-vdpa,x-svq=true,...
      -device virtio-net-pci,mq=on,guest_announce=off,ctrl_vq=on,
ctrl_vlan=on...

  - For L2 source guest, run the following bash script:
```bash
#!/bin/bash

for idx in {1..4094}
do
  ip link add link eth0 name vlan$idx type vlan id $idx
done
```

  - Execute the live migration in L2 source monitor

  - Result
    * with this series, QEMU should not trigger any warning
or error except something like "vhost_vdpa_net_load() = 13732 us"
    * without this series, QEMU should not trigger any warning
or error except something like "vhost_vdpa_net_load() = 20455 us"

ChangeLog
=========
v4:
  - refactor the subject line, as suggested by Eugenio, in patch
"vhost: Add count argument to vhost_svq_poll()"
  - split `in` into `vdpa_in` and `model_in` instead of reusing `in`
in vhost_vdpa_net_handle_ctrl_avail(), as suggested by Eugenio, in patch
"vdpa: Use iovec for vhost_vdpa_net_cvq_add()"
  - pack the CVQ command with iov_from_buf() instead of accessing
`out` directly, as suggested by Eugenio, in patch
"vdpa: Avoid using vhost_vdpa_net_load_*() outside vhost_vdpa_net_load()"
  - always check the return value of vhost_vdpa_net_svq_poll(),
as suggested by Eugenio, in patch
"vdpa: Move vhost_svq_poll() to the caller of vhost_vdpa_net_cvq_add()"
  - use `struct iovec` instead of `void **` as the cursor,
add a vhost_vdpa_net_load_cursor_reset() helper function
to reset the cursors, and refactor vhost_vdpa_net_load_cmd() to prepare
buffers with iov_copy() instead of accessing `in` and `out` directly
(see the sketch after the v4 entries below), as suggested by Eugenio,
in patch "vdpa: Introduce cursors to vhost_vdpa_net_loadx()"
  - refactor the argument `cmds_in_flight` to `len` for
vhost_vdpa_net_svq_full(), check the return value of
vhost_vdpa_net_svq_poll() in vhost_vdpa_net_svq_flush(), and
use iov_size(), vhost_vdpa_net_load_cursor_reset()
and iov_discard_front() to update the cursors instead of
accessing them directly, as suggested by Eugenio, in patch
"vdpa: Send cvq state load commands in parallel"

v3: https://lore.kernel.org/all/cover.1689748694.git.yin31149@gmail.com/
  - refactor vhost_svq_poll() to accept cmds_in_flight,
as suggested by Jason and Eugenio
  - refactor vhost_vdpa_net_cvq_add() so that the control command buffers
are not tied to `s->cvq_cmd_out_buffer` and `s->status` and can be
reused, as suggested by Eugenio
  - poll and check only when the SVQ is full or the control command
shadow buffers are full

v2: https://lore.kernel.org/all/cover.1683371965.git.yin31149@gmail.com/
  - recover accidentally deleted rows
  - remove extra newline
  - refactor `need_poll_len` to `cmds_in_flight`
  - return -EINVAL when vhost_svq_poll() returns 0 or the check
on buffers written by the device fails
  - change the type of `in_cursor`, and refactor the
code for updating the cursor
  - return directly when vhost_vdpa_net_load_{mac,mq}()
returns a failure in vhost_vdpa_net_load()

v1: https://lore.kernel.org/all/cover.1681732982.git.yin31149@gmail.com/

Hawkins Jiawei (8):
  vhost: Add count argument to vhost_svq_poll()
  vdpa: Use iovec for vhost_vdpa_net_cvq_add()
  vhost: Expose vhost_svq_available_slots()
  vdpa: Avoid using vhost_vdpa_net_load_*() outside
    vhost_vdpa_net_load()
  vdpa: Check device ack in vhost_vdpa_net_load_rx_mode()
  vdpa: Move vhost_svq_poll() to the caller of vhost_vdpa_net_cvq_add()
  vdpa: Introduce cursors to vhost_vdpa_net_loadx()
  vdpa: Send cvq state load commands in parallel

 hw/virtio/vhost-shadow-virtqueue.c |  38 +--
 hw/virtio/vhost-shadow-virtqueue.h |   3 +-
 net/vhost-vdpa.c                   | 380 +++++++++++++++++++----------
 3 files changed, 276 insertions(+), 145 deletions(-)

Comments

Hawkins Jiawei Aug. 29, 2023, 9:32 a.m. UTC | #1
On 2023/8/29 13:54, Hawkins Jiawei wrote:
> This patchset allows QEMU to delay polling and checking the device
> used buffer until either the SVQ is full or control commands shadow
> buffers are full, instead of polling and checking immediately after
> sending each SVQ control command, so that QEMU can send all the SVQ
> control commands in parallel, which have better performance improvement.
>
> I use vp_vdpa device to simulate vdpa device, and create 4094 VLANS in
> guest to build a test environment for sending multiple CVQ state load
> commands. This patch series can improve latency from 20455 us to
> 13732 us for about 4099 CVQ state load commands, about 1.64 us per command.
>
> Note that this patch should be based on
> patch "Vhost-vdpa Shadow Virtqueue VLAN support" at [1].
>
> [1]. https://lore.kernel.org/all/cover.1690100802.git.yin31149@gmail.com/

Sorry for the outdated link. The correct link for this patch should
be https://lore.kernel.org/all/cover.1690106284.git.yin31149@gmail.com/

Thanks!


>
> TestStep
> ========
> 1. regression testing using vp-vdpa device
>    - For L0 guest, boot QEMU with two virtio-net-pci net device with
> `ctrl_vq`, `ctrl_rx`, `ctrl_rx_extra` features on, command line like:
>        -device virtio-net-pci,disable-legacy=on,disable-modern=off,
> iommu_platform=on,mq=on,ctrl_vq=on,guest_announce=off,
> indirect_desc=off,queue_reset=off,ctrl_rx=on,ctrl_rx_extra=on,...
>
>    - For L1 guest, apply the patch series and compile the source code,
> start QEMU with two vdpa device with svq mode on, enable the `ctrl_vq`,
> `ctrl_rx`, `ctrl_rx_extra` features on, command line like:
>        -netdev type=vhost-vdpa,x-svq=true,...
>        -device virtio-net-pci,mq=on,guest_announce=off,ctrl_vq=on,
> ctrl_rx=on,ctrl_rx_extra=on...
>
>    - For L2 source guest, run the following bash command:
> ```bash
> #!/bin/sh
>
> for idx1 in {0..9}
> do
>    for idx2 in {0..9}
>    do
>      for idx3 in {0..6}
>      do
>        ip link add macvlan$idx1$idx2$idx3 link eth0
> address 4a:30:10:19:$idx1$idx2:1$idx3 type macvlan mode bridge
>        ip link set macvlan$idx1$idx2$idx3 up
>      done
>    done
> done
> ```
>    - Execute the live migration in L2 source monitor
>
>    - Result
>      * with this series, QEMU should not trigger any error or warning.
>
>
>
> 2. perf using vp-vdpa device
>    - For L0 guest, boot QEMU with two virtio-net-pci net device with
> `ctrl_vq`, `ctrl_vlan` features on, command line like:
>        -device virtio-net-pci,disable-legacy=on,disable-modern=off,
> iommu_platform=on,mq=on,ctrl_vq=on,guest_announce=off,
> indirect_desc=off,queue_reset=off,ctrl_vlan=on,...
>
>    - For L1 guest, apply the patch series, then apply an addtional
> patch to record the load time in microseconds as following:
> ```diff
> diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
> index 6b958d6363..501b510fd2 100644
> --- a/hw/net/vhost_net.c
> +++ b/hw/net/vhost_net.c
> @@ -295,7 +295,10 @@ static int vhost_net_start_one(struct vhost_net *net,
>       }
>
>       if (net->nc->info->load) {
> +        int64_t start_us = g_get_monotonic_time();
>           r = net->nc->info->load(net->nc);
> +        error_report("vhost_vdpa_net_load() = %ld us",
> +                     g_get_monotonic_time() - start_us);
>           if (r < 0) {
>               goto fail;
>           }
> ```
>
>    - For L1 guest, compile the code, and start QEMU with two vdpa device
> with svq mode on, enable the `ctrl_vq`, `ctrl_vlan` features on,
> command line like:
>        -netdev type=vhost-vdpa,x-svq=true,...
>        -device virtio-net-pci,mq=on,guest_announce=off,ctrl_vq=on,
> ctrl_vlan=on...
>
>    - For L2 source guest, run the following bash command:
> ```bash
> #!/bin/sh
>
> for idx in {1..4094}
> do
>    ip link add link eth0 name vlan$idx type vlan id $idx
> done
> ```
>
>    - execute the live migration in L2 source monitor
>
>    - Result
>      * with this series, QEMU should not trigger any warning
> or error except something like "vhost_vdpa_net_load() = 13732 us"
>      * without this series, QEMU should not trigger any warning
> or error except something like "vhost_vdpa_net_load() = 20455 us"
>
> ChangeLog
> =========
> v4:
>    - refactor subject line suggested by Eugenio in patch
> "vhost: Add count argument to vhost_svq_poll()"
>    - split `in` to `vdpa_in` and `model_in` instead of reusing `in`
> in vhost_vdpa_net_handle_ctrl_avail() suggested by Eugenio in patch
> "vdpa: Use iovec for vhost_vdpa_net_cvq_add()"
>    - pack CVQ command by iov_from_buf() instead of accessing
> `out` directly suggested by Eugenio in patch
> "vdpa: Avoid using vhost_vdpa_net_load_*() outside vhost_vdpa_net_load()"
>    - always check the return value of vhost_vdpa_net_svq_poll()
> suggested Eugenio in patch
> "vdpa: Move vhost_svq_poll() to the caller of vhost_vdpa_net_cvq_add()"
>    - use `struct iovec` instead of `void **` as cursor,
> add vhost_vdpa_net_load_cursor_reset() helper function
> to reset the cursors, refactor vhost_vdpa_net_load_cmd() to prepare buffers
> by iov_copy() instead of accessing `in` and `out` directly
> suggested by Eugenio in patch
> "vdpa: Introduce cursors to vhost_vdpa_net_loadx()"
>    - refactor argument `cmds_in_flight` to `len` for
> vhost_vdpa_net_svq_full(), check the return value of
> vhost_vdpa_net_svq_poll() in vhost_vdpa_net_svq_flush(),
> use iov_size(), vhost_vdpa_net_load_cursor_reset()
> and iov_discard_front() to update the cursors instead of
> accessing it directly according to Eugenio in patch
> "vdpa: Send cvq state load commands in parallel"
>
> v3: https://lore.kernel.org/all/cover.1689748694.git.yin31149@gmail.com/
>    - refactor vhost_svq_poll() to accept cmds_in_flight
> suggested by Jason and Eugenio
>    - refactor vhost_vdpa_net_cvq_add() to make control commands buffers
> is not tied to `s->cvq_cmd_out_buffer` and `s->status`, so we can reuse
> it suggested by Eugenio
>    - poll and check when SVQ is full or control commands shadow buffers is
> full
>
> v2: https://lore.kernel.org/all/cover.1683371965.git.yin31149@gmail.com/
>    - recover accidentally deleted rows
>    - remove extra newline
>    - refactor `need_poll_len` to `cmds_in_flight`
>    - return -EINVAL when vhost_svq_poll() return 0 or check
> on buffers written by device fails
>    - change the type of `in_cursor`, and refactor the
> code for updating cursor
>    - return directly when vhost_vdpa_net_load_{mac,mq}()
> returns a failure in vhost_vdpa_net_load()
>
> v1: https://lore.kernel.org/all/cover.1681732982.git.yin31149@gmail.com/
>
> Hawkins Jiawei (8):
>    vhost: Add count argument to vhost_svq_poll()
>    vdpa: Use iovec for vhost_vdpa_net_cvq_add()
>    vhost: Expose vhost_svq_available_slots()
>    vdpa: Avoid using vhost_vdpa_net_load_*() outside
>      vhost_vdpa_net_load()
>    vdpa: Check device ack in vhost_vdpa_net_load_rx_mode()
>    vdpa: Move vhost_svq_poll() to the caller of vhost_vdpa_net_cvq_add()
>    vdpa: Introduce cursors to vhost_vdpa_net_loadx()
>    vdpa: Send cvq state load commands in parallel
>
>   hw/virtio/vhost-shadow-virtqueue.c |  38 +--
>   hw/virtio/vhost-shadow-virtqueue.h |   3 +-
>   net/vhost-vdpa.c                   | 380 +++++++++++++++++++----------
>   3 files changed, 276 insertions(+), 145 deletions(-)
>
Michael S. Tsirkin Oct. 1, 2023, 7:56 p.m. UTC | #2
On Tue, Aug 29, 2023 at 01:54:42PM +0800, Hawkins Jiawei wrote:
> This patchset allows QEMU to delay polling and checking the device
> used buffer until either the SVQ is full or control commands shadow
> buffers are full, instead of polling and checking immediately after
> sending each SVQ control command, so that QEMU can send all the SVQ
> control commands in parallel, which have better performance improvement.
> 
> I use vp_vdpa device to simulate vdpa device, and create 4094 VLANS in
> guest to build a test environment for sending multiple CVQ state load
> commands. This patch series can improve latency from 20455 us to
> 13732 us for about 4099 CVQ state load commands, about 1.64 us per command.
> 
> Note that this patch should be based on
> patch "Vhost-vdpa Shadow Virtqueue VLAN support" at [1].
> 
> [1]. https://lore.kernel.org/all/cover.1690100802.git.yin31149@gmail.com/

Eugenio, you acked patch 1 but it's been a while - care to review the
rest of the patchset?

> TestStep
> ========
> 1. regression testing using vp-vdpa device
>   - For L0 guest, boot QEMU with two virtio-net-pci net device with
> `ctrl_vq`, `ctrl_rx`, `ctrl_rx_extra` features on, command line like:
>       -device virtio-net-pci,disable-legacy=on,disable-modern=off,
> iommu_platform=on,mq=on,ctrl_vq=on,guest_announce=off,
> indirect_desc=off,queue_reset=off,ctrl_rx=on,ctrl_rx_extra=on,...
> 
>   - For L1 guest, apply the patch series and compile the source code,
> start QEMU with two vdpa device with svq mode on, enable the `ctrl_vq`,
> `ctrl_rx`, `ctrl_rx_extra` features on, command line like:
>       -netdev type=vhost-vdpa,x-svq=true,...
>       -device virtio-net-pci,mq=on,guest_announce=off,ctrl_vq=on,
> ctrl_rx=on,ctrl_rx_extra=on...
> 
>   - For L2 source guest, run the following bash command:
> ```bash
> #!/bin/sh
> 
> for idx1 in {0..9}
> do
>   for idx2 in {0..9}
>   do
>     for idx3 in {0..6}
>     do
>       ip link add macvlan$idx1$idx2$idx3 link eth0
> address 4a:30:10:19:$idx1$idx2:1$idx3 type macvlan mode bridge
>       ip link set macvlan$idx1$idx2$idx3 up
>     done
>   done
> done
> ```
>   - Execute the live migration in L2 source monitor
> 
>   - Result
>     * with this series, QEMU should not trigger any error or warning.
> 
> 
> 
> 2. perf using vp-vdpa device
>   - For L0 guest, boot QEMU with two virtio-net-pci net device with
> `ctrl_vq`, `ctrl_vlan` features on, command line like:
>       -device virtio-net-pci,disable-legacy=on,disable-modern=off,
> iommu_platform=on,mq=on,ctrl_vq=on,guest_announce=off,
> indirect_desc=off,queue_reset=off,ctrl_vlan=on,...
> 
>   - For L1 guest, apply the patch series, then apply an addtional
> patch to record the load time in microseconds as following:
> ```diff
> diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
> index 6b958d6363..501b510fd2 100644
> --- a/hw/net/vhost_net.c
> +++ b/hw/net/vhost_net.c
> @@ -295,7 +295,10 @@ static int vhost_net_start_one(struct vhost_net *net,
>      }
>  
>      if (net->nc->info->load) {
> +        int64_t start_us = g_get_monotonic_time();
>          r = net->nc->info->load(net->nc);
> +        error_report("vhost_vdpa_net_load() = %ld us",
> +                     g_get_monotonic_time() - start_us);
>          if (r < 0) {
>              goto fail;
>          }
> ```
> 
>   - For L1 guest, compile the code, and start QEMU with two vdpa device
> with svq mode on, enable the `ctrl_vq`, `ctrl_vlan` features on,
> command line like:
>       -netdev type=vhost-vdpa,x-svq=true,...
>       -device virtio-net-pci,mq=on,guest_announce=off,ctrl_vq=on,
> ctrl_vlan=on...
> 
>   - For L2 source guest, run the following bash command:
> ```bash
> #!/bin/sh
> 
> for idx in {1..4094}
> do
>   ip link add link eth0 name vlan$idx type vlan id $idx
> done
> ```
> 
>   - execute the live migration in L2 source monitor
> 
>   - Result
>     * with this series, QEMU should not trigger any warning
> or error except something like "vhost_vdpa_net_load() = 13732 us"
>     * without this series, QEMU should not trigger any warning
> or error except something like "vhost_vdpa_net_load() = 20455 us"
> 
> ChangeLog
> =========
> v4:
>   - refactor subject line suggested by Eugenio in patch
> "vhost: Add count argument to vhost_svq_poll()"
>   - split `in` to `vdpa_in` and `model_in` instead of reusing `in`
> in vhost_vdpa_net_handle_ctrl_avail() suggested by Eugenio in patch
> "vdpa: Use iovec for vhost_vdpa_net_cvq_add()"
>   - pack CVQ command by iov_from_buf() instead of accessing
> `out` directly suggested by Eugenio in patch
> "vdpa: Avoid using vhost_vdpa_net_load_*() outside vhost_vdpa_net_load()"
>   - always check the return value of vhost_vdpa_net_svq_poll()
> suggested Eugenio in patch
> "vdpa: Move vhost_svq_poll() to the caller of vhost_vdpa_net_cvq_add()"
>   - use `struct iovec` instead of `void **` as cursor,
> add vhost_vdpa_net_load_cursor_reset() helper function
> to reset the cursors, refactor vhost_vdpa_net_load_cmd() to prepare buffers
> by iov_copy() instead of accessing `in` and `out` directly
> suggested by Eugenio in patch
> "vdpa: Introduce cursors to vhost_vdpa_net_loadx()"
>   - refactor argument `cmds_in_flight` to `len` for
> vhost_vdpa_net_svq_full(), check the return value of
> vhost_vdpa_net_svq_poll() in vhost_vdpa_net_svq_flush(),
> use iov_size(), vhost_vdpa_net_load_cursor_reset()
> and iov_discard_front() to update the cursors instead of
> accessing it directly according to Eugenio in patch
> "vdpa: Send cvq state load commands in parallel"
> 
> v3: https://lore.kernel.org/all/cover.1689748694.git.yin31149@gmail.com/
>   - refactor vhost_svq_poll() to accept cmds_in_flight
> suggested by Jason and Eugenio
>   - refactor vhost_vdpa_net_cvq_add() to make control commands buffers
> is not tied to `s->cvq_cmd_out_buffer` and `s->status`, so we can reuse
> it suggested by Eugenio
>   - poll and check when SVQ is full or control commands shadow buffers is
> full
> 
> v2: https://lore.kernel.org/all/cover.1683371965.git.yin31149@gmail.com/
>   - recover accidentally deleted rows
>   - remove extra newline
>   - refactor `need_poll_len` to `cmds_in_flight`
>   - return -EINVAL when vhost_svq_poll() return 0 or check
> on buffers written by device fails
>   - change the type of `in_cursor`, and refactor the
> code for updating cursor
>   - return directly when vhost_vdpa_net_load_{mac,mq}()
> returns a failure in vhost_vdpa_net_load()
> 
> v1: https://lore.kernel.org/all/cover.1681732982.git.yin31149@gmail.com/
> 
> Hawkins Jiawei (8):
>   vhost: Add count argument to vhost_svq_poll()
>   vdpa: Use iovec for vhost_vdpa_net_cvq_add()
>   vhost: Expose vhost_svq_available_slots()
>   vdpa: Avoid using vhost_vdpa_net_load_*() outside
>     vhost_vdpa_net_load()
>   vdpa: Check device ack in vhost_vdpa_net_load_rx_mode()
>   vdpa: Move vhost_svq_poll() to the caller of vhost_vdpa_net_cvq_add()
>   vdpa: Introduce cursors to vhost_vdpa_net_loadx()
>   vdpa: Send cvq state load commands in parallel
> 
>  hw/virtio/vhost-shadow-virtqueue.c |  38 +--
>  hw/virtio/vhost-shadow-virtqueue.h |   3 +-
>  net/vhost-vdpa.c                   | 380 +++++++++++++++++++----------
>  3 files changed, 276 insertions(+), 145 deletions(-)
> 
> -- 
> 2.25.1
Eugenio Perez Martin Oct. 3, 2023, 6:21 p.m. UTC | #3
On Sun, Oct 1, 2023 at 9:56 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Tue, Aug 29, 2023 at 01:54:42PM +0800, Hawkins Jiawei wrote:
> > This patchset allows QEMU to delay polling and checking the device
> > used buffer until either the SVQ is full or control commands shadow
> > buffers are full, instead of polling and checking immediately after
> > sending each SVQ control command, so that QEMU can send all the SVQ
> > control commands in parallel, which have better performance improvement.
> >
> > I use vp_vdpa device to simulate vdpa device, and create 4094 VLANS in
> > guest to build a test environment for sending multiple CVQ state load
> > commands. This patch series can improve latency from 20455 us to
> > 13732 us for about 4099 CVQ state load commands, about 1.64 us per command.
> >
> > Note that this patch should be based on
> > patch "Vhost-vdpa Shadow Virtqueue VLAN support" at [1].
> >
> > [1]. https://lore.kernel.org/all/cover.1690100802.git.yin31149@gmail.com/
>
> Eugenio, you acked patch 1 but it's been a while - care to review the
> rest of the patchset?
>

I'm sorry, I was under the impression that this should go after
optimizing the memory maps.

I'll continue with the revision.

Thanks!

> > TestStep
> > ========
> > 1. regression testing using vp-vdpa device
> >   - For L0 guest, boot QEMU with two virtio-net-pci net device with
> > `ctrl_vq`, `ctrl_rx`, `ctrl_rx_extra` features on, command line like:
> >       -device virtio-net-pci,disable-legacy=on,disable-modern=off,
> > iommu_platform=on,mq=on,ctrl_vq=on,guest_announce=off,
> > indirect_desc=off,queue_reset=off,ctrl_rx=on,ctrl_rx_extra=on,...
> >
> >   - For L1 guest, apply the patch series and compile the source code,
> > start QEMU with two vdpa device with svq mode on, enable the `ctrl_vq`,
> > `ctrl_rx`, `ctrl_rx_extra` features on, command line like:
> >       -netdev type=vhost-vdpa,x-svq=true,...
> >       -device virtio-net-pci,mq=on,guest_announce=off,ctrl_vq=on,
> > ctrl_rx=on,ctrl_rx_extra=on...
> >
> >   - For L2 source guest, run the following bash command:
> > ```bash
> > #!/bin/sh
> >
> > for idx1 in {0..9}
> > do
> >   for idx2 in {0..9}
> >   do
> >     for idx3 in {0..6}
> >     do
> >       ip link add macvlan$idx1$idx2$idx3 link eth0
> > address 4a:30:10:19:$idx1$idx2:1$idx3 type macvlan mode bridge
> >       ip link set macvlan$idx1$idx2$idx3 up
> >     done
> >   done
> > done
> > ```
> >   - Execute the live migration in L2 source monitor
> >
> >   - Result
> >     * with this series, QEMU should not trigger any error or warning.
> >
> >
> >
> > 2. perf using vp-vdpa device
> >   - For L0 guest, boot QEMU with two virtio-net-pci net device with
> > `ctrl_vq`, `ctrl_vlan` features on, command line like:
> >       -device virtio-net-pci,disable-legacy=on,disable-modern=off,
> > iommu_platform=on,mq=on,ctrl_vq=on,guest_announce=off,
> > indirect_desc=off,queue_reset=off,ctrl_vlan=on,...
> >
> >   - For L1 guest, apply the patch series, then apply an addtional
> > patch to record the load time in microseconds as following:
> > ```diff
> > diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
> > index 6b958d6363..501b510fd2 100644
> > --- a/hw/net/vhost_net.c
> > +++ b/hw/net/vhost_net.c
> > @@ -295,7 +295,10 @@ static int vhost_net_start_one(struct vhost_net *net,
> >      }
> >
> >      if (net->nc->info->load) {
> > +        int64_t start_us = g_get_monotonic_time();
> >          r = net->nc->info->load(net->nc);
> > +        error_report("vhost_vdpa_net_load() = %ld us",
> > +                     g_get_monotonic_time() - start_us);
> >          if (r < 0) {
> >              goto fail;
> >          }
> > ```
> >
> >   - For L1 guest, compile the code, and start QEMU with two vdpa device
> > with svq mode on, enable the `ctrl_vq`, `ctrl_vlan` features on,
> > command line like:
> >       -netdev type=vhost-vdpa,x-svq=true,...
> >       -device virtio-net-pci,mq=on,guest_announce=off,ctrl_vq=on,
> > ctrl_vlan=on...
> >
> >   - For L2 source guest, run the following bash command:
> > ```bash
> > #!/bin/sh
> >
> > for idx in {1..4094}
> > do
> >   ip link add link eth0 name vlan$idx type vlan id $idx
> > done
> > ```
> >
> >   - execute the live migration in L2 source monitor
> >
> >   - Result
> >     * with this series, QEMU should not trigger any warning
> > or error except something like "vhost_vdpa_net_load() = 13732 us"
> >     * without this series, QEMU should not trigger any warning
> > or error except something like "vhost_vdpa_net_load() = 20455 us"
> >
> > ChangeLog
> > =========
> > v4:
> >   - refactor subject line suggested by Eugenio in patch
> > "vhost: Add count argument to vhost_svq_poll()"
> >   - split `in` to `vdpa_in` and `model_in` instead of reusing `in`
> > in vhost_vdpa_net_handle_ctrl_avail() suggested by Eugenio in patch
> > "vdpa: Use iovec for vhost_vdpa_net_cvq_add()"
> >   - pack CVQ command by iov_from_buf() instead of accessing
> > `out` directly suggested by Eugenio in patch
> > "vdpa: Avoid using vhost_vdpa_net_load_*() outside vhost_vdpa_net_load()"
> >   - always check the return value of vhost_vdpa_net_svq_poll()
> > suggested Eugenio in patch
> > "vdpa: Move vhost_svq_poll() to the caller of vhost_vdpa_net_cvq_add()"
> >   - use `struct iovec` instead of `void **` as cursor,
> > add vhost_vdpa_net_load_cursor_reset() helper function
> > to reset the cursors, refactor vhost_vdpa_net_load_cmd() to prepare buffers
> > by iov_copy() instead of accessing `in` and `out` directly
> > suggested by Eugenio in patch
> > "vdpa: Introduce cursors to vhost_vdpa_net_loadx()"
> >   - refactor argument `cmds_in_flight` to `len` for
> > vhost_vdpa_net_svq_full(), check the return value of
> > vhost_vdpa_net_svq_poll() in vhost_vdpa_net_svq_flush(),
> > use iov_size(), vhost_vdpa_net_load_cursor_reset()
> > and iov_discard_front() to update the cursors instead of
> > accessing it directly according to Eugenio in patch
> > "vdpa: Send cvq state load commands in parallel"
> >
> > v3: https://lore.kernel.org/all/cover.1689748694.git.yin31149@gmail.com/
> >   - refactor vhost_svq_poll() to accept cmds_in_flight
> > suggested by Jason and Eugenio
> >   - refactor vhost_vdpa_net_cvq_add() to make control commands buffers
> > is not tied to `s->cvq_cmd_out_buffer` and `s->status`, so we can reuse
> > it suggested by Eugenio
> >   - poll and check when SVQ is full or control commands shadow buffers is
> > full
> >
> > v2: https://lore.kernel.org/all/cover.1683371965.git.yin31149@gmail.com/
> >   - recover accidentally deleted rows
> >   - remove extra newline
> >   - refactor `need_poll_len` to `cmds_in_flight`
> >   - return -EINVAL when vhost_svq_poll() return 0 or check
> > on buffers written by device fails
> >   - change the type of `in_cursor`, and refactor the
> > code for updating cursor
> >   - return directly when vhost_vdpa_net_load_{mac,mq}()
> > returns a failure in vhost_vdpa_net_load()
> >
> > v1: https://lore.kernel.org/all/cover.1681732982.git.yin31149@gmail.com/
> >
> > Hawkins Jiawei (8):
> >   vhost: Add count argument to vhost_svq_poll()
> >   vdpa: Use iovec for vhost_vdpa_net_cvq_add()
> >   vhost: Expose vhost_svq_available_slots()
> >   vdpa: Avoid using vhost_vdpa_net_load_*() outside
> >     vhost_vdpa_net_load()
> >   vdpa: Check device ack in vhost_vdpa_net_load_rx_mode()
> >   vdpa: Move vhost_svq_poll() to the caller of vhost_vdpa_net_cvq_add()
> >   vdpa: Introduce cursors to vhost_vdpa_net_loadx()
> >   vdpa: Send cvq state load commands in parallel
> >
> >  hw/virtio/vhost-shadow-virtqueue.c |  38 +--
> >  hw/virtio/vhost-shadow-virtqueue.h |   3 +-
> >  net/vhost-vdpa.c                   | 380 +++++++++++++++++++----------
> >  3 files changed, 276 insertions(+), 145 deletions(-)
> >
> > --
> > 2.25.1
>

Patch

diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
index 6b958d6363..501b510fd2 100644
--- a/hw/net/vhost_net.c
+++ b/hw/net/vhost_net.c
@@ -295,7 +295,10 @@  static int vhost_net_start_one(struct vhost_net *net,
     }
 
     if (net->nc->info->load) {
+        int64_t start_us = g_get_monotonic_time();
         r = net->nc->info->load(net->nc);
+        error_report("vhost_vdpa_net_load() = %ld us",
+                     g_get_monotonic_time() - start_us);
         if (r < 0) {
             goto fail;
         }