diff mbox series

[v3,6/7] vdpa: Avoid forwarding large CVQ command failures

Message ID 267e15e4eed2d7aeb9887f193da99a13d22a2f1d.1688743107.git.yin31149@gmail.com (mailing list archive)
State New, archived
Headers show
Series Vhost-vdpa Shadow Virtqueue _F_CTRL_RX commands support | expand

Commit Message

Hawkins Jiawei July 7, 2023, 3:27 p.m. UTC
Due to the size limitation of the out buffer sent to the vdpa device,
which is determined by vhost_vdpa_net_cvq_cmd_len(), excessive CVQ
command is truncated in QEMU. As a result, the vdpa device rejects
this flawd CVQ command.

However, the problem is that, the VIRTIO_NET_CTRL_MAC_TABLE_SET
CVQ command has a variable length, which may exceed
vhost_vdpa_net_cvq_cmd_len() if the guest sets more than
`MAC_TABLE_ENTRIES` MAC addresses for the filter table.

This patch solves this problem by following steps:

  * Increase the out buffer size to vhost_vdpa_net_cvq_cmd_page_len(),
which represents the size of the buffer that is allocated and mmaped.
This ensures that everything works correctly as long as the guest
sets fewer than `(vhost_vdpa_net_cvq_cmd_page_len() -
sizeof(struct virtio_net_ctrl_hdr)
- 2 * sizeof(struct virtio_net_ctrl_mac)) / ETH_ALEN` MAC addresses.
    Considering the highly unlikely scenario for the guest setting
more than that number of MAC addresses for the filter table, this
should work fine for the majority of cases.

  * If the CVQ command exceeds vhost_vdpa_net_cvq_cmd_page_len(),
instead of directly sending this CVQ command, QEMU should send
a VIRTIO_NET_CTRL_RX_PROMISC CVQ command to vdpa device. Addtionally,
a fake VIRTIO_NET_CTRL_MAC_TABLE_SET command including
(`MAC_TABLE_ENTRIES` + 1) non-multicast MAC addresses and
(`MAC_TABLE_ENTRIES` + 1) multicast MAC addresses should be provided
to the device model.
    By doing so, the vdpa device turns promiscuous mode on, aligning
with the VirtIO standard. The device model marks
`n->mac_table.uni_overflow` and `n->mac_table.multi_overflow`,
which aligns with the state of the vdpa device.

Note that the bug cannot be triggered at the moment, since
VIRTIO_NET_F_CTRL_RX feature is not enabled for SVQ.

Fixes: 7a7f87e94c ("vdpa: Move command buffers map to start of net device")
Signed-off-by: Hawkins Jiawei <yin31149@gmail.com>
---
 net/vhost-vdpa.c | 162 ++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 161 insertions(+), 1 deletion(-)

Comments

Eugenio Perez Martin Aug. 17, 2023, 11:08 a.m. UTC | #1
On Fri, Jul 7, 2023 at 5:28 PM Hawkins Jiawei <yin31149@gmail.com> wrote:
>
> Due to the size limitation of the out buffer sent to the vdpa device,
> which is determined by vhost_vdpa_net_cvq_cmd_len(), excessive CVQ
> command is truncated in QEMU. As a result, the vdpa device rejects
> this flawd CVQ command.
>
> However, the problem is that, the VIRTIO_NET_CTRL_MAC_TABLE_SET
> CVQ command has a variable length, which may exceed
> vhost_vdpa_net_cvq_cmd_len() if the guest sets more than
> `MAC_TABLE_ENTRIES` MAC addresses for the filter table.
>
> This patch solves this problem by following steps:
>
>   * Increase the out buffer size to vhost_vdpa_net_cvq_cmd_page_len(),
> which represents the size of the buffer that is allocated and mmaped.
> This ensures that everything works correctly as long as the guest
> sets fewer than `(vhost_vdpa_net_cvq_cmd_page_len() -
> sizeof(struct virtio_net_ctrl_hdr)
> - 2 * sizeof(struct virtio_net_ctrl_mac)) / ETH_ALEN` MAC addresses.
>     Considering the highly unlikely scenario for the guest setting
> more than that number of MAC addresses for the filter table, this
> should work fine for the majority of cases.
>
>   * If the CVQ command exceeds vhost_vdpa_net_cvq_cmd_page_len(),
> instead of directly sending this CVQ command, QEMU should send
> a VIRTIO_NET_CTRL_RX_PROMISC CVQ command to vdpa device. Addtionally,
> a fake VIRTIO_NET_CTRL_MAC_TABLE_SET command including
> (`MAC_TABLE_ENTRIES` + 1) non-multicast MAC addresses and
> (`MAC_TABLE_ENTRIES` + 1) multicast MAC addresses should be provided
> to the device model.
>     By doing so, the vdpa device turns promiscuous mode on, aligning
> with the VirtIO standard. The device model marks
> `n->mac_table.uni_overflow` and `n->mac_table.multi_overflow`,
> which aligns with the state of the vdpa device.
>
> Note that the bug cannot be triggered at the moment, since
> VIRTIO_NET_F_CTRL_RX feature is not enabled for SVQ.
>
> Fixes: 7a7f87e94c ("vdpa: Move command buffers map to start of net device")
> Signed-off-by: Hawkins Jiawei <yin31149@gmail.com>

Acked-by: Eugenio Pérez <eperezma@redhat.com>

> ---
>  net/vhost-vdpa.c | 162 ++++++++++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 161 insertions(+), 1 deletion(-)
>
> diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
> index a84eb088a0..a4ff6c52b7 100644
> --- a/net/vhost-vdpa.c
> +++ b/net/vhost-vdpa.c
> @@ -916,6 +916,148 @@ static NetClientInfo net_vhost_vdpa_cvq_info = {
>      .check_peer_type = vhost_vdpa_check_peer_type,
>  };
>
> +/*
> + * Forward the excessive VIRTIO_NET_CTRL_MAC_TABLE_SET CVQ command to
> + * vdpa device.
> + *
> + * Considering that QEMU cannot send the entire filter table to the
> + * vdpa device, it should send the VIRTIO_NET_CTRL_RX_PROMISC CVQ
> + * command to enable promiscuous mode to receive all packets,
> + * according to VirtIO standard, "Since there are no guarantees,
> + * it can use a hash filter or silently switch to allmulti or
> + * promiscuous mode if it is given too many addresses.".
> + *
> + * Since QEMU ignores MAC addresses beyond `MAC_TABLE_ENTRIES` and
> + * marks `n->mac_table.x_overflow` accordingly, it should have
> + * the same effect on the device model to receive
> + * (`MAC_TABLE_ENTRIES` + 1) or more non-multicast MAC addresses.
> + * The same applies to multicast MAC addresses.
> + *
> + * Therefore, QEMU can provide the device model with a fake
> + * VIRTIO_NET_CTRL_MAC_TABLE_SET command with (`MAC_TABLE_ENTRIES` + 1)
> + * non-multicast MAC addresses and (`MAC_TABLE_ENTRIES` + 1) multicast
> + * MAC addresses. This ensures that the device model marks
> + * `n->mac_table.uni_overflow` and `n->mac_table.multi_overflow`,
> + * allowing all packets to be received, which aligns with the
> + * state of the vdpa device.
> + */
> +static int vhost_vdpa_net_excessive_mac_filter_cvq_add(VhostVDPAState *s,
> +                                                       VirtQueueElement *elem,
> +                                                       struct iovec *out)
> +{
> +    struct virtio_net_ctrl_mac mac_data, *mac_ptr;
> +    struct virtio_net_ctrl_hdr *hdr_ptr;
> +    uint32_t cursor;
> +    ssize_t r;
> +
> +    /* parse the non-multicast MAC address entries from CVQ command */
> +    cursor = sizeof(*hdr_ptr);
> +    r = iov_to_buf(elem->out_sg, elem->out_num, cursor,
> +                   &mac_data, sizeof(mac_data));
> +    if (unlikely(r != sizeof(mac_data))) {
> +        /*
> +         * If the CVQ command is invalid, we should simulate the vdpa device
> +         * to reject the VIRTIO_NET_CTRL_MAC_TABLE_SET CVQ command
> +         */
> +        *s->status = VIRTIO_NET_ERR;
> +        return sizeof(*s->status);
> +    }
> +    cursor += sizeof(mac_data) + le32_to_cpu(mac_data.entries) * ETH_ALEN;
> +
> +    /* parse the multicast MAC address entries from CVQ command */
> +    r = iov_to_buf(elem->out_sg, elem->out_num, cursor,
> +                   &mac_data, sizeof(mac_data));
> +    if (r != sizeof(mac_data)) {
> +        /*
> +         * If the CVQ command is invalid, we should simulate the vdpa device
> +         * to reject the VIRTIO_NET_CTRL_MAC_TABLE_SET CVQ command
> +         */
> +        *s->status = VIRTIO_NET_ERR;
> +        return sizeof(*s->status);
> +    }
> +    cursor += sizeof(mac_data) + le32_to_cpu(mac_data.entries) * ETH_ALEN;
> +
> +    /* validate the CVQ command */
> +    if (iov_size(elem->out_sg, elem->out_num) != cursor) {
> +        /*
> +         * If the CVQ command is invalid, we should simulate the vdpa device
> +         * to reject the VIRTIO_NET_CTRL_MAC_TABLE_SET CVQ command
> +         */
> +        *s->status = VIRTIO_NET_ERR;
> +        return sizeof(*s->status);
> +    }
> +
> +    /*
> +     * According to VirtIO standard, "Since there are no guarantees,
> +     * it can use a hash filter or silently switch to allmulti or
> +     * promiscuous mode if it is given too many addresses.".
> +     *
> +     * Therefore, considering that QEMU is unable to send the entire
> +     * filter table to the vdpa device, it should send the
> +     * VIRTIO_NET_CTRL_RX_PROMISC CVQ command to enable promiscuous mode
> +     */
> +    r = vhost_vdpa_net_load_rx_mode(s, VIRTIO_NET_CTRL_RX_PROMISC, 1);
> +    if (unlikely(r < 0)) {
> +        return r;
> +    }
> +    if (*s->status != VIRTIO_NET_OK) {
> +        return sizeof(*s->status);
> +    }
> +
> +    /*
> +     * QEMU should also send a fake VIRTIO_NET_CTRL_MAC_TABLE_SET CVQ
> +     * command to the device model, including (`MAC_TABLE_ENTRIES` + 1)
> +     * non-multicast MAC addresses and (`MAC_TABLE_ENTRIES` + 1)
> +     * multicast MAC addresses.
> +     *
> +     * By doing so, the device model can mark `n->mac_table.uni_overflow`
> +     * and `n->mac_table.multi_overflow`, enabling all packets to be
> +     * received, which aligns with the state of the vdpa device.
> +     */
> +    cursor = 0;
> +    uint32_t fake_uni_entries = MAC_TABLE_ENTRIES + 1,
> +             fake_mul_entries = MAC_TABLE_ENTRIES + 1,
> +             fake_cvq_size = sizeof(struct virtio_net_ctrl_hdr) +
> +                             sizeof(mac_data) + fake_uni_entries * ETH_ALEN +
> +                             sizeof(mac_data) + fake_mul_entries * ETH_ALEN;
> +
> +    assert(fake_cvq_size < vhost_vdpa_net_cvq_cmd_page_len());
> +    out->iov_len = fake_cvq_size;
> +
> +    /* pack the header for fake CVQ command */
> +    hdr_ptr = out->iov_base + cursor;
> +    hdr_ptr->class = VIRTIO_NET_CTRL_MAC;
> +    hdr_ptr->cmd = VIRTIO_NET_CTRL_MAC_TABLE_SET;
> +    cursor += sizeof(*hdr_ptr);
> +
> +    /*
> +     * Pack the non-multicast MAC addresses part for fake CVQ command.
> +     *
> +     * According to virtio_net_handle_mac(), QEMU doesn't verify the MAC
> +     * addresses provieded in CVQ command. Therefore, only the entries
> +     * field need to be prepared in the CVQ command.
> +     */
> +    mac_ptr = out->iov_base + cursor;
> +    mac_ptr->entries = cpu_to_le32(fake_uni_entries);
> +    cursor += sizeof(*mac_ptr) + fake_uni_entries * ETH_ALEN;
> +
> +    /*
> +     * Pack the multicast MAC addresses part for fake CVQ command.
> +     *
> +     * According to virtio_net_handle_mac(), QEMU doesn't verify the MAC
> +     * addresses provieded in CVQ command. Therefore, only the entries
> +     * field need to be prepared in the CVQ command.
> +     */
> +    mac_ptr = out->iov_base + cursor;
> +    mac_ptr->entries = cpu_to_le32(fake_mul_entries);
> +
> +    /*
> +     * Simulating QEMU poll a vdpa device used buffer
> +     * for VIRTIO_NET_CTRL_MAC_TABLE_SET CVQ command
> +     */
> +    return sizeof(*s->status);
> +}
> +
>  /**
>   * Validate and copy control virtqueue commands.
>   *
> @@ -943,7 +1085,7 @@ static int vhost_vdpa_net_handle_ctrl_avail(VhostShadowVirtqueue *svq,
>
>      out.iov_len = iov_to_buf(elem->out_sg, elem->out_num, 0,
>                               s->cvq_cmd_out_buffer,
> -                             vhost_vdpa_net_cvq_cmd_len());
> +                             vhost_vdpa_net_cvq_cmd_page_len());
>
>      ctrl = s->cvq_cmd_out_buffer;
>      if (ctrl->class == VIRTIO_NET_CTRL_ANNOUNCE) {
> @@ -953,6 +1095,24 @@ static int vhost_vdpa_net_handle_ctrl_avail(VhostShadowVirtqueue *svq,
>           */
>          dev_written = sizeof(status);
>          *s->status = VIRTIO_NET_OK;
> +    } else if (unlikely(ctrl->class == VIRTIO_NET_CTRL_MAC &&
> +                        ctrl->cmd == VIRTIO_NET_CTRL_MAC_TABLE_SET &&
> +                        iov_size(elem->out_sg, elem->out_num) > out.iov_len)) {
> +        /*
> +         * Due to the size limitation of the out buffer sent to the vdpa device,
> +         * which is determined by vhost_vdpa_net_cvq_cmd_page_len(), excessive
> +         * MAC addresses set by the driver for the filter table can cause
> +         * truncation of the CVQ command in QEMU. As a result, the vdpa device
> +         * rejects the flawed CVQ command.
> +         *
> +         * Therefore, QEMU must handle this situation instead of sending
> +         * the CVQ command direclty.
> +         */
> +        dev_written = vhost_vdpa_net_excessive_mac_filter_cvq_add(s, elem,
> +                                                                  &out);
> +        if (unlikely(dev_written < 0)) {
> +            goto out;
> +        }
>      } else {
>          dev_written = vhost_vdpa_net_cvq_add(s, out.iov_len, sizeof(status));
>          if (unlikely(dev_written < 0)) {
> --
> 2.25.1
>
diff mbox series

Patch

diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
index a84eb088a0..a4ff6c52b7 100644
--- a/net/vhost-vdpa.c
+++ b/net/vhost-vdpa.c
@@ -916,6 +916,148 @@  static NetClientInfo net_vhost_vdpa_cvq_info = {
     .check_peer_type = vhost_vdpa_check_peer_type,
 };
 
+/*
+ * Forward the excessive VIRTIO_NET_CTRL_MAC_TABLE_SET CVQ command to
+ * vdpa device.
+ *
+ * Considering that QEMU cannot send the entire filter table to the
+ * vdpa device, it should send the VIRTIO_NET_CTRL_RX_PROMISC CVQ
+ * command to enable promiscuous mode to receive all packets,
+ * according to VirtIO standard, "Since there are no guarantees,
+ * it can use a hash filter or silently switch to allmulti or
+ * promiscuous mode if it is given too many addresses.".
+ *
+ * Since QEMU ignores MAC addresses beyond `MAC_TABLE_ENTRIES` and
+ * marks `n->mac_table.x_overflow` accordingly, it should have
+ * the same effect on the device model to receive
+ * (`MAC_TABLE_ENTRIES` + 1) or more non-multicast MAC addresses.
+ * The same applies to multicast MAC addresses.
+ *
+ * Therefore, QEMU can provide the device model with a fake
+ * VIRTIO_NET_CTRL_MAC_TABLE_SET command with (`MAC_TABLE_ENTRIES` + 1)
+ * non-multicast MAC addresses and (`MAC_TABLE_ENTRIES` + 1) multicast
+ * MAC addresses. This ensures that the device model marks
+ * `n->mac_table.uni_overflow` and `n->mac_table.multi_overflow`,
+ * allowing all packets to be received, which aligns with the
+ * state of the vdpa device.
+ */
+static int vhost_vdpa_net_excessive_mac_filter_cvq_add(VhostVDPAState *s,
+                                                       VirtQueueElement *elem,
+                                                       struct iovec *out)
+{
+    struct virtio_net_ctrl_mac mac_data, *mac_ptr;
+    struct virtio_net_ctrl_hdr *hdr_ptr;
+    uint32_t cursor;
+    ssize_t r;
+
+    /* parse the non-multicast MAC address entries from CVQ command */
+    cursor = sizeof(*hdr_ptr);
+    r = iov_to_buf(elem->out_sg, elem->out_num, cursor,
+                   &mac_data, sizeof(mac_data));
+    if (unlikely(r != sizeof(mac_data))) {
+        /*
+         * If the CVQ command is invalid, we should simulate the vdpa device
+         * to reject the VIRTIO_NET_CTRL_MAC_TABLE_SET CVQ command
+         */
+        *s->status = VIRTIO_NET_ERR;
+        return sizeof(*s->status);
+    }
+    cursor += sizeof(mac_data) + le32_to_cpu(mac_data.entries) * ETH_ALEN;
+
+    /* parse the multicast MAC address entries from CVQ command */
+    r = iov_to_buf(elem->out_sg, elem->out_num, cursor,
+                   &mac_data, sizeof(mac_data));
+    if (r != sizeof(mac_data)) {
+        /*
+         * If the CVQ command is invalid, we should simulate the vdpa device
+         * to reject the VIRTIO_NET_CTRL_MAC_TABLE_SET CVQ command
+         */
+        *s->status = VIRTIO_NET_ERR;
+        return sizeof(*s->status);
+    }
+    cursor += sizeof(mac_data) + le32_to_cpu(mac_data.entries) * ETH_ALEN;
+
+    /* validate the CVQ command */
+    if (iov_size(elem->out_sg, elem->out_num) != cursor) {
+        /*
+         * If the CVQ command is invalid, we should simulate the vdpa device
+         * to reject the VIRTIO_NET_CTRL_MAC_TABLE_SET CVQ command
+         */
+        *s->status = VIRTIO_NET_ERR;
+        return sizeof(*s->status);
+    }
+
+    /*
+     * According to VirtIO standard, "Since there are no guarantees,
+     * it can use a hash filter or silently switch to allmulti or
+     * promiscuous mode if it is given too many addresses.".
+     *
+     * Therefore, considering that QEMU is unable to send the entire
+     * filter table to the vdpa device, it should send the
+     * VIRTIO_NET_CTRL_RX_PROMISC CVQ command to enable promiscuous mode
+     */
+    r = vhost_vdpa_net_load_rx_mode(s, VIRTIO_NET_CTRL_RX_PROMISC, 1);
+    if (unlikely(r < 0)) {
+        return r;
+    }
+    if (*s->status != VIRTIO_NET_OK) {
+        return sizeof(*s->status);
+    }
+
+    /*
+     * QEMU should also send a fake VIRTIO_NET_CTRL_MAC_TABLE_SET CVQ
+     * command to the device model, including (`MAC_TABLE_ENTRIES` + 1)
+     * non-multicast MAC addresses and (`MAC_TABLE_ENTRIES` + 1)
+     * multicast MAC addresses.
+     *
+     * By doing so, the device model can mark `n->mac_table.uni_overflow`
+     * and `n->mac_table.multi_overflow`, enabling all packets to be
+     * received, which aligns with the state of the vdpa device.
+     */
+    cursor = 0;
+    uint32_t fake_uni_entries = MAC_TABLE_ENTRIES + 1,
+             fake_mul_entries = MAC_TABLE_ENTRIES + 1,
+             fake_cvq_size = sizeof(struct virtio_net_ctrl_hdr) +
+                             sizeof(mac_data) + fake_uni_entries * ETH_ALEN +
+                             sizeof(mac_data) + fake_mul_entries * ETH_ALEN;
+
+    assert(fake_cvq_size < vhost_vdpa_net_cvq_cmd_page_len());
+    out->iov_len = fake_cvq_size;
+
+    /* pack the header for fake CVQ command */
+    hdr_ptr = out->iov_base + cursor;
+    hdr_ptr->class = VIRTIO_NET_CTRL_MAC;
+    hdr_ptr->cmd = VIRTIO_NET_CTRL_MAC_TABLE_SET;
+    cursor += sizeof(*hdr_ptr);
+
+    /*
+     * Pack the non-multicast MAC addresses part for fake CVQ command.
+     *
+     * According to virtio_net_handle_mac(), QEMU doesn't verify the MAC
+     * addresses provieded in CVQ command. Therefore, only the entries
+     * field need to be prepared in the CVQ command.
+     */
+    mac_ptr = out->iov_base + cursor;
+    mac_ptr->entries = cpu_to_le32(fake_uni_entries);
+    cursor += sizeof(*mac_ptr) + fake_uni_entries * ETH_ALEN;
+
+    /*
+     * Pack the multicast MAC addresses part for fake CVQ command.
+     *
+     * According to virtio_net_handle_mac(), QEMU doesn't verify the MAC
+     * addresses provieded in CVQ command. Therefore, only the entries
+     * field need to be prepared in the CVQ command.
+     */
+    mac_ptr = out->iov_base + cursor;
+    mac_ptr->entries = cpu_to_le32(fake_mul_entries);
+
+    /*
+     * Simulating QEMU poll a vdpa device used buffer
+     * for VIRTIO_NET_CTRL_MAC_TABLE_SET CVQ command
+     */
+    return sizeof(*s->status);
+}
+
 /**
  * Validate and copy control virtqueue commands.
  *
@@ -943,7 +1085,7 @@  static int vhost_vdpa_net_handle_ctrl_avail(VhostShadowVirtqueue *svq,
 
     out.iov_len = iov_to_buf(elem->out_sg, elem->out_num, 0,
                              s->cvq_cmd_out_buffer,
-                             vhost_vdpa_net_cvq_cmd_len());
+                             vhost_vdpa_net_cvq_cmd_page_len());
 
     ctrl = s->cvq_cmd_out_buffer;
     if (ctrl->class == VIRTIO_NET_CTRL_ANNOUNCE) {
@@ -953,6 +1095,24 @@  static int vhost_vdpa_net_handle_ctrl_avail(VhostShadowVirtqueue *svq,
          */
         dev_written = sizeof(status);
         *s->status = VIRTIO_NET_OK;
+    } else if (unlikely(ctrl->class == VIRTIO_NET_CTRL_MAC &&
+                        ctrl->cmd == VIRTIO_NET_CTRL_MAC_TABLE_SET &&
+                        iov_size(elem->out_sg, elem->out_num) > out.iov_len)) {
+        /*
+         * Due to the size limitation of the out buffer sent to the vdpa device,
+         * which is determined by vhost_vdpa_net_cvq_cmd_page_len(), excessive
+         * MAC addresses set by the driver for the filter table can cause
+         * truncation of the CVQ command in QEMU. As a result, the vdpa device
+         * rejects the flawed CVQ command.
+         *
+         * Therefore, QEMU must handle this situation instead of sending
+         * the CVQ command direclty.
+         */
+        dev_written = vhost_vdpa_net_excessive_mac_filter_cvq_add(s, elem,
+                                                                  &out);
+        if (unlikely(dev_written < 0)) {
+            goto out;
+        }
     } else {
         dev_written = vhost_vdpa_net_cvq_add(s, out.iov_len, sizeof(status));
         if (unlikely(dev_written < 0)) {