From patchwork Wed Jul 17 10:52:41 2019
From: Jason Wang <jasowang@redhat.com>
To: mst@redhat.com, jasowang@redhat.com
Cc: kvm@vger.kernel.org, virtualization@lists.linux-foundation.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org, jfreimann@redhat.com, tiwei.bie@intel.com, maxime.coquelin@redhat.com
Subject: [PATCH V3 01/15] vhost: simplify meta data pointer accessing
Date: Wed, 17 Jul 2019 06:52:41 -0400
Message-Id: <20190717105255.63488-2-jasowang@redhat.com>
In-Reply-To: <20190717105255.63488-1-jasowang@redhat.com>
References: <20190717105255.63488-1-jasowang@redhat.com>

Instead of open coding meta data pointer accesses through the caches, this patch introduces a vhost_get_meta_ptr()/vhost_put_meta_ptr() pair to reduce code duplication and simplify the callers' implementation. This also helps reduce the lines of code needed for the packed virtqueue.
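For orientation before the diff, here is the shape of the conversion on a single accessor, condensed from the hunks below (the userspace fallback path after the #endif is elided):

/* Before: every metadata accessor open coded the RCU map lookup. */
static inline int vhost_put_used_idx(struct vhost_virtqueue *vq)
{
#if VHOST_ARCH_CAN_ACCEL_UACCESS
	struct vhost_map *map;
	struct vring_used *used;

	if (!vq->iotlb) {
		rcu_read_lock();
		map = rcu_dereference(vq->maps[VHOST_ADDR_USED]);
		if (likely(map)) {
			used = map->addr;
			used->idx = cpu_to_vhost16(vq, vq->last_used_idx);
			rcu_read_unlock();
			return 0;
		}
		rcu_read_unlock();
	}
#endif
	/* ... fall back to the userspace copy path ... */
}

/* After: the map lookup and the RCU pairing live in one helper pair. */
static inline int vhost_put_used_idx(struct vhost_virtqueue *vq)
{
#if VHOST_ARCH_CAN_ACCEL_UACCESS
	struct vring_used *used = vhost_get_meta_ptr(vq, VHOST_ADDR_USED);

	if (likely(used)) {
		used->idx = cpu_to_vhost16(vq, vq->last_used_idx);
		vhost_put_meta_ptr();
		return 0;
	}
#endif
	/* ... fall back to the userspace copy path ... */
}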
Signed-off-by: Jason Wang --- drivers/vhost/vhost.c | 207 ++++++++++++++---------------------------- 1 file changed, 70 insertions(+), 137 deletions(-) diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c index dc9301d31f12..7f51c74d9aee 100644 --- a/drivers/vhost/vhost.c +++ b/drivers/vhost/vhost.c @@ -1194,25 +1194,37 @@ static inline void __user *__vhost_get_user(struct vhost_virtqueue *vq, ret; \ }) -static inline int vhost_put_avail_event(struct vhost_virtqueue *vq) -{ #if VHOST_ARCH_CAN_ACCEL_UACCESS +static void *vhost_get_meta_ptr(struct vhost_virtqueue *vq, int type) +{ struct vhost_map *map; - struct vring_used *used; if (!vq->iotlb) { rcu_read_lock(); + map = rcu_dereference(vq->maps[type]); + if (likely(map)) + return map->addr; + rcu_read_unlock(); + } + return NULL; +} - map = rcu_dereference(vq->maps[VHOST_ADDR_USED]); - if (likely(map)) { - used = map->addr; - *((__virtio16 *)&used->ring[vq->num]) = - cpu_to_vhost16(vq, vq->avail_idx); - rcu_read_unlock(); - return 0; - } +static void vhost_put_meta_ptr(void) +{ + rcu_read_unlock(); +} +#endif - rcu_read_unlock(); +static inline int vhost_put_avail_event(struct vhost_virtqueue *vq) +{ +#if VHOST_ARCH_CAN_ACCEL_UACCESS + struct vring_used *used = vhost_get_meta_ptr(vq, VHOST_ADDR_USED); + + if (likely(used)) { + *((__virtio16 *)&used->ring[vq->num]) = + cpu_to_vhost16(vq, vq->avail_idx); + vhost_put_meta_ptr(); + return 0; } #endif @@ -1225,23 +1237,14 @@ static inline int vhost_put_used(struct vhost_virtqueue *vq, int count) { #if VHOST_ARCH_CAN_ACCEL_UACCESS - struct vhost_map *map; - struct vring_used *used; + struct vring_used *used = vhost_get_meta_ptr(vq, VHOST_ADDR_USED); size_t size; - if (!vq->iotlb) { - rcu_read_lock(); - - map = rcu_dereference(vq->maps[VHOST_ADDR_USED]); - if (likely(map)) { - used = map->addr; - size = count * sizeof(*head); - memcpy(used->ring + idx, head, size); - rcu_read_unlock(); - return 0; - } - - rcu_read_unlock(); + if (likely(used)) { + size = count * sizeof(*head); + memcpy(used->ring + idx, head, size); + vhost_put_meta_ptr(); + return 0; } #endif @@ -1253,21 +1256,12 @@ static inline int vhost_put_used_flags(struct vhost_virtqueue *vq) { #if VHOST_ARCH_CAN_ACCEL_UACCESS - struct vhost_map *map; - struct vring_used *used; - - if (!vq->iotlb) { - rcu_read_lock(); + struct vring_used *used = vhost_get_meta_ptr(vq, VHOST_ADDR_USED); - map = rcu_dereference(vq->maps[VHOST_ADDR_USED]); - if (likely(map)) { - used = map->addr; - used->flags = cpu_to_vhost16(vq, vq->used_flags); - rcu_read_unlock(); - return 0; - } - - rcu_read_unlock(); + if (likely(used)) { + used->flags = cpu_to_vhost16(vq, vq->used_flags); + vhost_put_meta_ptr(); + return 0; } #endif @@ -1279,21 +1273,12 @@ static inline int vhost_put_used_idx(struct vhost_virtqueue *vq) { #if VHOST_ARCH_CAN_ACCEL_UACCESS - struct vhost_map *map; - struct vring_used *used; + struct vring_used *used = vhost_get_meta_ptr(vq, VHOST_ADDR_USED); - if (!vq->iotlb) { - rcu_read_lock(); - - map = rcu_dereference(vq->maps[VHOST_ADDR_USED]); - if (likely(map)) { - used = map->addr; - used->idx = cpu_to_vhost16(vq, vq->last_used_idx); - rcu_read_unlock(); - return 0; - } - - rcu_read_unlock(); + if (likely(used)) { + used->idx = cpu_to_vhost16(vq, vq->last_used_idx); + vhost_put_meta_ptr(); + return 0; } #endif @@ -1343,21 +1328,12 @@ static inline int vhost_get_avail_idx(struct vhost_virtqueue *vq, __virtio16 *idx) { #if VHOST_ARCH_CAN_ACCEL_UACCESS - struct vhost_map *map; - struct vring_avail *avail; - - if (!vq->iotlb) { - 
rcu_read_lock(); + struct vring_avail *avail = vhost_get_meta_ptr(vq, VHOST_ADDR_AVAIL); - map = rcu_dereference(vq->maps[VHOST_ADDR_AVAIL]); - if (likely(map)) { - avail = map->addr; - *idx = avail->idx; - rcu_read_unlock(); - return 0; - } - - rcu_read_unlock(); + if (likely(avail)) { + *idx = avail->idx; + vhost_put_meta_ptr(); + return 0; } #endif @@ -1368,21 +1344,12 @@ static inline int vhost_get_avail_head(struct vhost_virtqueue *vq, __virtio16 *head, int idx) { #if VHOST_ARCH_CAN_ACCEL_UACCESS - struct vhost_map *map; - struct vring_avail *avail; - - if (!vq->iotlb) { - rcu_read_lock(); - - map = rcu_dereference(vq->maps[VHOST_ADDR_AVAIL]); - if (likely(map)) { - avail = map->addr; - *head = avail->ring[idx & (vq->num - 1)]; - rcu_read_unlock(); - return 0; - } + struct vring_avail *avail = vhost_get_meta_ptr(vq, VHOST_ADDR_AVAIL); - rcu_read_unlock(); + if (likely(avail)) { + *head = avail->ring[idx & (vq->num - 1)]; + vhost_put_meta_ptr(); + return 0; } #endif @@ -1394,21 +1361,12 @@ static inline int vhost_get_avail_flags(struct vhost_virtqueue *vq, __virtio16 *flags) { #if VHOST_ARCH_CAN_ACCEL_UACCESS - struct vhost_map *map; - struct vring_avail *avail; - - if (!vq->iotlb) { - rcu_read_lock(); + struct vring_avail *avail = vhost_get_meta_ptr(vq, VHOST_ADDR_AVAIL); - map = rcu_dereference(vq->maps[VHOST_ADDR_AVAIL]); - if (likely(map)) { - avail = map->addr; - *flags = avail->flags; - rcu_read_unlock(); - return 0; - } - - rcu_read_unlock(); + if (likely(avail)) { + *flags = avail->flags; + vhost_put_meta_ptr(); + return 0; } #endif @@ -1419,19 +1377,12 @@ static inline int vhost_get_used_event(struct vhost_virtqueue *vq, __virtio16 *event) { #if VHOST_ARCH_CAN_ACCEL_UACCESS - struct vhost_map *map; - struct vring_avail *avail; + struct vring_avail *avail = vhost_get_meta_ptr(vq, VHOST_ADDR_AVAIL); - if (!vq->iotlb) { - rcu_read_lock(); - map = rcu_dereference(vq->maps[VHOST_ADDR_AVAIL]); - if (likely(map)) { - avail = map->addr; - *event = (__virtio16)avail->ring[vq->num]; - rcu_read_unlock(); - return 0; - } - rcu_read_unlock(); + if (likely(avail)) { + *event = (__virtio16)avail->ring[vq->num]; + vhost_put_meta_ptr(); + return 0; } #endif @@ -1442,21 +1393,12 @@ static inline int vhost_get_used_idx(struct vhost_virtqueue *vq, __virtio16 *idx) { #if VHOST_ARCH_CAN_ACCEL_UACCESS - struct vhost_map *map; - struct vring_used *used; - - if (!vq->iotlb) { - rcu_read_lock(); - - map = rcu_dereference(vq->maps[VHOST_ADDR_USED]); - if (likely(map)) { - used = map->addr; - *idx = used->idx; - rcu_read_unlock(); - return 0; - } + struct vring_used *used = vhost_get_meta_ptr(vq, VHOST_ADDR_USED); - rcu_read_unlock(); + if (likely(used)) { + *idx = used->idx; + vhost_put_meta_ptr(); + return 0; } #endif @@ -1467,21 +1409,12 @@ static inline int vhost_get_desc(struct vhost_virtqueue *vq, struct vring_desc *desc, int idx) { #if VHOST_ARCH_CAN_ACCEL_UACCESS - struct vhost_map *map; - struct vring_desc *d; - - if (!vq->iotlb) { - rcu_read_lock(); - - map = rcu_dereference(vq->maps[VHOST_ADDR_DESC]); - if (likely(map)) { - d = map->addr; - *desc = *(d + idx); - rcu_read_unlock(); - return 0; - } + struct vring_desc *d = vhost_get_meta_ptr(vq, VHOST_ADDR_DESC); - rcu_read_unlock(); + if (likely(d)) { + *desc = *(d + idx); + vhost_put_meta_ptr(); + return 0; } #endif From patchwork Wed Jul 17 10:52:42 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jason Wang X-Patchwork-Id: 11047629 Return-Path: Received: from 
mail.wl.linuxfoundation.org; Wed, 17 Jul 2019 10:53:16 +0000 (UTC)
From: Jason Wang <jasowang@redhat.com>
To: mst@redhat.com, jasowang@redhat.com
Cc: kvm@vger.kernel.org, virtualization@lists.linux-foundation.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org, jfreimann@redhat.com, tiwei.bie@intel.com, maxime.coquelin@redhat.com
Subject: [PATCH V3 02/15] vhost: remove the unnecessary parameter of vhost_vq_avail_empty()
Date: Wed, 17 Jul 2019 06:52:42 -0400
Message-Id: <20190717105255.63488-3-jasowang@redhat.com>
In-Reply-To: <20190717105255.63488-1-jasowang@redhat.com>
References: <20190717105255.63488-1-jasowang@redhat.com>

Its dev parameter is not even used, so remove it.
Signed-off-by: Jason Wang --- drivers/vhost/net.c | 8 ++++---- drivers/vhost/vhost.c | 2 +- drivers/vhost/vhost.h | 2 +- 3 files changed, 6 insertions(+), 6 deletions(-) diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c index 3beb401235c0..7d34e8cbc89b 100644 --- a/drivers/vhost/net.c +++ b/drivers/vhost/net.c @@ -498,7 +498,7 @@ static int sock_has_rx_data(struct socket *sock) static void vhost_net_busy_poll_try_queue(struct vhost_net *net, struct vhost_virtqueue *vq) { - if (!vhost_vq_avail_empty(&net->dev, vq)) { + if (!vhost_vq_avail_empty(vq)) { vhost_poll_queue(&vq->poll); } else if (unlikely(vhost_enable_notify(&net->dev, vq))) { vhost_disable_notify(&net->dev, vq); @@ -540,8 +540,8 @@ static void vhost_net_busy_poll(struct vhost_net *net, } if ((sock_has_rx_data(sock) && - !vhost_vq_avail_empty(&net->dev, rvq)) || - !vhost_vq_avail_empty(&net->dev, tvq)) + !vhost_vq_avail_empty(rvq)) || + !vhost_vq_avail_empty(tvq)) break; cpu_relax(); @@ -638,7 +638,7 @@ static int get_tx_bufs(struct vhost_net *net, static bool tx_can_batch(struct vhost_virtqueue *vq, size_t total_len) { return total_len < VHOST_NET_WEIGHT && - !vhost_vq_avail_empty(vq->dev, vq); + !vhost_vq_avail_empty(vq); } #define SKB_FRAG_PAGE_ORDER get_order(32768) diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c index 7f51c74d9aee..ec3534bcd51b 100644 --- a/drivers/vhost/vhost.c +++ b/drivers/vhost/vhost.c @@ -2946,7 +2946,7 @@ void vhost_add_used_and_signal_n(struct vhost_dev *dev, EXPORT_SYMBOL_GPL(vhost_add_used_and_signal_n); /* return true if we're sure that avaiable ring is empty */ -bool vhost_vq_avail_empty(struct vhost_dev *dev, struct vhost_virtqueue *vq) +bool vhost_vq_avail_empty(struct vhost_virtqueue *vq) { __virtio16 avail_idx; int r; diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h index 819296332913..e0451c900177 100644 --- a/drivers/vhost/vhost.h +++ b/drivers/vhost/vhost.h @@ -247,7 +247,7 @@ void vhost_add_used_and_signal_n(struct vhost_dev *, struct vhost_virtqueue *, struct vring_used_elem *heads, unsigned count); void vhost_signal(struct vhost_dev *, struct vhost_virtqueue *); void vhost_disable_notify(struct vhost_dev *, struct vhost_virtqueue *); -bool vhost_vq_avail_empty(struct vhost_dev *, struct vhost_virtqueue *); +bool vhost_vq_avail_empty(struct vhost_virtqueue *vq); bool vhost_enable_notify(struct vhost_dev *, struct vhost_virtqueue *); int vhost_log_write(struct vhost_virtqueue *vq, struct vhost_log *log, From patchwork Wed Jul 17 10:52:43 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jason Wang X-Patchwork-Id: 11047631 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 5184613A4 for ; Wed, 17 Jul 2019 10:53:25 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 3FF2228735 for ; Wed, 17 Jul 2019 10:53:25 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 3383E2873B; Wed, 17 Jul 2019 10:53:25 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 6C80828735 for ; 
Wed, 17 Jul 2019 10:53:24 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1730931AbfGQKxS (ORCPT ); Wed, 17 Jul 2019 06:53:18 -0400 Received: from mx1.redhat.com ([209.132.183.28]:34888 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1725893AbfGQKxR (ORCPT ); Wed, 17 Jul 2019 06:53:17 -0400 Received: from smtp.corp.redhat.com (int-mx07.intmail.prod.int.phx2.redhat.com [10.5.11.22]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id C5A428553A; Wed, 17 Jul 2019 10:53:16 +0000 (UTC) Received: from hp-dl380pg8-01.lab.eng.pek2.redhat.com (hp-dl380pg8-01.lab.eng.pek2.redhat.com [10.73.8.10]) by smtp.corp.redhat.com (Postfix) with ESMTP id D765D1001DC0; Wed, 17 Jul 2019 10:53:13 +0000 (UTC) From: Jason Wang To: mst@redhat.com, jasowang@redhat.com Cc: kvm@vger.kernel.org, virtualization@lists.linux-foundation.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org, jfreimann@redhat.com, tiwei.bie@intel.com, maxime.coquelin@redhat.com Subject: [PATCH V3 03/15] vhost: remove unnecessary parameter of vhost_enable_notify()/vhost_disable_notify Date: Wed, 17 Jul 2019 06:52:43 -0400 Message-Id: <20190717105255.63488-4-jasowang@redhat.com> In-Reply-To: <20190717105255.63488-1-jasowang@redhat.com> References: <20190717105255.63488-1-jasowang@redhat.com> X-Scanned-By: MIMEDefang 2.84 on 10.5.11.22 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.28]); Wed, 17 Jul 2019 10:53:16 +0000 (UTC) Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP Its dev parameter is not even used, so remove it. Signed-off-by: Jason Wang --- drivers/vhost/net.c | 25 ++++++++++++------------- drivers/vhost/scsi.c | 12 ++++++------ drivers/vhost/test.c | 6 +++--- drivers/vhost/vhost.c | 4 ++-- drivers/vhost/vhost.h | 4 ++-- drivers/vhost/vsock.c | 14 +++++++------- 6 files changed, 32 insertions(+), 33 deletions(-) diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c index 7d34e8cbc89b..78d248574f8e 100644 --- a/drivers/vhost/net.c +++ b/drivers/vhost/net.c @@ -500,8 +500,8 @@ static void vhost_net_busy_poll_try_queue(struct vhost_net *net, { if (!vhost_vq_avail_empty(vq)) { vhost_poll_queue(&vq->poll); - } else if (unlikely(vhost_enable_notify(&net->dev, vq))) { - vhost_disable_notify(&net->dev, vq); + } else if (unlikely(vhost_enable_notify(vq))) { + vhost_disable_notify(vq); vhost_poll_queue(&vq->poll); } } @@ -524,7 +524,7 @@ static void vhost_net_busy_poll(struct vhost_net *net, if (!mutex_trylock(&vq->mutex)) return; - vhost_disable_notify(&net->dev, vq); + vhost_disable_notify(vq); sock = rvq->private_data; busyloop_timeout = poll_rx ? rvq->busyloop_timeout: @@ -552,7 +552,7 @@ static void vhost_net_busy_poll(struct vhost_net *net, if (poll_rx || sock_has_rx_data(sock)) vhost_net_busy_poll_try_queue(net, vq); else if (!poll_rx) /* On tx here, sock has no rx data. 
*/ - vhost_enable_notify(&net->dev, rvq); + vhost_enable_notify(rvq); mutex_unlock(&vq->mutex); } @@ -788,9 +788,8 @@ static void handle_tx_copy(struct vhost_net *net, struct socket *sock) if (head == vq->num) { if (unlikely(busyloop_intr)) { vhost_poll_queue(&vq->poll); - } else if (unlikely(vhost_enable_notify(&net->dev, - vq))) { - vhost_disable_notify(&net->dev, vq); + } else if (unlikely(vhost_enable_notify(vq))) { + vhost_disable_notify(vq); continue; } break; @@ -880,8 +879,8 @@ static void handle_tx_zerocopy(struct vhost_net *net, struct socket *sock) if (head == vq->num) { if (unlikely(busyloop_intr)) { vhost_poll_queue(&vq->poll); - } else if (unlikely(vhost_enable_notify(&net->dev, vq))) { - vhost_disable_notify(&net->dev, vq); + } else if (unlikely(vhost_enable_notify(vq))) { + vhost_disable_notify(vq); continue; } break; @@ -960,7 +959,7 @@ static void handle_tx(struct vhost_net *net) if (!vq_meta_prefetch(vq)) goto out; - vhost_disable_notify(&net->dev, vq); + vhost_disable_notify(vq); vhost_net_disable_vq(net, vq); if (vhost_sock_zcopy(sock)) @@ -1129,7 +1128,7 @@ static void handle_rx(struct vhost_net *net) if (!vq_meta_prefetch(vq)) goto out; - vhost_disable_notify(&net->dev, vq); + vhost_disable_notify(vq); vhost_net_disable_vq(net, vq); vhost_hlen = nvq->vhost_hlen; @@ -1156,10 +1155,10 @@ static void handle_rx(struct vhost_net *net) if (!headcount) { if (unlikely(busyloop_intr)) { vhost_poll_queue(&vq->poll); - } else if (unlikely(vhost_enable_notify(&net->dev, vq))) { + } else if (unlikely(vhost_enable_notify(vq))) { /* They have slipped one in as we were * doing that: check again. */ - vhost_disable_notify(&net->dev, vq); + vhost_disable_notify(vq); continue; } /* Nothing new? Wait for eventfd to tell us diff --git a/drivers/vhost/scsi.c b/drivers/vhost/scsi.c index a9caf1bc3c3e..8d4e87007a8d 100644 --- a/drivers/vhost/scsi.c +++ b/drivers/vhost/scsi.c @@ -458,7 +458,7 @@ vhost_scsi_do_evt_work(struct vhost_scsi *vs, struct vhost_scsi_evt *evt) } again: - vhost_disable_notify(&vs->dev, vq); + vhost_disable_notify(vq); head = vhost_get_vq_desc(vq, vq->iov, ARRAY_SIZE(vq->iov), &out, &in, NULL, NULL); @@ -467,7 +467,7 @@ vhost_scsi_do_evt_work(struct vhost_scsi *vs, struct vhost_scsi_evt *evt) return; } if (head == vq->num) { - if (vhost_enable_notify(&vs->dev, vq)) + if (vhost_enable_notify(vq)) goto again; vs->vs_events_missed = true; return; @@ -828,8 +828,8 @@ vhost_scsi_get_desc(struct vhost_scsi *vs, struct vhost_virtqueue *vq, /* Nothing new? Wait for eventfd to tell us they refilled. 
*/ if (vc->head == vq->num) { - if (unlikely(vhost_enable_notify(&vs->dev, vq))) { - vhost_disable_notify(&vs->dev, vq); + if (unlikely(vhost_enable_notify(vq))) { + vhost_disable_notify(vq); ret = -EAGAIN; } goto done; @@ -936,7 +936,7 @@ vhost_scsi_handle_vq(struct vhost_scsi *vs, struct vhost_virtqueue *vq) memset(&vc, 0, sizeof(vc)); vc.rsp_size = sizeof(struct virtio_scsi_cmd_resp); - vhost_disable_notify(&vs->dev, vq); + vhost_disable_notify(vq); do { ret = vhost_scsi_get_desc(vs, vq, &vc); @@ -1189,7 +1189,7 @@ vhost_scsi_ctl_handle_vq(struct vhost_scsi *vs, struct vhost_virtqueue *vq) memset(&vc, 0, sizeof(vc)); - vhost_disable_notify(&vs->dev, vq); + vhost_disable_notify(vq); do { ret = vhost_scsi_get_desc(vs, vq, &vc); diff --git a/drivers/vhost/test.c b/drivers/vhost/test.c index 40589850eb33..746f5d439153 100644 --- a/drivers/vhost/test.c +++ b/drivers/vhost/test.c @@ -50,7 +50,7 @@ static void handle_vq(struct vhost_test *n) return; } - vhost_disable_notify(&n->dev, vq); + vhost_disable_notify(vq); for (;;) { head = vhost_get_vq_desc(vq, vq->iov, @@ -62,8 +62,8 @@ static void handle_vq(struct vhost_test *n) break; /* Nothing new? Wait for eventfd to tell us they refilled. */ if (head == vq->num) { - if (unlikely(vhost_enable_notify(&n->dev, vq))) { - vhost_disable_notify(&n->dev, vq); + if (unlikely(vhost_enable_notify(vq))) { + vhost_disable_notify(vq); continue; } break; diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c index ec3534bcd51b..e781db88dfca 100644 --- a/drivers/vhost/vhost.c +++ b/drivers/vhost/vhost.c @@ -2964,7 +2964,7 @@ bool vhost_vq_avail_empty(struct vhost_virtqueue *vq) EXPORT_SYMBOL_GPL(vhost_vq_avail_empty); /* OK, now we need to know about added descriptors. */ -bool vhost_enable_notify(struct vhost_dev *dev, struct vhost_virtqueue *vq) +bool vhost_enable_notify(struct vhost_virtqueue *vq) { __virtio16 avail_idx; int r; @@ -3002,7 +3002,7 @@ bool vhost_enable_notify(struct vhost_dev *dev, struct vhost_virtqueue *vq) EXPORT_SYMBOL_GPL(vhost_enable_notify); /* We don't need to be notified again. 
*/ -void vhost_disable_notify(struct vhost_dev *dev, struct vhost_virtqueue *vq) +void vhost_disable_notify(struct vhost_virtqueue *vq) { int r; diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h index e0451c900177..e054f178d8b0 100644 --- a/drivers/vhost/vhost.h +++ b/drivers/vhost/vhost.h @@ -246,9 +246,9 @@ void vhost_add_used_and_signal(struct vhost_dev *, struct vhost_virtqueue *, void vhost_add_used_and_signal_n(struct vhost_dev *, struct vhost_virtqueue *, struct vring_used_elem *heads, unsigned count); void vhost_signal(struct vhost_dev *, struct vhost_virtqueue *); -void vhost_disable_notify(struct vhost_dev *, struct vhost_virtqueue *); +void vhost_disable_notify(struct vhost_virtqueue *vq); bool vhost_vq_avail_empty(struct vhost_virtqueue *vq); -bool vhost_enable_notify(struct vhost_dev *, struct vhost_virtqueue *); +bool vhost_enable_notify(struct vhost_virtqueue *vq); int vhost_log_write(struct vhost_virtqueue *vq, struct vhost_log *log, unsigned int log_num, u64 len, diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c index 814bed72d793..f94021b450f0 100644 --- a/drivers/vhost/vsock.c +++ b/drivers/vhost/vsock.c @@ -96,7 +96,7 @@ vhost_transport_do_send_pkt(struct vhost_vsock *vsock, goto out; /* Avoid further vmexits, we're already processing the virtqueue */ - vhost_disable_notify(&vsock->dev, vq); + vhost_disable_notify(vq); do { struct virtio_vsock_pkt *pkt; @@ -109,7 +109,7 @@ vhost_transport_do_send_pkt(struct vhost_vsock *vsock, spin_lock_bh(&vsock->send_pkt_list_lock); if (list_empty(&vsock->send_pkt_list)) { spin_unlock_bh(&vsock->send_pkt_list_lock); - vhost_enable_notify(&vsock->dev, vq); + vhost_enable_notify(vq); break; } @@ -135,8 +135,8 @@ vhost_transport_do_send_pkt(struct vhost_vsock *vsock, /* We cannot finish yet if more buffers snuck in while * re-enabling notify. 
*/
- if (unlikely(vhost_enable_notify(&vsock->dev, vq))) {
- vhost_disable_notify(&vsock->dev, vq);
+ if (unlikely(vhost_enable_notify(vq))) {
+ vhost_disable_notify(vq);
  continue;
  }
  break;
@@ -369,7 +369,7 @@ static void vhost_vsock_handle_tx_kick(struct vhost_work *work)
  if (!vq->private_data)
  goto out;
- vhost_disable_notify(&vsock->dev, vq);
+ vhost_disable_notify(vq);
  do {
  u32 len;
@@ -387,8 +387,8 @@ static void vhost_vsock_handle_tx_kick(struct vhost_work *work)
  break;
  if (head == vq->num) {
- if (unlikely(vhost_enable_notify(&vsock->dev, vq))) {
- vhost_disable_notify(&vsock->dev, vq);
+ if (unlikely(vhost_enable_notify(vq))) {
+ vhost_disable_notify(vq);
  continue;
  }
  break;

From patchwork Wed Jul 17 10:52:44 2019
From: Jason Wang <jasowang@redhat.com>
To: mst@redhat.com, jasowang@redhat.com
Cc: kvm@vger.kernel.org, virtualization@lists.linux-foundation.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org, jfreimann@redhat.com, tiwei.bie@intel.com, maxime.coquelin@redhat.com
Subject: [PATCH V3 04/15] vhost-net: don't use vhost_add_used_n() for zerocopy
Date: Wed, 17 Jul 2019 06:52:44 -0400
Message-Id: <20190717105255.63488-5-jasowang@redhat.com>
In-Reply-To: <20190717105255.63488-1-jasowang@redhat.com>
References: <20190717105255.63488-1-jasowang@redhat.com>

We tried to use vhost_add_used_n() for packets that are not zero-copied. This helps mitigate the head-of-line issue, but it is not a complete solution. What's more, it may lead to out-of-order completion, which causes extra complexity for a packed virtqueue implementation that needs to maintain wrap counters. So this patch switches to always using vq->heads[] to maintain heads. This will ease the introduction of the zerocopy shadow used ring API and reduce the complexity for packed virtqueues. After this, vhost_net becomes an in-order device.
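As an editor's illustration (not code from the patch), the resulting bookkeeping can be sketched as follows; the names mirror the hunks below:

/* Every transmitted packet, zerocopy or not, takes one slot. */
vq->heads[nvq->upend_idx].id  = cpu_to_vhost32(vq, head);
vq->heads[nvq->upend_idx].len = zcopy_used ? VHOST_DMA_IN_PROGRESS
					   : VHOST_DMA_DONE_LEN;
nvq->upend_idx = (nvq->upend_idx + 1) % UIO_MAXIOV;

/*
 * Completion scans forward from done_idx and stops at the first slot
 * still marked VHOST_DMA_IN_PROGRESS, so used entries are reported to
 * the guest strictly in submission order.
 */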
Signed-off-by: Jason Wang
---
 drivers/vhost/net.c | 21 +++++++++------------
 1 file changed, 9 insertions(+), 12 deletions(-)

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 78d248574f8e..ac31983d2d77 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -894,9 +894,6 @@ static void handle_tx_zerocopy(struct vhost_net *net, struct socket *sock)
  if (zcopy_used) {
  struct ubuf_info *ubuf;
  ubuf = nvq->ubuf_info + nvq->upend_idx;
-
- vq->heads[nvq->upend_idx].id = cpu_to_vhost32(vq, head);
- vq->heads[nvq->upend_idx].len = VHOST_DMA_IN_PROGRESS;
  ubuf->callback = vhost_zerocopy_callback;
  ubuf->ctx = nvq->ubufs;
  ubuf->desc = nvq->upend_idx;
@@ -907,11 +904,14 @@ static void handle_tx_zerocopy(struct vhost_net *net, struct socket *sock)
  msg.msg_controllen = sizeof(ctl);
  ubufs = nvq->ubufs;
  atomic_inc(&ubufs->refcount);
- nvq->upend_idx = (nvq->upend_idx + 1) % UIO_MAXIOV;
  } else {
  msg.msg_control = NULL;
  ubufs = NULL;
  }
+ vq->heads[nvq->upend_idx].id = cpu_to_vhost32(vq, head);
+ vq->heads[nvq->upend_idx].len = zcopy_used ?
+ VHOST_DMA_IN_PROGRESS : VHOST_DMA_DONE_LEN;
+ nvq->upend_idx = (nvq->upend_idx + 1) % UIO_MAXIOV;
  total_len += len;
  if (tx_can_batch(vq, total_len) &&
  likely(!vhost_exceeds_maxpend(net))) {
@@ -923,11 +923,10 @@ static void handle_tx_zerocopy(struct vhost_net *net, struct socket *sock)
  /* TODO: Check specific error and bomb out unless ENOBUFS?
*/
  err = sock->ops->sendmsg(sock, &msg, len);
  if (unlikely(err < 0)) {
- if (zcopy_used) {
+ if (zcopy_used)
  vhost_net_ubuf_put(ubufs);
- nvq->upend_idx = ((unsigned)nvq->upend_idx - 1)
- % UIO_MAXIOV;
- }
+ nvq->upend_idx = ((unsigned int)nvq->upend_idx - 1)
+ % UIO_MAXIOV;
  vhost_discard_vq_desc(vq, 1);
  vhost_net_enable_vq(net, vq);
  break;
@@ -935,10 +934,8 @@ static void handle_tx_zerocopy(struct vhost_net *net, struct socket *sock)
  if (err != len)
  pr_debug("Truncated TX packet: " " len %d != %zd\n", err, len);
- if (!zcopy_used)
- vhost_add_used_and_signal(&net->dev, vq, head, 0);
- else
- vhost_zerocopy_signal_used(net, vq);
+
+ vhost_zerocopy_signal_used(net, vq);
  vhost_net_tx_packet(net);
  } while (likely(!vhost_exceeds_weight(vq, ++sent_pkts, total_len)));
 }

From patchwork Wed Jul 17 10:52:45 2019
From: Jason Wang <jasowang@redhat.com>
To: mst@redhat.com, jasowang@redhat.com
Cc: kvm@vger.kernel.org, virtualization@lists.linux-foundation.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org, jfreimann@redhat.com, tiwei.bie@intel.com, maxime.coquelin@redhat.com
Subject: [PATCH V3 05/15] vhost: introduce helpers to manipulate shadow used ring
Date: Wed, 17 Jul 2019 06:52:45 -0400
Message-Id: <20190717105255.63488-6-jasowang@redhat.com>
In-Reply-To: <20190717105255.63488-1-jasowang@redhat.com>
References: <20190717105255.63488-1-jasowang@redhat.com>

We open code vq->heads[] in net.c for:

1) implementing batching, which is in fact a shadow used ring implementation;
2) maintaining pending heads in order, which is in fact another kind of shadow used ring.

But this exposes the used ring layout to the device, which makes it hard to introduce a new kind of ring such as the packed virtqueue. So this patch introduces two types of shadow used ring API:

1) a shadow used ring API for batched updating of used heads;
2) a zerocopy shadow used API for maintaining pending heads and batch updating used heads.

This helps hide the used ring layout from the device. A device should not mix the two kinds of APIs.
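To illustrate how the two flavors are meant to be used, here is a hedged sketch of a hypothetical device loop (not code from this patch; the batch threshold name is borrowed from net.c):

/* 1) Batched flavor: accumulate used heads, flush them in one shot. */
for (;;) {
	head = vhost_get_vq_desc(vq, vq->iov, ARRAY_SIZE(vq->iov),
				 &out, &in, NULL, NULL);
	if (head == vq->num)
		break;
	/* ... consume the buffer ... */
	vhost_add_shadow_used(vq, cpu_to_vhost32(vq, head), len);
	if (vhost_get_shadow_used_count(vq) > VHOST_NET_BATCH)
		vhost_flush_shadow_used_and_signal(vq);
}
vhost_flush_shadow_used_and_signal(vq);

/* 2) Zerocopy flavor: the device owns the index and flushes ranges. */
vhost_set_zc_used(vq, idx, head, VHOST_DMA_IN_PROGRESS);
/* ... a DMA completion callback later marks the slot done ... */
vhost_set_zc_used_len(vq, idx, VHOST_DMA_DONE_LEN);
/* ... and an in-order range of n completed slots is flushed. */
vhost_flush_zc_used_and_signal(vq, first_done, n);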
Signed-off-by: Jason Wang
---
 drivers/vhost/vhost.c | 95 +++++++++++++++++++++++++++++++++++++------
 drivers/vhost/vhost.h | 18 ++++++++
 2 files changed, 100 insertions(+), 13 deletions(-)

diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index e781db88dfca..5bfca5b76b05 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -486,6 +486,7 @@ static void vhost_vq_reset(struct vhost_dev *dev,
 #if VHOST_ARCH_CAN_ACCEL_UACCESS
  vhost_reset_vq_maps(vq);
 #endif
+ vq->nheads = 0;
 }

 static int vhost_worker(void *data)
@@ -2790,25 +2791,28 @@ int vhost_get_vq_desc(struct vhost_virtqueue *vq,
 }
 EXPORT_SYMBOL_GPL(vhost_get_vq_desc);

-/* Reverse the effect of vhost_get_vq_desc. Useful for error handling. */
-void vhost_discard_vq_desc(struct vhost_virtqueue *vq, int n)
+static void vhost_withdraw_shadow_used(struct vhost_virtqueue *vq, int count)
 {
- vq->last_avail_idx -= n;
+ BUG_ON(count > vq->nheads);
+ vq->nheads -= count;
 }
-EXPORT_SYMBOL_GPL(vhost_discard_vq_desc);

-/* After we've used one of their buffers, we tell them about it. We'll then
- * want to notify the guest, using eventfd. */
-int vhost_add_used(struct vhost_virtqueue *vq, unsigned int head, int len)
+/* Reverse the effect of vhost_get_vq_desc and
+ * vhost_add_shadow_used. Useful for error handleing
+ */
+void vhost_discard_shadow_used(struct vhost_virtqueue *vq, int n)
 {
- struct vring_used_elem heads = {
- cpu_to_vhost32(vq, head),
- cpu_to_vhost32(vq, len)
- };
+ vhost_withdraw_shadow_used(vq, n);
+ vhost_discard_vq_desc(vq, n);
+}
+EXPORT_SYMBOL_GPL(vhost_discard_shadow_used);

- return vhost_add_used_n(vq, &heads, 1);
+/* Reverse the effect of vhost_get_vq_desc. Useful for error handling.
*/ +void vhost_discard_vq_desc(struct vhost_virtqueue *vq, int n) +{ + vq->last_avail_idx -= n; } -EXPORT_SYMBOL_GPL(vhost_add_used); +EXPORT_SYMBOL_GPL(vhost_discard_vq_desc); static int __vhost_add_used_n(struct vhost_virtqueue *vq, struct vring_used_elem *heads, @@ -2842,6 +2846,41 @@ static int __vhost_add_used_n(struct vhost_virtqueue *vq, return 0; } +void vhost_set_zc_used_len(struct vhost_virtqueue *vq, + int idx, int len) +{ + vq->heads[idx].len = len; +} +EXPORT_SYMBOL_GPL(vhost_set_zc_used_len); + +int vhost_get_zc_used_len(struct vhost_virtqueue *vq, int idx) +{ + return vq->heads[idx].len; +} +EXPORT_SYMBOL_GPL(vhost_get_zc_used_len); + +void vhost_set_zc_used(struct vhost_virtqueue *vq, + int idx, unsigned int head, int len) +{ + vq->heads[idx].id = head; + vq->heads[idx].len = len; +} +EXPORT_SYMBOL_GPL(vhost_set_zc_used); + +void vhost_add_shadow_used(struct vhost_virtqueue *vq, + unsigned int head, int len) +{ + vhost_set_zc_used(vq, vq->nheads, head, len); + ++vq->nheads; +} +EXPORT_SYMBOL_GPL(vhost_add_shadow_used); + +int vhost_get_shadow_used_count(struct vhost_virtqueue *vq) +{ + return vq->nheads; +} +EXPORT_SYMBOL_GPL(vhost_get_shadow_used_count); + /* After we've used one of their buffers, we tell them about it. We'll then * want to notify the guest, using eventfd. */ int vhost_add_used_n(struct vhost_virtqueue *vq, struct vring_used_elem *heads, @@ -2879,6 +2918,19 @@ int vhost_add_used_n(struct vhost_virtqueue *vq, struct vring_used_elem *heads, } EXPORT_SYMBOL_GPL(vhost_add_used_n); +/* After we've used one of their buffers, we tell them about it. We'll then + * want to notify the guest, using eventfd. */ +int vhost_add_used(struct vhost_virtqueue *vq, unsigned int head, int len) +{ + struct vring_used_elem heads = { + cpu_to_vhost32(vq, head), + cpu_to_vhost32(vq, len) + }; + + return vhost_add_used_n(vq, &heads, 1); +} +EXPORT_SYMBOL_GPL(vhost_add_used); + static bool vhost_notify(struct vhost_dev *dev, struct vhost_virtqueue *vq) { __u16 old, new; @@ -2945,6 +2997,23 @@ void vhost_add_used_and_signal_n(struct vhost_dev *dev, } EXPORT_SYMBOL_GPL(vhost_add_used_and_signal_n); +void vhost_flush_shadow_used_and_signal(struct vhost_virtqueue *vq) + +{ + if (!vq->nheads) + return; + + vhost_add_used_and_signal_n(vq->dev, vq, vq->heads, vq->nheads); + vq->nheads = 0; +} +EXPORT_SYMBOL_GPL(vhost_flush_shadow_used_and_signal); + +void vhost_flush_zc_used_and_signal(struct vhost_virtqueue *vq, int idx, int n) +{ + vhost_add_used_and_signal_n(vq->dev, vq, &vq->heads[idx], n); +} +EXPORT_SYMBOL_GPL(vhost_flush_zc_used_and_signal); + /* return true if we're sure that avaiable ring is empty */ bool vhost_vq_avail_empty(struct vhost_virtqueue *vq) { diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h index e054f178d8b0..175eb5ebf954 100644 --- a/drivers/vhost/vhost.h +++ b/drivers/vhost/vhost.h @@ -163,6 +163,7 @@ struct vhost_virtqueue { struct iovec iotlb_iov[64]; struct iovec *indirect; struct vring_used_elem *heads; + int nheads; /* Protected by virtqueue mutex. 
*/ struct vhost_umem *umem; struct vhost_umem *iotlb; @@ -241,10 +242,27 @@ int vhost_vq_init_access(struct vhost_virtqueue *); int vhost_add_used(struct vhost_virtqueue *, unsigned int head, int len); int vhost_add_used_n(struct vhost_virtqueue *, struct vring_used_elem *heads, unsigned count); + void vhost_add_used_and_signal(struct vhost_dev *, struct vhost_virtqueue *, unsigned int id, int len); void vhost_add_used_and_signal_n(struct vhost_dev *, struct vhost_virtqueue *, struct vring_used_elem *heads, unsigned count); + +/* Zerocopy shadow used ring API */ +void vhost_set_zc_used_len(struct vhost_virtqueue *vq, + int idx, int len); +int vhost_get_zc_used_len(struct vhost_virtqueue *vq, int idx); +void vhost_flush_zc_used_and_signal(struct vhost_virtqueue *vq, int idx, int n); +void vhost_set_zc_used(struct vhost_virtqueue *vq, int idx, + unsigned int head, int len); + +/* Non zerocopy shadow used ring API */ +void vhost_add_shadow_used(struct vhost_virtqueue *vq, unsigned int head, + int len); +void vhost_flush_shadow_used_and_signal(struct vhost_virtqueue *vq); +void vhost_discard_shadow_used(struct vhost_virtqueue *vq, int n); +int vhost_get_shadow_used_count(struct vhost_virtqueue *vq); + void vhost_signal(struct vhost_dev *, struct vhost_virtqueue *); void vhost_disable_notify(struct vhost_virtqueue *vq); bool vhost_vq_avail_empty(struct vhost_virtqueue *vq); From patchwork Wed Jul 17 10:52:46 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jason Wang X-Patchwork-Id: 11047633 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id E5EE013A4 for ; Wed, 17 Jul 2019 10:53:37 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id D591928735 for ; Wed, 17 Jul 2019 10:53:37 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id CA6BE28737; Wed, 17 Jul 2019 10:53:37 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 6BD1528735 for ; Wed, 17 Jul 2019 10:53:37 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1731446AbfGQKxg (ORCPT ); Wed, 17 Jul 2019 06:53:36 -0400 Received: from mx1.redhat.com ([209.132.183.28]:56388 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1731410AbfGQKxe (ORCPT ); Wed, 17 Jul 2019 06:53:34 -0400 Received: from smtp.corp.redhat.com (int-mx07.intmail.prod.int.phx2.redhat.com [10.5.11.22]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 303A24E93D; Wed, 17 Jul 2019 10:53:34 +0000 (UTC) Received: from hp-dl380pg8-01.lab.eng.pek2.redhat.com (hp-dl380pg8-01.lab.eng.pek2.redhat.com [10.73.8.10]) by smtp.corp.redhat.com (Postfix) with ESMTP id 4371110018FF; Wed, 17 Jul 2019 10:53:30 +0000 (UTC) From: Jason Wang To: mst@redhat.com, jasowang@redhat.com Cc: kvm@vger.kernel.org, virtualization@lists.linux-foundation.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org, jfreimann@redhat.com, tiwei.bie@intel.com, 
maxime.coquelin@redhat.com Subject: [PATCH V3 06/15] vhost_net: switch TX to use shadow used ring API Date: Wed, 17 Jul 2019 06:52:46 -0400 Message-Id: <20190717105255.63488-7-jasowang@redhat.com> In-Reply-To: <20190717105255.63488-1-jasowang@redhat.com> References: <20190717105255.63488-1-jasowang@redhat.com> X-Scanned-By: MIMEDefang 2.84 on 10.5.11.22 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.38]); Wed, 17 Jul 2019 10:53:34 +0000 (UTC) Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP This patch switch to use shadow used ring API for transmission. This will help to hide used ring layout from device. Signed-off-by: Jason Wang --- drivers/vhost/net.c | 31 +++++++++++++++---------------- 1 file changed, 15 insertions(+), 16 deletions(-) diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c index ac31983d2d77..cf47e6e348f4 100644 --- a/drivers/vhost/net.c +++ b/drivers/vhost/net.c @@ -361,22 +361,22 @@ static void vhost_zerocopy_signal_used(struct vhost_net *net, { struct vhost_net_virtqueue *nvq = container_of(vq, struct vhost_net_virtqueue, vq); - int i, add; + int i, add, len; int j = 0; for (i = nvq->done_idx; i != nvq->upend_idx; i = (i + 1) % UIO_MAXIOV) { - if (vq->heads[i].len == VHOST_DMA_FAILED_LEN) + len = vhost_get_zc_used_len(vq, i); + if (len == VHOST_DMA_FAILED_LEN) vhost_net_tx_err(net); - if (VHOST_DMA_IS_DONE(vq->heads[i].len)) { - vq->heads[i].len = VHOST_DMA_CLEAR_LEN; + if (VHOST_DMA_IS_DONE(len)) { + vhost_set_zc_used_len(vq, i, VHOST_DMA_CLEAR_LEN); ++j; } else break; } while (j) { add = min(UIO_MAXIOV - nvq->done_idx, j); - vhost_add_used_and_signal_n(vq->dev, vq, - &vq->heads[nvq->done_idx], add); + vhost_flush_zc_used_and_signal(vq, nvq->done_idx, add); nvq->done_idx = (nvq->done_idx + add) % UIO_MAXIOV; j -= add; } @@ -391,8 +391,8 @@ static void vhost_zerocopy_callback(struct ubuf_info *ubuf, bool success) rcu_read_lock_bh(); /* set len to mark this desc buffers done DMA */ - vq->heads[ubuf->desc].len = success ? - VHOST_DMA_DONE_LEN : VHOST_DMA_FAILED_LEN; + vhost_set_zc_used_len(vq, ubuf->desc, success ? + VHOST_DMA_DONE_LEN : VHOST_DMA_FAILED_LEN); cnt = vhost_net_ubuf_put(ubufs); /* @@ -480,7 +480,7 @@ static void vhost_tx_batch(struct vhost_net *net, } signal_used: - vhost_net_signal_used(nvq); + vhost_flush_shadow_used_and_signal(&nvq->vq); nvq->batched_xdp = 0; } @@ -776,7 +776,7 @@ static void handle_tx_copy(struct vhost_net *net, struct socket *sock) do { bool busyloop_intr = false; - if (nvq->done_idx == VHOST_NET_BATCH) + if (vhost_get_shadow_used_count(vq) == VHOST_NET_BATCH) vhost_tx_batch(net, nvq, sock, &msg); head = get_tx_bufs(net, nvq, &msg, &out, &in, &len, @@ -835,9 +835,7 @@ static void handle_tx_copy(struct vhost_net *net, struct socket *sock) pr_debug("Truncated TX packet: len %d != %zd\n", err, len); done: - vq->heads[nvq->done_idx].id = cpu_to_vhost32(vq, head); - vq->heads[nvq->done_idx].len = 0; - ++nvq->done_idx; + vhost_add_shadow_used(vq, cpu_to_vhost32(vq, head), 0); } while (likely(!vhost_exceeds_weight(vq, ++sent_pkts, total_len))); vhost_tx_batch(net, nvq, sock, &msg); @@ -908,9 +906,10 @@ static void handle_tx_zerocopy(struct vhost_net *net, struct socket *sock) msg.msg_control = NULL; ubufs = NULL; } - vq->heads[nvq->upend_idx].id = cpu_to_vhost32(vq, head); - vq->heads[nvq->upend_idx].len = zcopy_used ? 
- VHOST_DMA_IN_PROGRESS : VHOST_DMA_DONE_LEN;
+ vhost_set_zc_used(vq, nvq->upend_idx,
+ cpu_to_vhost32(vq, head),
+ zcopy_used ? VHOST_DMA_IN_PROGRESS :
+ VHOST_DMA_DONE_LEN);
  nvq->upend_idx = (nvq->upend_idx + 1) % UIO_MAXIOV;
  total_len += len;
  if (tx_can_batch(vq, total_len) &&

From patchwork Wed Jul 17 10:52:47 2019
From: Jason Wang <jasowang@redhat.com>
To: mst@redhat.com, jasowang@redhat.com
Cc: kvm@vger.kernel.org, virtualization@lists.linux-foundation.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org, jfreimann@redhat.com, tiwei.bie@intel.com, maxime.coquelin@redhat.com
Subject: [PATCH V3 07/15] vhost_net: calculate last used length once for mergeable buffer
Date: Wed, 17 Jul 2019 06:52:47 -0400
Message-Id: <20190717105255.63488-8-jasowang@redhat.com>
In-Reply-To: <20190717105255.63488-1-jasowang@redhat.com>
References: <20190717105255.63488-1-jasowang@redhat.com>

This patch calculates the last used length once instead of twice. This helps the conversion of the RX path to the shadow used ring API.
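A worked example of the single-pass calculation (the numbers are illustrative, not from the patch): a 1000-byte packet scattered into two 600-byte buffers.

/* Iteration 1: len = 600, datalen = 1000 - 600 = 400 >= 0
 *   -> heads[0].len = 600 (final when written)
 * Iteration 2: len = 600, datalen = 400 - 600 = -200 < 0
 *   -> heads[1].len = 600 + (-200) = 400 (only the bytes actually used)
 *
 * The old code stored 600 for both entries and then rewrote the last
 * one as len + datalen; now each entry is final the moment it is
 * written, which lets the next patch hand the entry straight to
 * vhost_add_shadow_used().
 */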
Signed-off-by: Jason Wang
---
 drivers/vhost/net.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index cf47e6e348f4..1a67f889cbc1 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -1065,12 +1065,12 @@
 }
  heads[headcount].id = cpu_to_vhost32(vq, d);
  len = iov_length(vq->iov + seg, in);
- heads[headcount].len = cpu_to_vhost32(vq, len);
  datalen -= len;
+ heads[headcount].len = cpu_to_vhost32(vq,
+ datalen >= 0 ? len : len + datalen);
  ++headcount;
  seg += in;
  }
- heads[headcount - 1].len = cpu_to_vhost32(vq, len + datalen);
  *iovcount = seg;
  if (unlikely(log))
  *log_num = nlogs;

From patchwork Wed Jul 17 10:52:48 2019
From: Jason Wang <jasowang@redhat.com>
To: mst@redhat.com, jasowang@redhat.com
Cc: kvm@vger.kernel.org, virtualization@lists.linux-foundation.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org, jfreimann@redhat.com, tiwei.bie@intel.com, maxime.coquelin@redhat.com
Subject: [PATCH V3 08/15] vhost_net: switch to use shadow used ring API for RX
Date: Wed, 17 Jul 2019 06:52:48 -0400
Message-Id: <20190717105255.63488-9-jasowang@redhat.com>
In-Reply-To: <20190717105255.63488-1-jasowang@redhat.com>
References: <20190717105255.63488-1-jasowang@redhat.com>

This patch switches to using the shadow used ring API for RX. This will help to hide the used ring layout from the device.
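Condensed from the hunks below, the RX error path now unwinds both the shadow entries and the avail index with one call (the receive step is reconstructed from context and not shown in this excerpt):

headcount = get_rx_bufs(vq, vhost_len, &in, vq_log, &log,
			likely(mergeable) ? UIO_MAXIOV : 1);
/* ... receive the packet into the descriptors ... */
if (unlikely(err != sock_len)) {
	/* Withdraws the shadow entries and rewinds last_avail_idx. */
	vhost_discard_shadow_used(vq, headcount);
	continue;
}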
Signed-off-by: Jason Wang
---
 drivers/vhost/net.c | 37 +++++++++++--------------------------
 1 file changed, 11 insertions(+), 26 deletions(-)

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 1a67f889cbc1..9e087d08b199 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -445,18 +445,6 @@ static int vhost_net_enable_vq(struct vhost_net *n,
  return vhost_poll_start(poll, sock->file);
 }

-static void vhost_net_signal_used(struct vhost_net_virtqueue *nvq)
-{
- struct vhost_virtqueue *vq = &nvq->vq;
- struct vhost_dev *dev = vq->dev;
-
- if (!nvq->done_idx)
- return;
-
- vhost_add_used_and_signal_n(dev, vq, vq->heads, nvq->done_idx);
- nvq->done_idx = 0;
-}
-
 static void vhost_tx_batch(struct vhost_net *net,
 struct vhost_net_virtqueue *nvq,
 struct socket *sock,
@@ -999,7 +987,7 @@ static int vhost_net_rx_peek_head_len(struct vhost_net *net, struct sock *sk,
  if (!len && rvq->busyloop_timeout) {
  /* Flush batched heads first */
- vhost_net_signal_used(rnvq);
+ vhost_flush_shadow_used_and_signal(rvq);
  /* Both tx vq and rx socket were polled here */
  vhost_net_busy_poll(net, rvq, tvq, busyloop_intr, true);
@@ -1020,7 +1008,6 @@
  * returns number of buffer heads allocated, negative on error
  */
 static int get_rx_bufs(struct vhost_virtqueue *vq,
- struct vring_used_elem *heads,
  int datalen,
  unsigned *iovcount,
  struct vhost_log *log,
@@ -1063,11 +1050,11 @@ static int get_rx_bufs(struct vhost_virtqueue *vq,
  nlogs += *log_num;
  log += *log_num;
  }
- heads[headcount].id = cpu_to_vhost32(vq, d);
  len = iov_length(vq->iov + seg, in);
  datalen -= len;
- heads[headcount].len = cpu_to_vhost32(vq,
- datalen >= 0 ? len : len + datalen);
+ vhost_add_shadow_used(vq, cpu_to_vhost32(vq, d),
+ cpu_to_vhost32(vq, datalen >= 0 ? len
+ : len + datalen));
  ++headcount;
  seg += in;
  }
@@ -1082,7 +1069,7 @@
  }
  return headcount;
 err:
- vhost_discard_vq_desc(vq, headcount);
+ vhost_discard_shadow_used(vq, headcount);
  return r;
 }
@@ -1141,8 +1128,7 @@ static void handle_rx(struct vhost_net *net)
  break;
  sock_len += sock_hlen;
  vhost_len = sock_len + vhost_hlen;
- headcount = get_rx_bufs(vq, vq->heads + nvq->done_idx,
- vhost_len, &in, vq_log, &log,
+ headcount = get_rx_bufs(vq, vhost_len, &in, vq_log, &log,
  likely(mergeable) ? UIO_MAXIOV : 1);
  /* On error, stop handling until the next kick.
*/ if (unlikely(headcount < 0)) @@ -1189,7 +1175,7 @@ static void handle_rx(struct vhost_net *net) if (unlikely(err != sock_len)) { pr_debug("Discarded rx packet: " " len %d, expected %zd\n", err, sock_len); - vhost_discard_vq_desc(vq, headcount); + vhost_discard_shadow_used(vq, headcount); continue; } /* Supply virtio_net_hdr if VHOST_NET_F_VIRTIO_NET_HDR */ @@ -1213,12 +1199,11 @@ static void handle_rx(struct vhost_net *net) copy_to_iter(&num_buffers, sizeof num_buffers, &fixup) != sizeof num_buffers) { vq_err(vq, "Failed num_buffers write"); - vhost_discard_vq_desc(vq, headcount); + vhost_discard_shadow_used(vq, headcount); goto out; } - nvq->done_idx += headcount; - if (nvq->done_idx > VHOST_NET_BATCH) - vhost_net_signal_used(nvq); + if (vhost_get_shadow_used_count(vq) > VHOST_NET_BATCH) + vhost_flush_shadow_used_and_signal(vq); if (unlikely(vq_log)) vhost_log_write(vq, vq_log, log, vhost_len, vq->iov, in); @@ -1230,7 +1215,7 @@ static void handle_rx(struct vhost_net *net) else if (!sock_len) vhost_net_enable_vq(net, vq); out: - vhost_net_signal_used(nvq); + vhost_flush_shadow_used_and_signal(vq); mutex_unlock(&vq->mutex); } From patchwork Wed Jul 17 10:52:49 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jason Wang X-Patchwork-Id: 11047651 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 41F8F912 for ; Wed, 17 Jul 2019 10:54:47 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 3222928737 for ; Wed, 17 Jul 2019 10:54:47 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 2621428739; Wed, 17 Jul 2019 10:54:47 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id C82D228737 for ; Wed, 17 Jul 2019 10:54:46 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1732247AbfGQKxv (ORCPT ); Wed, 17 Jul 2019 06:53:51 -0400 Received: from mx1.redhat.com ([209.132.183.28]:43482 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1731759AbfGQKxv (ORCPT ); Wed, 17 Jul 2019 06:53:51 -0400 Received: from smtp.corp.redhat.com (int-mx07.intmail.prod.int.phx2.redhat.com [10.5.11.22]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 16B80307CDEA; Wed, 17 Jul 2019 10:53:51 +0000 (UTC) Received: from hp-dl380pg8-01.lab.eng.pek2.redhat.com (hp-dl380pg8-01.lab.eng.pek2.redhat.com [10.73.8.10]) by smtp.corp.redhat.com (Postfix) with ESMTP id 266421001B00; Wed, 17 Jul 2019 10:53:47 +0000 (UTC) From: Jason Wang To: mst@redhat.com, jasowang@redhat.com Cc: kvm@vger.kernel.org, virtualization@lists.linux-foundation.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org, jfreimann@redhat.com, tiwei.bie@intel.com, maxime.coquelin@redhat.com Subject: [PATCH V3 09/15] vhost: do not export vhost_add_used_n() and vhost_add_used_and_signal_n() Date: Wed, 17 Jul 2019 06:52:49 -0400 Message-Id: <20190717105255.63488-10-jasowang@redhat.com> In-Reply-To: 
<20190717105255.63488-1-jasowang@redhat.com> References: <20190717105255.63488-1-jasowang@redhat.com> X-Scanned-By: MIMEDefang 2.84 on 10.5.11.22 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.49]); Wed, 17 Jul 2019 10:53:51 +0000 (UTC) Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP We would request device to use shadow used ring API. Then there's no need for exposing those to device. Signed-off-by: Jason Wang --- drivers/vhost/vhost.c | 13 +++++++------ drivers/vhost/vhost.h | 4 ---- 2 files changed, 7 insertions(+), 10 deletions(-) diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c index 5bfca5b76b05..50ba382f0981 100644 --- a/drivers/vhost/vhost.c +++ b/drivers/vhost/vhost.c @@ -2883,8 +2883,9 @@ EXPORT_SYMBOL_GPL(vhost_get_shadow_used_count); /* After we've used one of their buffers, we tell them about it. We'll then * want to notify the guest, using eventfd. */ -int vhost_add_used_n(struct vhost_virtqueue *vq, struct vring_used_elem *heads, - unsigned count) +static int vhost_add_used_n(struct vhost_virtqueue *vq, + struct vring_used_elem *heads, + unsigned count) { int start, n, r; @@ -2988,14 +2989,14 @@ void vhost_add_used_and_signal(struct vhost_dev *dev, EXPORT_SYMBOL_GPL(vhost_add_used_and_signal); /* multi-buffer version of vhost_add_used_and_signal */ -void vhost_add_used_and_signal_n(struct vhost_dev *dev, - struct vhost_virtqueue *vq, - struct vring_used_elem *heads, unsigned count) +static void vhost_add_used_and_signal_n(struct vhost_dev *dev, + struct vhost_virtqueue *vq, + struct vring_used_elem *heads, + unsigned count) { vhost_add_used_n(vq, heads, count); vhost_signal(dev, vq); } -EXPORT_SYMBOL_GPL(vhost_add_used_and_signal_n); void vhost_flush_shadow_used_and_signal(struct vhost_virtqueue *vq) diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h index 175eb5ebf954..481baba20c3d 100644 --- a/drivers/vhost/vhost.h +++ b/drivers/vhost/vhost.h @@ -240,13 +240,9 @@ void vhost_discard_vq_desc(struct vhost_virtqueue *, int n); int vhost_vq_init_access(struct vhost_virtqueue *); int vhost_add_used(struct vhost_virtqueue *, unsigned int head, int len); -int vhost_add_used_n(struct vhost_virtqueue *, struct vring_used_elem *heads, - unsigned count); void vhost_add_used_and_signal(struct vhost_dev *, struct vhost_virtqueue *, unsigned int id, int len); -void vhost_add_used_and_signal_n(struct vhost_dev *, struct vhost_virtqueue *, - struct vring_used_elem *heads, unsigned count); /* Zerocopy shadow used ring API */ void vhost_set_zc_used_len(struct vhost_virtqueue *vq, From patchwork Wed Jul 17 10:52:50 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jason Wang X-Patchwork-Id: 11047641 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 8A30613A4 for ; Wed, 17 Jul 2019 10:54:10 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 75F4728723 for ; Wed, 17 Jul 2019 10:54:10 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 6A28328736; Wed, 17 Jul 2019 10:54:10 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 
tests=BAYES_00,MAILING_LIST_MULTI,
	RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1
Received: from vger.kernel.org (vger.kernel.org [209.132.180.67])
	by mail.wl.linuxfoundation.org (Postfix) with ESMTP id E6D2728727
	for ; Wed, 17 Jul 2019 10:54:08 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1732550AbfGQKx5 (ORCPT ); Wed, 17 Jul 2019 06:53:57 -0400
Received: from mx1.redhat.com ([209.132.183.28]:48626 "EHLO mx1.redhat.com"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1731759AbfGQKxz (ORCPT ); Wed, 17 Jul 2019 06:53:55 -0400
Received: from smtp.corp.redhat.com (int-mx07.intmail.prod.int.phx2.redhat.com
	[10.5.11.22]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))
	(No client certificate requested)
	by mx1.redhat.com (Postfix) with ESMTPS id BCBFB81F0E;
	Wed, 17 Jul 2019 10:53:54 +0000 (UTC)
Received: from hp-dl380pg8-01.lab.eng.pek2.redhat.com
	(hp-dl380pg8-01.lab.eng.pek2.redhat.com [10.73.8.10])
	by smtp.corp.redhat.com (Postfix) with ESMTP id 912C21001B00;
	Wed, 17 Jul 2019 10:53:51 +0000 (UTC)
From: Jason Wang
To: mst@redhat.com, jasowang@redhat.com
Cc: kvm@vger.kernel.org, virtualization@lists.linux-foundation.org,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
	jfreimann@redhat.com, tiwei.bie@intel.com, maxime.coquelin@redhat.com
Subject: [PATCH V3 10/15] vhost: hide used ring layout from device
Date: Wed, 17 Jul 2019 06:52:50 -0400
Message-Id: <20190717105255.63488-11-jasowang@redhat.com>
In-Reply-To: <20190717105255.63488-1-jasowang@redhat.com>
References: <20190717105255.63488-1-jasowang@redhat.com>
X-Scanned-By: MIMEDefang 2.84 on 10.5.11.22
X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16
	(mx1.redhat.com [10.5.110.27]); Wed, 17 Jul 2019 10:53:54 +0000 (UTC)
Sender: kvm-owner@vger.kernel.org
Precedence: bulk
List-ID:
X-Mailing-List: kvm@vger.kernel.org
X-Virus-Scanned: ClamAV using ClamSMTP

We used to return the descriptor head from vhost_get_vq_desc() to the
device and have the device pass it back to vhost_add_used() and
friends. This exposes the internal used ring layout to the device,
which makes it hard to extend for e.g. the packed ring layout.

So this patch hides the used ring layout by:

- letting vhost_get_vq_desc() fill in a caller-supplied struct
  vring_used_elem instead of returning a bare head index
- accepting a pointer to struct vring_used_elem in vhost_add_used()
  and vhost_add_used_and_signal()

This makes it easier to implement the packed ring on top.
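To make the reworked calling convention concrete, here is a minimal
device-side sketch written against the helpers as changed by this
patch. It is illustrative only and not part of the series
(example_handle_kick is a made-up name); it assumes the signatures
introduced above, including the new -ENOSPC return for an empty ring:

/* Illustrative sketch only -- not part of the patch. */
static void example_handle_kick(struct vhost_dev *dev,
				struct vhost_virtqueue *vq)
{
	struct vring_used_elem used;
	unsigned int out, in;
	int ret;

	for (;;) {
		/* The core fills in the element; the device treats it
		 * as an opaque cookie and never reads id/len itself.
		 */
		ret = vhost_get_vq_desc(vq, &used, vq->iov,
					ARRAY_SIZE(vq->iov),
					&out, &in, NULL, NULL);
		if (ret == -ENOSPC)
			break;	/* ring empty: wait for the next kick */
		if (ret < 0)
			break;	/* fault: stop until the next kick */

		/* ... consume vq->iov[0 .. out + in) here ... */

		/* Hand the cookie back to complete the buffer. */
		vhost_add_used_and_signal(dev, vq, &used, 0);
	}
}

Because the device only ever hands the element back, the core is free
to change what it records inside it; the vhost_used_elem and packed
ring patches later in this series rely on exactly that.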
Signed-off-by: Jason Wang --- drivers/vhost/net.c | 88 ++++++++++++++++++++++--------------------- drivers/vhost/scsi.c | 62 ++++++++++++++++-------------- drivers/vhost/vhost.c | 38 +++++++++++-------- drivers/vhost/vhost.h | 11 +++--- drivers/vhost/vsock.c | 43 +++++++++++---------- 5 files changed, 129 insertions(+), 113 deletions(-) diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c index 9e087d08b199..572d80c8c36e 100644 --- a/drivers/vhost/net.c +++ b/drivers/vhost/net.c @@ -546,25 +546,28 @@ static void vhost_net_busy_poll(struct vhost_net *net, } static int vhost_net_tx_get_vq_desc(struct vhost_net *net, - struct vhost_net_virtqueue *tnvq, + struct vring_used_elem *used_elem, unsigned int *out_num, unsigned int *in_num, struct msghdr *msghdr, bool *busyloop_intr) { + struct vhost_net_virtqueue *tnvq = &net->vqs[VHOST_NET_VQ_TX]; struct vhost_net_virtqueue *rnvq = &net->vqs[VHOST_NET_VQ_RX]; struct vhost_virtqueue *rvq = &rnvq->vq; struct vhost_virtqueue *tvq = &tnvq->vq; - int r = vhost_get_vq_desc(tvq, tvq->iov, ARRAY_SIZE(tvq->iov), + int r = vhost_get_vq_desc(tvq, used_elem, tvq->iov, + ARRAY_SIZE(tvq->iov), out_num, in_num, NULL, NULL); - if (r == tvq->num && tvq->busyloop_timeout) { + if (r == -ENOSPC && tvq->busyloop_timeout) { /* Flush batched packets first */ if (!vhost_sock_zcopy(tvq->private_data)) vhost_tx_batch(net, tnvq, tvq->private_data, msghdr); vhost_net_busy_poll(net, rvq, tvq, busyloop_intr, false); - r = vhost_get_vq_desc(tvq, tvq->iov, ARRAY_SIZE(tvq->iov), + r = vhost_get_vq_desc(tvq, used_elem, tvq->iov, + ARRAY_SIZE(tvq->iov), out_num, in_num, NULL, NULL); } @@ -593,6 +596,7 @@ static size_t init_iov_iter(struct vhost_virtqueue *vq, struct iov_iter *iter, } static int get_tx_bufs(struct vhost_net *net, + struct vring_used_elem *used_elem, struct vhost_net_virtqueue *nvq, struct msghdr *msg, unsigned int *out, unsigned int *in, @@ -601,9 +605,10 @@ static int get_tx_bufs(struct vhost_net *net, struct vhost_virtqueue *vq = &nvq->vq; int ret; - ret = vhost_net_tx_get_vq_desc(net, nvq, out, in, msg, busyloop_intr); + ret = vhost_net_tx_get_vq_desc(net, used_elem, out, in, + msg, busyloop_intr); - if (ret < 0 || ret == vq->num) + if (ret < 0 || ret == -ENOSPC) return ret; if (*in) { @@ -747,8 +752,8 @@ static void handle_tx_copy(struct vhost_net *net, struct socket *sock) { struct vhost_net_virtqueue *nvq = &net->vqs[VHOST_NET_VQ_TX]; struct vhost_virtqueue *vq = &nvq->vq; + struct vring_used_elem used; unsigned out, in; - int head; struct msghdr msg = { .msg_name = NULL, .msg_namelen = 0, @@ -767,13 +772,11 @@ static void handle_tx_copy(struct vhost_net *net, struct socket *sock) if (vhost_get_shadow_used_count(vq) == VHOST_NET_BATCH) vhost_tx_batch(net, nvq, sock, &msg); - head = get_tx_bufs(net, nvq, &msg, &out, &in, &len, - &busyloop_intr); - /* On error, stop handling until the next kick. */ - if (unlikely(head < 0)) - break; + err = get_tx_bufs(net, &used, + nvq, &msg, &out, &in, &len, + &busyloop_intr); /* Nothing new? Wait for eventfd to tell us they refilled. */ - if (head == vq->num) { + if (err == -ENOSPC) { if (unlikely(busyloop_intr)) { vhost_poll_queue(&vq->poll); } else if (unlikely(vhost_enable_notify(vq))) { @@ -782,7 +785,9 @@ static void handle_tx_copy(struct vhost_net *net, struct socket *sock) } break; } - + /* On error, stop handling until the next kick. 
*/ + if (unlikely(err < 0)) + break; total_len += len; /* For simplicity, TX batching is only enabled if @@ -823,7 +828,7 @@ static void handle_tx_copy(struct vhost_net *net, struct socket *sock) pr_debug("Truncated TX packet: len %d != %zd\n", err, len); done: - vhost_add_shadow_used(vq, cpu_to_vhost32(vq, head), 0); + vhost_add_shadow_used(vq, &used, 0); } while (likely(!vhost_exceeds_weight(vq, ++sent_pkts, total_len))); vhost_tx_batch(net, nvq, sock, &msg); @@ -834,7 +839,6 @@ static void handle_tx_zerocopy(struct vhost_net *net, struct socket *sock) struct vhost_net_virtqueue *nvq = &net->vqs[VHOST_NET_VQ_TX]; struct vhost_virtqueue *vq = &nvq->vq; unsigned out, in; - int head; struct msghdr msg = { .msg_name = NULL, .msg_namelen = 0, @@ -843,6 +847,7 @@ static void handle_tx_zerocopy(struct vhost_net *net, struct socket *sock) .msg_flags = MSG_DONTWAIT, }; struct tun_msg_ctl ctl; + struct vring_used_elem used; size_t len, total_len = 0; int err; struct vhost_net_ubuf_ref *uninitialized_var(ubufs); @@ -856,13 +861,10 @@ static void handle_tx_zerocopy(struct vhost_net *net, struct socket *sock) vhost_zerocopy_signal_used(net, vq); busyloop_intr = false; - head = get_tx_bufs(net, nvq, &msg, &out, &in, &len, - &busyloop_intr); - /* On error, stop handling until the next kick. */ - if (unlikely(head < 0)) - break; + err = get_tx_bufs(net, &used, nvq, &msg, &out, &in, &len, + &busyloop_intr); /* Nothing new? Wait for eventfd to tell us they refilled. */ - if (head == vq->num) { + if (err == -ENOSPC) { if (unlikely(busyloop_intr)) { vhost_poll_queue(&vq->poll); } else if (unlikely(vhost_enable_notify(vq))) { @@ -871,6 +873,9 @@ static void handle_tx_zerocopy(struct vhost_net *net, struct socket *sock) } break; } + /* On error, stop handling until the next kick. */ + if (unlikely(err < 0)) + break; zcopy_used = len >= VHOST_GOODCOPY_LEN && !vhost_exceeds_maxpend(net) @@ -895,7 +900,7 @@ static void handle_tx_zerocopy(struct vhost_net *net, struct socket *sock) ubufs = NULL; } vhost_set_zc_used(vq, nvq->upend_idx, - cpu_to_vhost32(vq, head), + &used, zcopy_used ? VHOST_DMA_IN_PROGRESS : VHOST_DMA_DONE_LEN); nvq->upend_idx = (nvq->upend_idx + 1) % UIO_MAXIOV; @@ -921,7 +926,6 @@ static void handle_tx_zerocopy(struct vhost_net *net, struct socket *sock) if (err != len) pr_debug("Truncated TX packet: " " len %d != %zd\n", err, len); - vhost_zerocopy_signal_used(net, vq); vhost_net_tx_packet(net); } while (likely(!vhost_exceeds_weight(vq, ++sent_pkts, total_len))); @@ -1012,34 +1016,30 @@ static int get_rx_bufs(struct vhost_virtqueue *vq, unsigned *iovcount, struct vhost_log *log, unsigned *log_num, - unsigned int quota) + unsigned int quota, + s16 *count) { unsigned int out, in; int seg = 0; int headcount = 0; - unsigned d; - int r, nlogs = 0; + int r = 0, nlogs = 0; /* len is always initialized before use since we are always called with * datalen > 0. 
*/ u32 uninitialized_var(len); + struct vring_used_elem uninitialized_var(used); while (datalen > 0 && headcount < quota) { if (unlikely(seg >= UIO_MAXIOV)) { r = -ENOBUFS; goto err; } - r = vhost_get_vq_desc(vq, vq->iov + seg, + r = vhost_get_vq_desc(vq, &used, vq->iov + seg, ARRAY_SIZE(vq->iov) - seg, &out, &in, log, log_num); if (unlikely(r < 0)) goto err; - d = r; - if (d == vq->num) { - r = 0; - goto err; - } if (unlikely(out || in <= 0)) { vq_err(vq, "unexpected descriptor format for RX: " "out %d, in %d\n", out, in); @@ -1052,7 +1052,7 @@ static int get_rx_bufs(struct vhost_virtqueue *vq, } len = iov_length(vq->iov + seg, in); datalen -= len; - vhost_add_shadow_used(vq, cpu_to_vhost32(vq, d), + vhost_add_shadow_used(vq, &used, cpu_to_vhost32(vq, datalen >= 0 ? len : len + datalen)); ++headcount; @@ -1064,12 +1064,15 @@ static int get_rx_bufs(struct vhost_virtqueue *vq, /* Detect overrun */ if (unlikely(datalen > 0)) { - r = UIO_MAXIOV + 1; + headcount = UIO_MAXIOV + 1; goto err; } - return headcount; + + *count = headcount; + return 0; err: vhost_discard_shadow_used(vq, headcount); + *count = 0; return r; } @@ -1128,13 +1131,11 @@ static void handle_rx(struct vhost_net *net) break; sock_len += sock_hlen; vhost_len = sock_len + vhost_hlen; - headcount = get_rx_bufs(vq, vhost_len, &in, vq_log, &log, - likely(mergeable) ? UIO_MAXIOV : 1); - /* On error, stop handling until the next kick. */ - if (unlikely(headcount < 0)) - goto out; + err = get_rx_bufs(vq, vhost_len, &in, vq_log, &log, + likely(mergeable) ? UIO_MAXIOV : 1, + &headcount); /* OK, now we need to know about added descriptors. */ - if (!headcount) { + if (err == -ENOSPC) { if (unlikely(busyloop_intr)) { vhost_poll_queue(&vq->poll); } else if (unlikely(vhost_enable_notify(vq))) { @@ -1148,6 +1149,9 @@ static void handle_rx(struct vhost_net *net) goto out; } busyloop_intr = false; + /* On error, stop handling until the next kick. */ + if (unlikely(err < 0)) + goto out; if (nvq->rx_ring) msg.msg_control = vhost_net_buf_consume(&nvq->rxq); /* On overrun, truncate and discard */ diff --git a/drivers/vhost/scsi.c b/drivers/vhost/scsi.c index 8d4e87007a8d..4a5a75ab25ad 100644 --- a/drivers/vhost/scsi.c +++ b/drivers/vhost/scsi.c @@ -72,7 +72,7 @@ struct vhost_scsi_inflight { struct vhost_scsi_cmd { /* Descriptor from vhost_get_vq_desc() for virt_queue segment */ - int tvc_vq_desc; + struct vring_used_elem tvc_vq_used; /* virtio-scsi initiator task attribute */ int tvc_task_attr; /* virtio-scsi response incoming iovecs */ @@ -213,7 +213,7 @@ struct vhost_scsi { * Context for processing request and control queue operations. 
*/ struct vhost_scsi_ctx { - int head; + struct vring_used_elem head; unsigned int out, in; size_t req_size, rsp_size; size_t out_size, in_size; @@ -449,8 +449,9 @@ vhost_scsi_do_evt_work(struct vhost_scsi *vs, struct vhost_scsi_evt *evt) struct vhost_virtqueue *vq = &vs->vqs[VHOST_SCSI_VQ_EVT].vq; struct virtio_scsi_event *event = &evt->event; struct virtio_scsi_event __user *eventp; + struct vring_used_elem used; unsigned out, in; - int head, ret; + int ret; if (!vq->private_data) { vs->vs_events_missed = true; @@ -459,16 +460,16 @@ vhost_scsi_do_evt_work(struct vhost_scsi *vs, struct vhost_scsi_evt *evt) again: vhost_disable_notify(vq); - head = vhost_get_vq_desc(vq, vq->iov, + ret = vhost_get_vq_desc(vq, &used, vq->iov, ARRAY_SIZE(vq->iov), &out, &in, NULL, NULL); - if (head < 0) { + if (ret == -ENOSPC) { + if (vhost_enable_notify(&vs->dev, vq)) + goto again; vs->vs_events_missed = true; return; } - if (head == vq->num) { - if (vhost_enable_notify(vq)) - goto again; + if (ret < 0) { vs->vs_events_missed = true; return; } @@ -488,7 +489,7 @@ vhost_scsi_do_evt_work(struct vhost_scsi *vs, struct vhost_scsi_evt *evt) eventp = vq->iov[out].iov_base; ret = __copy_to_user(eventp, event, sizeof(*event)); if (!ret) - vhost_add_used_and_signal(&vs->dev, vq, head, 0); + vhost_add_used_and_signal(&vs->dev, vq, &used, 0); else vq_err(vq, "Faulted on vhost_scsi_send_event\n"); } @@ -549,7 +550,7 @@ static void vhost_scsi_complete_cmd_work(struct vhost_work *work) ret = copy_to_iter(&v_rsp, sizeof(v_rsp), &iov_iter); if (likely(ret == sizeof(v_rsp))) { struct vhost_scsi_virtqueue *q; - vhost_add_used(cmd->tvc_vq, cmd->tvc_vq_desc, 0); + vhost_add_used(cmd->tvc_vq, &cmd->tvc_vq_used, 0); q = container_of(cmd->tvc_vq, struct vhost_scsi_virtqueue, vq); vq = q - vs->vqs; __set_bit(vq, signal); @@ -793,7 +794,7 @@ static void vhost_scsi_submission_work(struct work_struct *work) static void vhost_scsi_send_bad_target(struct vhost_scsi *vs, struct vhost_virtqueue *vq, - int head, unsigned out) + struct vhost_scsi_ctx *vc) { struct virtio_scsi_cmd_resp __user *resp; struct virtio_scsi_cmd_resp rsp; @@ -801,10 +802,10 @@ vhost_scsi_send_bad_target(struct vhost_scsi *vs, memset(&rsp, 0, sizeof(rsp)); rsp.response = VIRTIO_SCSI_S_BAD_TARGET; - resp = vq->iov[out].iov_base; + resp = vq->iov[vc->out].iov_base; ret = __copy_to_user(resp, &rsp, sizeof(rsp)); if (!ret) - vhost_add_used_and_signal(&vs->dev, vq, head, 0); + vhost_add_used_and_signal(&vs->dev, vq, &vc->head, 0); else pr_err("Faulted on virtio_scsi_cmd_resp\n"); } @@ -813,21 +814,17 @@ static int vhost_scsi_get_desc(struct vhost_scsi *vs, struct vhost_virtqueue *vq, struct vhost_scsi_ctx *vc) { - int ret = -ENXIO; + int ret; - vc->head = vhost_get_vq_desc(vq, vq->iov, - ARRAY_SIZE(vq->iov), &vc->out, &vc->in, - NULL, NULL); + ret = vhost_get_vq_desc(vq, &vc->head, vq->iov, + ARRAY_SIZE(vq->iov), &vc->out, &vc->in, + NULL, NULL); pr_debug("vhost_get_vq_desc: head: %d, out: %u in: %u\n", - vc->head, vc->out, vc->in); - - /* On error, stop handling until the next kick. */ - if (unlikely(vc->head < 0)) - goto done; + vc->head.id, vc->out, vc->in); /* Nothing new? Wait for eventfd to tell us they refilled. */ - if (vc->head == vq->num) { + if (ret == -ENOSPC) { if (unlikely(vhost_enable_notify(vq))) { vhost_disable_notify(vq); ret = -EAGAIN; @@ -835,6 +832,10 @@ vhost_scsi_get_desc(struct vhost_scsi *vs, struct vhost_virtqueue *vq, goto done; } + /* On error, stop handling until the next kick. 
*/ + if (unlikely(ret < 0)) + goto done; + /* * Get the size of request and response buffers. * FIXME: Not correct for BIDI operation @@ -1025,6 +1026,7 @@ vhost_scsi_handle_vq(struct vhost_scsi *vs, struct vhost_virtqueue *vq) vq_err(vq, "Received non zero pi_bytesin," " but wrong data_direction\n"); goto err; + } prot_bytes = vhost32_to_cpu(vq, v_req_pi.pi_bytesin); } @@ -1097,7 +1099,7 @@ vhost_scsi_handle_vq(struct vhost_scsi *vs, struct vhost_virtqueue *vq) * complete the virtio-scsi request in TCM callback context via * vhost_scsi_queue_data_in() and vhost_scsi_queue_status() */ - cmd->tvc_vq_desc = vc.head; + cmd->tvc_vq_used = vc.head; /* * Dispatch cmd descriptor for cmwq execution in process * context provided by vhost_scsi_workqueue. This also ensures @@ -1117,8 +1119,10 @@ vhost_scsi_handle_vq(struct vhost_scsi *vs, struct vhost_virtqueue *vq) if (ret == -ENXIO) break; else if (ret == -EIO) - vhost_scsi_send_bad_target(vs, vq, vc.head, vc.out); + vhost_scsi_send_bad_target(vs, vq, &vc); + } while (likely(!vhost_exceeds_weight(vq, ++c, 0))); + out: mutex_unlock(&vq->mutex); } @@ -1140,7 +1144,7 @@ vhost_scsi_send_tmf_reject(struct vhost_scsi *vs, ret = copy_to_iter(&rsp, sizeof(rsp), &iov_iter); if (likely(ret == sizeof(rsp))) - vhost_add_used_and_signal(&vs->dev, vq, vc->head, 0); + vhost_add_used_and_signal(&vs->dev, vq, &vc->head, 0); else pr_err("Faulted on virtio_scsi_ctrl_tmf_resp\n"); } @@ -1162,7 +1166,7 @@ vhost_scsi_send_an_resp(struct vhost_scsi *vs, ret = copy_to_iter(&rsp, sizeof(rsp), &iov_iter); if (likely(ret == sizeof(rsp))) - vhost_add_used_and_signal(&vs->dev, vq, vc->head, 0); + vhost_add_used_and_signal(&vs->dev, vq, &vc->head, 0); else pr_err("Faulted on virtio_scsi_ctrl_an_resp\n"); } @@ -1269,8 +1273,10 @@ vhost_scsi_ctl_handle_vq(struct vhost_scsi *vs, struct vhost_virtqueue *vq) if (ret == -ENXIO) break; else if (ret == -EIO) - vhost_scsi_send_bad_target(vs, vq, vc.head, vc.out); + vhost_scsi_send_bad_target(vs, vq, &vc); + } while (likely(!vhost_exceeds_weight(vq, ++c, 0))); + out: mutex_unlock(&vq->mutex); } diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c index 50ba382f0981..dbe4db0179a5 100644 --- a/drivers/vhost/vhost.c +++ b/drivers/vhost/vhost.c @@ -2651,6 +2651,7 @@ static int get_indirect(struct vhost_virtqueue *vq, * never a valid descriptor number) if none was found. A negative code is * returned on error. */ int vhost_get_vq_desc(struct vhost_virtqueue *vq, + struct vring_used_elem *used, struct iovec iov[], unsigned int iov_size, unsigned int *out_num, unsigned int *in_num, struct vhost_log *log, unsigned int *log_num) @@ -2683,7 +2684,7 @@ int vhost_get_vq_desc(struct vhost_virtqueue *vq, * invalid. */ if (vq->avail_idx == last_avail_idx) - return vq->num; + return -ENOSPC; /* Only get avail ring entries after they have been * exposed by guest. @@ -2700,6 +2701,7 @@ int vhost_get_vq_desc(struct vhost_virtqueue *vq, return -EFAULT; } + used->id = ring_head; head = vhost16_to_cpu(vq, ring_head); /* If their number is silly, that's an error. */ @@ -2787,10 +2789,17 @@ int vhost_get_vq_desc(struct vhost_virtqueue *vq, /* Assume notifications from guest are disabled at this point, * if they aren't we would need to update avail_event index. 
*/ BUG_ON(!(vq->used_flags & VRING_USED_F_NO_NOTIFY)); - return head; + return 0; } EXPORT_SYMBOL_GPL(vhost_get_vq_desc); +static void vhost_set_used_len(struct vhost_virtqueue *vq, + struct vring_used_elem *used, int len) +{ + used->len = cpu_to_vhost32(vq, len); +} +EXPORT_SYMBOL_GPL(vhost_set_used_len); + static void vhost_withdraw_shadow_used(struct vhost_virtqueue *vq, int count) { BUG_ON(count > vq->nheads); @@ -2860,17 +2869,17 @@ int vhost_get_zc_used_len(struct vhost_virtqueue *vq, int idx) EXPORT_SYMBOL_GPL(vhost_get_zc_used_len); void vhost_set_zc_used(struct vhost_virtqueue *vq, - int idx, unsigned int head, int len) + int idx, struct vring_used_elem *elem, int len) { - vq->heads[idx].id = head; - vq->heads[idx].len = len; + vq->heads[idx] = *elem; + vhost_set_zc_used_len(vq, idx, len); } EXPORT_SYMBOL_GPL(vhost_set_zc_used); void vhost_add_shadow_used(struct vhost_virtqueue *vq, - unsigned int head, int len) + struct vring_used_elem *elem, int len) { - vhost_set_zc_used(vq, vq->nheads, head, len); + vhost_set_zc_used(vq, vq->nheads, elem, len); ++vq->nheads; } EXPORT_SYMBOL_GPL(vhost_add_shadow_used); @@ -2921,14 +2930,11 @@ EXPORT_SYMBOL_GPL(vhost_add_used_n); /* After we've used one of their buffers, we tell them about it. We'll then * want to notify the guest, using eventfd. */ -int vhost_add_used(struct vhost_virtqueue *vq, unsigned int head, int len) +int vhost_add_used(struct vhost_virtqueue *vq, struct vring_used_elem *used, + int len) { - struct vring_used_elem heads = { - cpu_to_vhost32(vq, head), - cpu_to_vhost32(vq, len) - }; - - return vhost_add_used_n(vq, &heads, 1); + vhost_set_used_len(vq, used, len); + return vhost_add_used_n(vq, used, 1); } EXPORT_SYMBOL_GPL(vhost_add_used); @@ -2981,9 +2987,9 @@ EXPORT_SYMBOL_GPL(vhost_signal); /* And here's the combo meal deal. Supersize me! 
*/ void vhost_add_used_and_signal(struct vhost_dev *dev, struct vhost_virtqueue *vq, - unsigned int head, int len) + struct vring_used_elem *used, int len) { - vhost_add_used(vq, head, len); + vhost_add_used(vq, used, len); vhost_signal(dev, vq); } EXPORT_SYMBOL_GPL(vhost_add_used_and_signal); diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h index 481baba20c3d..f835eefa240c 100644 --- a/drivers/vhost/vhost.h +++ b/drivers/vhost/vhost.h @@ -233,16 +233,17 @@ bool vhost_vq_access_ok(struct vhost_virtqueue *vq); bool vhost_log_access_ok(struct vhost_dev *); int vhost_get_vq_desc(struct vhost_virtqueue *, + struct vring_used_elem *used_elem, struct iovec iov[], unsigned int iov_count, unsigned int *out_num, unsigned int *in_num, struct vhost_log *log, unsigned int *log_num); void vhost_discard_vq_desc(struct vhost_virtqueue *, int n); int vhost_vq_init_access(struct vhost_virtqueue *); -int vhost_add_used(struct vhost_virtqueue *, unsigned int head, int len); +int vhost_add_used(struct vhost_virtqueue *, struct vring_used_elem *, int); void vhost_add_used_and_signal(struct vhost_dev *, struct vhost_virtqueue *, - unsigned int id, int len); + struct vring_used_elem *, int); /* Zerocopy shadow used ring API */ void vhost_set_zc_used_len(struct vhost_virtqueue *vq, @@ -250,11 +251,11 @@ void vhost_set_zc_used_len(struct vhost_virtqueue *vq, int vhost_get_zc_used_len(struct vhost_virtqueue *vq, int idx); void vhost_flush_zc_used_and_signal(struct vhost_virtqueue *vq, int idx, int n); void vhost_set_zc_used(struct vhost_virtqueue *vq, int idx, - unsigned int head, int len); + struct vring_used_elem *elem, int len); /* Non zerocopy shadow used ring API */ -void vhost_add_shadow_used(struct vhost_virtqueue *vq, unsigned int head, - int len); +void vhost_add_shadow_used(struct vhost_virtqueue *vq, + struct vring_used_elem *elem, int len); void vhost_flush_shadow_used_and_signal(struct vhost_virtqueue *vq); void vhost_discard_shadow_used(struct vhost_virtqueue *vq, int n); int vhost_get_shadow_used_count(struct vhost_virtqueue *vq); diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c index f94021b450f0..1c962bfdc3a1 100644 --- a/drivers/vhost/vsock.c +++ b/drivers/vhost/vsock.c @@ -100,11 +100,12 @@ vhost_transport_do_send_pkt(struct vhost_vsock *vsock, do { struct virtio_vsock_pkt *pkt; + struct vring_used_elem used; struct iov_iter iov_iter; unsigned out, in; size_t nbytes; size_t len; - int head; + int ret; spin_lock_bh(&vsock->send_pkt_list_lock); if (list_empty(&vsock->send_pkt_list)) { @@ -118,16 +119,9 @@ vhost_transport_do_send_pkt(struct vhost_vsock *vsock, list_del_init(&pkt->list); spin_unlock_bh(&vsock->send_pkt_list_lock); - head = vhost_get_vq_desc(vq, vq->iov, ARRAY_SIZE(vq->iov), - &out, &in, NULL, NULL); - if (head < 0) { - spin_lock_bh(&vsock->send_pkt_list_lock); - list_add(&pkt->list, &vsock->send_pkt_list); - spin_unlock_bh(&vsock->send_pkt_list_lock); - break; - } - - if (head == vq->num) { + ret = vhost_get_vq_desc(vq, &used, vq->iov, ARRAY_SIZE(vq->iov), + &out, &in, NULL, NULL); + if (ret == -ENOSPC) { spin_lock_bh(&vsock->send_pkt_list_lock); list_add(&pkt->list, &vsock->send_pkt_list); spin_unlock_bh(&vsock->send_pkt_list_lock); @@ -141,6 +135,12 @@ vhost_transport_do_send_pkt(struct vhost_vsock *vsock, } break; } + if (ret < 0) { + spin_lock_bh(&vsock->send_pkt_list_lock); + list_add(&pkt->list, &vsock->send_pkt_list); + spin_unlock_bh(&vsock->send_pkt_list_lock); + break; + } if (out) { virtio_transport_free_pkt(pkt); @@ -148,7 +148,7 @@ 
vhost_transport_do_send_pkt(struct vhost_vsock *vsock, break; } - len = iov_length(&vq->iov[out], in); + len = vhost32_to_cpu(vq, used.len); iov_iter_init(&iov_iter, READ, &vq->iov[out], in, len); nbytes = copy_to_iter(&pkt->hdr, sizeof(pkt->hdr), &iov_iter); @@ -165,7 +165,7 @@ vhost_transport_do_send_pkt(struct vhost_vsock *vsock, break; } - vhost_add_used(vq, head, sizeof(pkt->hdr) + pkt->len); + vhost_add_used(vq, &used, sizeof(pkt->hdr) + pkt->len); added = true; if (pkt->reply) { @@ -360,7 +360,8 @@ static void vhost_vsock_handle_tx_kick(struct vhost_work *work) struct vhost_vsock *vsock = container_of(vq->dev, struct vhost_vsock, dev); struct virtio_vsock_pkt *pkt; - int head, pkts = 0, total_len = 0; + struct vring_used_elem used; + int ret, pkts = 0, total_len = 0; unsigned int out, in; bool added = false; @@ -381,18 +382,17 @@ static void vhost_vsock_handle_tx_kick(struct vhost_work *work) goto no_more_replies; } - head = vhost_get_vq_desc(vq, vq->iov, ARRAY_SIZE(vq->iov), - &out, &in, NULL, NULL); - if (head < 0) - break; - - if (head == vq->num) { + ret = vhost_get_vq_desc(vq, &used, vq->iov, ARRAY_SIZE(vq->iov), + &out, &in, NULL, NULL); + if (ret == -ENOSPC) { if (unlikely(vhost_enable_notify(vq))) { vhost_disable_notify(vq); continue; } break; } + if (ret < 0) + break; pkt = vhost_vsock_alloc_pkt(vq, out, in); if (!pkt) { @@ -411,8 +411,7 @@ static void vhost_vsock_handle_tx_kick(struct vhost_work *work) else virtio_transport_free_pkt(pkt); - len += sizeof(pkt->hdr); - vhost_add_used(vq, head, len); + vhost_add_used(vq, &used, sizeof(pkt->hdr) + len); total_len += len; added = true; } while(likely(!vhost_exceeds_weight(vq, ++pkts, total_len))); From patchwork Wed Jul 17 10:52:51 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jason Wang X-Patchwork-Id: 11047639 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 4078F13A4 for ; Wed, 17 Jul 2019 10:54:06 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 2E95028734 for ; Wed, 17 Jul 2019 10:54:06 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 227D728736; Wed, 17 Jul 2019 10:54:06 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 3C3682873A for ; Wed, 17 Jul 2019 10:54:05 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1732806AbfGQKx7 (ORCPT ); Wed, 17 Jul 2019 06:53:59 -0400 Received: from mx1.redhat.com ([209.132.183.28]:45078 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1732701AbfGQKx6 (ORCPT ); Wed, 17 Jul 2019 06:53:58 -0400 Received: from smtp.corp.redhat.com (int-mx07.intmail.prod.int.phx2.redhat.com [10.5.11.22]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 31DF886674; Wed, 17 Jul 2019 10:53:58 +0000 (UTC) Received: from hp-dl380pg8-01.lab.eng.pek2.redhat.com (hp-dl380pg8-01.lab.eng.pek2.redhat.com [10.73.8.10]) by smtp.corp.redhat.com 
(Postfix) with ESMTP id 435DC1001DC0; Wed, 17 Jul 2019 10:53:55 +0000 (UTC) From: Jason Wang To: mst@redhat.com, jasowang@redhat.com Cc: kvm@vger.kernel.org, virtualization@lists.linux-foundation.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org, jfreimann@redhat.com, tiwei.bie@intel.com, maxime.coquelin@redhat.com Subject: [PATCH V3 11/15] vhost: do not use vring_used_elem Date: Wed, 17 Jul 2019 06:52:51 -0400 Message-Id: <20190717105255.63488-12-jasowang@redhat.com> In-Reply-To: <20190717105255.63488-1-jasowang@redhat.com> References: <20190717105255.63488-1-jasowang@redhat.com> X-Scanned-By: MIMEDefang 2.84 on 10.5.11.22 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.26]); Wed, 17 Jul 2019 10:53:58 +0000 (UTC) Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP Instead of depending on the exported vring_used_elem, this patch switches to use a new internal structure vhost_used_elem which embed vring_used_elem in itself. This could be used to let vhost to record extra metadata for the incoming packed ring layout. Signed-off-by: Jason Wang --- drivers/vhost/net.c | 10 +++++----- drivers/vhost/scsi.c | 8 ++++---- drivers/vhost/vhost.c | 38 +++++++++++++++++++++++--------------- drivers/vhost/vhost.h | 21 +++++++++++++++------ drivers/vhost/vsock.c | 4 ++-- 5 files changed, 49 insertions(+), 32 deletions(-) diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c index 572d80c8c36e..7c2f320930c7 100644 --- a/drivers/vhost/net.c +++ b/drivers/vhost/net.c @@ -546,7 +546,7 @@ static void vhost_net_busy_poll(struct vhost_net *net, } static int vhost_net_tx_get_vq_desc(struct vhost_net *net, - struct vring_used_elem *used_elem, + struct vhost_used_elem *used_elem, unsigned int *out_num, unsigned int *in_num, struct msghdr *msghdr, bool *busyloop_intr) { @@ -596,7 +596,7 @@ static size_t init_iov_iter(struct vhost_virtqueue *vq, struct iov_iter *iter, } static int get_tx_bufs(struct vhost_net *net, - struct vring_used_elem *used_elem, + struct vhost_used_elem *used_elem, struct vhost_net_virtqueue *nvq, struct msghdr *msg, unsigned int *out, unsigned int *in, @@ -752,7 +752,7 @@ static void handle_tx_copy(struct vhost_net *net, struct socket *sock) { struct vhost_net_virtqueue *nvq = &net->vqs[VHOST_NET_VQ_TX]; struct vhost_virtqueue *vq = &nvq->vq; - struct vring_used_elem used; + struct vhost_used_elem used; unsigned out, in; struct msghdr msg = { .msg_name = NULL, @@ -847,7 +847,7 @@ static void handle_tx_zerocopy(struct vhost_net *net, struct socket *sock) .msg_flags = MSG_DONTWAIT, }; struct tun_msg_ctl ctl; - struct vring_used_elem used; + struct vhost_used_elem used; size_t len, total_len = 0; int err; struct vhost_net_ubuf_ref *uninitialized_var(ubufs); @@ -1027,7 +1027,7 @@ static int get_rx_bufs(struct vhost_virtqueue *vq, * datalen > 0. 
*/ u32 uninitialized_var(len); - struct vring_used_elem uninitialized_var(used); + struct vhost_used_elem uninitialized_var(used); while (datalen > 0 && headcount < quota) { if (unlikely(seg >= UIO_MAXIOV)) { diff --git a/drivers/vhost/scsi.c b/drivers/vhost/scsi.c index 4a5a75ab25ad..42c32612dc32 100644 --- a/drivers/vhost/scsi.c +++ b/drivers/vhost/scsi.c @@ -72,7 +72,7 @@ struct vhost_scsi_inflight { struct vhost_scsi_cmd { /* Descriptor from vhost_get_vq_desc() for virt_queue segment */ - struct vring_used_elem tvc_vq_used; + struct vhost_used_elem tvc_vq_used; /* virtio-scsi initiator task attribute */ int tvc_task_attr; /* virtio-scsi response incoming iovecs */ @@ -213,7 +213,7 @@ struct vhost_scsi { * Context for processing request and control queue operations. */ struct vhost_scsi_ctx { - struct vring_used_elem head; + struct vhost_used_elem head; unsigned int out, in; size_t req_size, rsp_size; size_t out_size, in_size; @@ -449,7 +449,7 @@ vhost_scsi_do_evt_work(struct vhost_scsi *vs, struct vhost_scsi_evt *evt) struct vhost_virtqueue *vq = &vs->vqs[VHOST_SCSI_VQ_EVT].vq; struct virtio_scsi_event *event = &evt->event; struct virtio_scsi_event __user *eventp; - struct vring_used_elem used; + struct vhost_used_elem used; unsigned out, in; int ret; @@ -821,7 +821,7 @@ vhost_scsi_get_desc(struct vhost_scsi *vs, struct vhost_virtqueue *vq, NULL, NULL); pr_debug("vhost_get_vq_desc: head: %d, out: %u in: %u\n", - vc->head.id, vc->out, vc->in); + vc->head.elem.id, vc->out, vc->in); /* Nothing new? Wait for eventfd to tell us they refilled. */ if (ret == -ENOSPC) { diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c index dbe4db0179a5..6044cdea124f 100644 --- a/drivers/vhost/vhost.c +++ b/drivers/vhost/vhost.c @@ -2651,7 +2651,7 @@ static int get_indirect(struct vhost_virtqueue *vq, * never a valid descriptor number) if none was found. A negative code is * returned on error. */ int vhost_get_vq_desc(struct vhost_virtqueue *vq, - struct vring_used_elem *used, + struct vhost_used_elem *used, struct iovec iov[], unsigned int iov_size, unsigned int *out_num, unsigned int *in_num, struct vhost_log *log, unsigned int *log_num) @@ -2701,7 +2701,7 @@ int vhost_get_vq_desc(struct vhost_virtqueue *vq, return -EFAULT; } - used->id = ring_head; + used->elem.id = ring_head; head = vhost16_to_cpu(vq, ring_head); /* If their number is silly, that's an error. 
*/ @@ -2793,13 +2793,20 @@ int vhost_get_vq_desc(struct vhost_virtqueue *vq, } EXPORT_SYMBOL_GPL(vhost_get_vq_desc); -static void vhost_set_used_len(struct vhost_virtqueue *vq, - struct vring_used_elem *used, int len) +void vhost_set_used_len(struct vhost_virtqueue *vq, + struct vhost_used_elem *used, int len) { - used->len = cpu_to_vhost32(vq, len); + used->elem.len = cpu_to_vhost32(vq, len); } EXPORT_SYMBOL_GPL(vhost_set_used_len); +__virtio32 vhost_get_used_len(struct vhost_virtqueue *vq, + struct vhost_used_elem *used) +{ + return vhost32_to_cpu(vq, used->elem.len); +} +EXPORT_SYMBOL_GPL(vhost_get_used_len); + static void vhost_withdraw_shadow_used(struct vhost_virtqueue *vq, int count) { BUG_ON(count > vq->nheads); @@ -2824,9 +2831,10 @@ void vhost_discard_vq_desc(struct vhost_virtqueue *vq, int n) EXPORT_SYMBOL_GPL(vhost_discard_vq_desc); static int __vhost_add_used_n(struct vhost_virtqueue *vq, - struct vring_used_elem *heads, + struct vhost_used_elem *shadow, unsigned count) { + struct vring_used_elem *heads = (struct vring_used_elem *)shadow; struct vring_used_elem __user *used; u16 old, new; int start; @@ -2858,18 +2866,18 @@ static int __vhost_add_used_n(struct vhost_virtqueue *vq, void vhost_set_zc_used_len(struct vhost_virtqueue *vq, int idx, int len) { - vq->heads[idx].len = len; + vq->heads[idx].elem.len = len; } EXPORT_SYMBOL_GPL(vhost_set_zc_used_len); int vhost_get_zc_used_len(struct vhost_virtqueue *vq, int idx) { - return vq->heads[idx].len; + return vq->heads[idx].elem.len; } EXPORT_SYMBOL_GPL(vhost_get_zc_used_len); -void vhost_set_zc_used(struct vhost_virtqueue *vq, - int idx, struct vring_used_elem *elem, int len) +void vhost_set_zc_used(struct vhost_virtqueue *vq, int idx, + struct vhost_used_elem *elem, int len) { vq->heads[idx] = *elem; vhost_set_zc_used_len(vq, idx, len); @@ -2877,7 +2885,7 @@ void vhost_set_zc_used(struct vhost_virtqueue *vq, EXPORT_SYMBOL_GPL(vhost_set_zc_used); void vhost_add_shadow_used(struct vhost_virtqueue *vq, - struct vring_used_elem *elem, int len) + struct vhost_used_elem *elem, int len) { vhost_set_zc_used(vq, vq->nheads, elem, len); ++vq->nheads; @@ -2893,7 +2901,7 @@ EXPORT_SYMBOL_GPL(vhost_get_shadow_used_count); /* After we've used one of their buffers, we tell them about it. We'll then * want to notify the guest, using eventfd. */ static int vhost_add_used_n(struct vhost_virtqueue *vq, - struct vring_used_elem *heads, + struct vhost_used_elem *heads, unsigned count) { int start, n, r; @@ -2930,7 +2938,7 @@ EXPORT_SYMBOL_GPL(vhost_add_used_n); /* After we've used one of their buffers, we tell them about it. We'll then * want to notify the guest, using eventfd. */ -int vhost_add_used(struct vhost_virtqueue *vq, struct vring_used_elem *used, +int vhost_add_used(struct vhost_virtqueue *vq, struct vhost_used_elem *used, int len) { vhost_set_used_len(vq, used, len); @@ -2987,7 +2995,7 @@ EXPORT_SYMBOL_GPL(vhost_signal); /* And here's the combo meal deal. Supersize me! 
*/ void vhost_add_used_and_signal(struct vhost_dev *dev, struct vhost_virtqueue *vq, - struct vring_used_elem *used, int len) + struct vhost_used_elem *used, int len) { vhost_add_used(vq, used, len); vhost_signal(dev, vq); @@ -2997,7 +3005,7 @@ EXPORT_SYMBOL_GPL(vhost_add_used_and_signal); /* multi-buffer version of vhost_add_used_and_signal */ static void vhost_add_used_and_signal_n(struct vhost_dev *dev, struct vhost_virtqueue *vq, - struct vring_used_elem *heads, + struct vhost_used_elem *heads, unsigned count) { vhost_add_used_n(vq, heads, count); diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h index f835eefa240c..b8a5d1a2bed9 100644 --- a/drivers/vhost/vhost.h +++ b/drivers/vhost/vhost.h @@ -37,6 +37,10 @@ struct vhost_poll { struct vhost_dev *dev; }; +struct vhost_used_elem { + struct vring_used_elem elem; +}; + void vhost_work_init(struct vhost_work *work, vhost_work_fn_t fn); void vhost_work_queue(struct vhost_dev *dev, struct vhost_work *work); bool vhost_has_work(struct vhost_dev *dev); @@ -162,7 +166,7 @@ struct vhost_virtqueue { struct iovec iov[UIO_MAXIOV]; struct iovec iotlb_iov[64]; struct iovec *indirect; - struct vring_used_elem *heads; + struct vhost_used_elem *heads; int nheads; /* Protected by virtqueue mutex. */ struct vhost_umem *umem; @@ -233,17 +237,22 @@ bool vhost_vq_access_ok(struct vhost_virtqueue *vq); bool vhost_log_access_ok(struct vhost_dev *); int vhost_get_vq_desc(struct vhost_virtqueue *, - struct vring_used_elem *used_elem, + struct vhost_used_elem *used_elem, struct iovec iov[], unsigned int iov_count, unsigned int *out_num, unsigned int *in_num, struct vhost_log *log, unsigned int *log_num); void vhost_discard_vq_desc(struct vhost_virtqueue *, int n); int vhost_vq_init_access(struct vhost_virtqueue *); -int vhost_add_used(struct vhost_virtqueue *, struct vring_used_elem *, int); +int vhost_add_used(struct vhost_virtqueue *, struct vhost_used_elem *, int); void vhost_add_used_and_signal(struct vhost_dev *, struct vhost_virtqueue *, - struct vring_used_elem *, int); + struct vhost_used_elem *, int); + +__virtio32 vhost_get_used_len(struct vhost_virtqueue *vq, + struct vhost_used_elem *used); +void vhost_set_used_len(struct vhost_virtqueue *vq, + struct vhost_used_elem *used, int len); /* Zerocopy shadow used ring API */ void vhost_set_zc_used_len(struct vhost_virtqueue *vq, @@ -251,11 +260,11 @@ void vhost_set_zc_used_len(struct vhost_virtqueue *vq, int vhost_get_zc_used_len(struct vhost_virtqueue *vq, int idx); void vhost_flush_zc_used_and_signal(struct vhost_virtqueue *vq, int idx, int n); void vhost_set_zc_used(struct vhost_virtqueue *vq, int idx, - struct vring_used_elem *elem, int len); + struct vhost_used_elem *elem, int len); /* Non zerocopy shadow used ring API */ void vhost_add_shadow_used(struct vhost_virtqueue *vq, - struct vring_used_elem *elem, int len); + struct vhost_used_elem *elem, int len); void vhost_flush_shadow_used_and_signal(struct vhost_virtqueue *vq); void vhost_discard_shadow_used(struct vhost_virtqueue *vq, int n); int vhost_get_shadow_used_count(struct vhost_virtqueue *vq); diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c index 1c962bfdc3a1..a33e194499cf 100644 --- a/drivers/vhost/vsock.c +++ b/drivers/vhost/vsock.c @@ -100,7 +100,7 @@ vhost_transport_do_send_pkt(struct vhost_vsock *vsock, do { struct virtio_vsock_pkt *pkt; - struct vring_used_elem used; + struct vhost_used_elem used; struct iov_iter iov_iter; unsigned out, in; size_t nbytes; @@ -148,7 +148,7 @@ vhost_transport_do_send_pkt(struct 
vhost_vsock *vsock,
 			break;
 		}
 
-		len = vhost32_to_cpu(vq, used.len);
+		len = vhost_get_used_len(vq, &used);
 		iov_iter_init(&iov_iter, READ, &vq->iov[out], in, len);
 
 		nbytes = copy_to_iter(&pkt->hdr, sizeof(pkt->hdr), &iov_iter);

From patchwork Wed Jul 17 10:52:52 2019
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Jason Wang
X-Patchwork-Id: 11047649
Return-Path:
Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org
	[172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix)
	with ESMTP id 9499C912
	for ; Wed, 17 Jul 2019 10:54:40 +0000 (UTC)
Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1])
	by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 831F82872A
	for ; Wed, 17 Jul 2019 10:54:40 +0000 (UTC)
Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486)
	id 7687228739; Wed, 17 Jul 2019 10:54:40 +0000 (UTC)
X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on
	pdx-wl-mail.web.codeaurora.org
X-Spam-Level:
X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI,
	RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1
Received: from vger.kernel.org (vger.kernel.org [209.132.180.67])
	by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 21C3328736
	for ; Wed, 17 Jul 2019 10:54:40 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1733109AbfGQKyL (ORCPT ); Wed, 17 Jul 2019 06:54:11 -0400
Received: from mx1.redhat.com ([209.132.183.28]:34014 "EHLO mx1.redhat.com"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1733001AbfGQKyL (ORCPT ); Wed, 17 Jul 2019 06:54:11 -0400
Received: from smtp.corp.redhat.com (int-mx07.intmail.prod.int.phx2.redhat.com
	[10.5.11.22]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))
	(No client certificate requested)
	by mx1.redhat.com (Postfix) with ESMTPS id 126A281F1B;
	Wed, 17 Jul 2019 10:54:11 +0000 (UTC)
Received: from hp-dl380pg8-01.lab.eng.pek2.redhat.com
	(hp-dl380pg8-01.lab.eng.pek2.redhat.com [10.73.8.10])
	by smtp.corp.redhat.com (Postfix) with ESMTP id DFA071017E30;
	Wed, 17 Jul 2019 10:53:58 +0000 (UTC)
From: Jason Wang
To: mst@redhat.com, jasowang@redhat.com
Cc: kvm@vger.kernel.org, virtualization@lists.linux-foundation.org,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
	jfreimann@redhat.com, tiwei.bie@intel.com, maxime.coquelin@redhat.com
Subject: [PATCH V3 12/15] vhost: vhost_put_user() can accept metadata type
Date: Wed, 17 Jul 2019 06:52:52 -0400
Message-Id: <20190717105255.63488-13-jasowang@redhat.com>
In-Reply-To: <20190717105255.63488-1-jasowang@redhat.com>
References: <20190717105255.63488-1-jasowang@redhat.com>
X-Scanned-By: MIMEDefang 2.84 on 10.5.11.22
X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16
	(mx1.redhat.com [10.5.110.25]); Wed, 17 Jul 2019 10:54:11 +0000 (UTC)
Sender: kvm-owner@vger.kernel.org
Precedence: bulk
List-ID:
X-Mailing-List: kvm@vger.kernel.org
X-Virus-Scanned: ClamAV using ClamSMTP

In the past we assumed that the used ring update was the only user of
vhost_put_user(). This is not necessarily the case for the incoming
packed ring, which may also update the descriptor ring to mark entries
as used. So introduce a new type parameter.
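To illustrate why the access type matters, here is a sketch (not part
of the patch; the example_* wrappers are hypothetical) contrasting a
split ring used-ring write, as vhost_put_used_flags() does after this
patch, with the descriptor ring write-back that the packed ring code
in the next patch issues through the same macro:

/* Split ring: a "used" update still targets the used ring. */
static int example_put_used_flags(struct vhost_virtqueue *vq)
{
	return vhost_put_user(vq, cpu_to_vhost16(vq, vq->used_flags),
			      &vq->used->flags, VHOST_ADDR_USED);
}

/* Packed ring: a completion is written back into the descriptor
 * ring itself, so the access is classified as VHOST_ADDR_DESC and
 * __vhost_get_user() consults the descriptor ring mapping rather
 * than the used ring one.
 */
static int example_put_desc_flags(struct vhost_virtqueue *vq,
				  struct vring_packed_desc __user *desc,
				  __virtio16 *flags)
{
	return vhost_put_user(vq, *flags, &desc->flags, VHOST_ADDR_DESC);
}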
Signed-off-by: Jason Wang --- drivers/vhost/vhost.c | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c index 6044cdea124f..3fa1adf2cb90 100644 --- a/drivers/vhost/vhost.c +++ b/drivers/vhost/vhost.c @@ -1178,7 +1178,7 @@ static inline void __user *__vhost_get_user(struct vhost_virtqueue *vq, return __vhost_get_user_slow(vq, addr, size, type); } -#define vhost_put_user(vq, x, ptr) \ +#define vhost_put_user(vq, x, ptr, type) \ ({ \ int ret = -EFAULT; \ if (!vq->iotlb) { \ @@ -1186,7 +1186,7 @@ static inline void __user *__vhost_get_user(struct vhost_virtqueue *vq, } else { \ __typeof__(ptr) to = \ (__typeof__(ptr)) __vhost_get_user(vq, ptr, \ - sizeof(*ptr), VHOST_ADDR_USED); \ + sizeof(*ptr), type); \ if (to != NULL) \ ret = __put_user(x, to); \ else \ @@ -1230,7 +1230,7 @@ static inline int vhost_put_avail_event(struct vhost_virtqueue *vq) #endif return vhost_put_user(vq, cpu_to_vhost16(vq, vq->avail_idx), - vhost_avail_event(vq)); + vhost_avail_event(vq), VHOST_ADDR_USED); } static inline int vhost_put_used(struct vhost_virtqueue *vq, @@ -1267,7 +1267,7 @@ static inline int vhost_put_used_flags(struct vhost_virtqueue *vq) #endif return vhost_put_user(vq, cpu_to_vhost16(vq, vq->used_flags), - &vq->used->flags); + &vq->used->flags, VHOST_ADDR_USED); } static inline int vhost_put_used_idx(struct vhost_virtqueue *vq) @@ -1284,7 +1284,7 @@ static inline int vhost_put_used_idx(struct vhost_virtqueue *vq) #endif return vhost_put_user(vq, cpu_to_vhost16(vq, vq->last_used_idx), - &vq->used->idx); + &vq->used->idx, VHOST_ADDR_USED); } #define vhost_get_user(vq, x, ptr, type) \ From patchwork Wed Jul 17 10:52:53 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jason Wang X-Patchwork-Id: 11047645 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 14B15912 for ; Wed, 17 Jul 2019 10:54:34 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 003F228735 for ; Wed, 17 Jul 2019 10:54:33 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id E7EE32873B; Wed, 17 Jul 2019 10:54:33 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 8727D28735 for ; Wed, 17 Jul 2019 10:54:31 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2387462AbfGQKyQ (ORCPT ); Wed, 17 Jul 2019 06:54:16 -0400 Received: from mx1.redhat.com ([209.132.183.28]:58318 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1733198AbfGQKyP (ORCPT ); Wed, 17 Jul 2019 06:54:15 -0400 Received: from smtp.corp.redhat.com (int-mx07.intmail.prod.int.phx2.redhat.com [10.5.11.22]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id B70EB30842A0; Wed, 17 Jul 2019 10:54:14 +0000 (UTC) Received: from hp-dl380pg8-01.lab.eng.pek2.redhat.com (hp-dl380pg8-01.lab.eng.pek2.redhat.com [10.73.8.10]) by smtp.corp.redhat.com (Postfix) with ESMTP id 
8B2E110246E1; Wed, 17 Jul 2019 10:54:11 +0000 (UTC)
From: Jason Wang
To: mst@redhat.com, jasowang@redhat.com
Cc: kvm@vger.kernel.org, virtualization@lists.linux-foundation.org,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
	jfreimann@redhat.com, tiwei.bie@intel.com, maxime.coquelin@redhat.com
Subject: [PATCH V3 13/15] vhost: packed ring support
Date: Wed, 17 Jul 2019 06:52:53 -0400
Message-Id: <20190717105255.63488-14-jasowang@redhat.com>
In-Reply-To: <20190717105255.63488-1-jasowang@redhat.com>
References: <20190717105255.63488-1-jasowang@redhat.com>
X-Scanned-By: MIMEDefang 2.84 on 10.5.11.22
X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16
	(mx1.redhat.com [10.5.110.40]); Wed, 17 Jul 2019 10:54:14 +0000 (UTC)
Sender: kvm-owner@vger.kernel.org
Precedence: bulk
List-ID:
X-Mailing-List: kvm@vger.kernel.org
X-Virus-Scanned: ClamAV using ClamSMTP

This patch introduces basic support for the packed ring. The idea
behind the packed ring is to use a single descriptor ring instead of
three different rings (avail, used and descriptor). This can reduce
cache contention and the number of PCI transactions, so it was
designed to improve performance for both software and hardware
implementations.

The implementation is straightforward: packed versions of the vhost
core helpers (whose names carry a "packed" suffix) were introduced,
and the previous helpers were renamed with a "split" suffix. The
exported helpers then simply dispatch to the correct internal helper.

Event suppression (the device area and the driver area) is not
implemented here; it is added on top by a later patch.

For more information on the packed ring, please refer to the Virtio
spec.

Signed-off-by: Jason Wang
---
 drivers/vhost/net.c   |   6 +-
 drivers/vhost/vhost.c | 980 ++++++++++++++++++++++++++++++++++++++----
 drivers/vhost/vhost.h |  24 +-
 3 files changed, 925 insertions(+), 85 deletions(-)

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 7c2f320930c7..ef79446b42f1 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -799,7 +799,7 @@ static void handle_tx_copy(struct vhost_net *net, struct socket *sock)
 			goto done;
 		} else if (unlikely(err != -ENOSPC)) {
 			vhost_tx_batch(net, nvq, sock, &msg);
-			vhost_discard_vq_desc(vq, 1);
+			vhost_discard_vq_desc(vq, &used);
 			vhost_net_enable_vq(net, vq);
 			break;
 		}
@@ -820,7 +820,7 @@ static void handle_tx_copy(struct vhost_net *net, struct socket *sock)
 		/* TODO: Check specific error and bomb out unless ENOBUFS?
*/ err = sock->ops->sendmsg(sock, &msg, len); if (unlikely(err < 0)) { - vhost_discard_vq_desc(vq, 1); + vhost_discard_vq_desc(vq, &used); vhost_net_enable_vq(net, vq); break; } @@ -919,7 +919,7 @@ static void handle_tx_zerocopy(struct vhost_net *net, struct socket *sock) vhost_net_ubuf_put(ubufs); nvq->upend_idx = ((unsigned int)nvq->upend_idx - 1) % UIO_MAXIOV; - vhost_discard_vq_desc(vq, 1); + vhost_discard_vq_desc(vq, &used); vhost_net_enable_vq(net, vq); break; } diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c index 3fa1adf2cb90..a7d24b9d5204 100644 --- a/drivers/vhost/vhost.c +++ b/drivers/vhost/vhost.c @@ -479,6 +479,9 @@ static void vhost_vq_reset(struct vhost_dev *dev, vhost_reset_is_le(vq); vhost_disable_cross_endian(vq); vq->busyloop_timeout = 0; + vq->last_used_wrap_counter = true; + vq->last_avail_wrap_counter = true; + vq->avail_wrap_counter = true; vq->umem = NULL; vq->iotlb = NULL; vq->invalidate_count = 0; @@ -551,7 +554,8 @@ static long vhost_dev_alloc_iovecs(struct vhost_dev *dev) GFP_KERNEL); vq->log = kmalloc_array(dev->iov_limit, sizeof(*vq->log), GFP_KERNEL); - vq->heads = kmalloc_array(dev->iov_limit, sizeof(*vq->heads), + vq->heads = kmalloc_array(dev->iov_limit, + sizeof(struct vhost_used_elem), GFP_KERNEL); if (!vq->indirect || !vq->log || !vq->heads) goto err_nomem; @@ -1406,8 +1410,8 @@ static inline int vhost_get_used_idx(struct vhost_virtqueue *vq, return vhost_get_used(vq, *idx, &vq->used->idx); } -static inline int vhost_get_desc(struct vhost_virtqueue *vq, - struct vring_desc *desc, int idx) +static inline int vhost_get_desc_split(struct vhost_virtqueue *vq, + struct vring_desc *desc, int idx) { #if VHOST_ARCH_CAN_ACCEL_UACCESS struct vring_desc *d = vhost_get_meta_ptr(vq, VHOST_ADDR_DESC); @@ -1422,6 +1426,104 @@ static inline int vhost_get_desc(struct vhost_virtqueue *vq, return vhost_copy_from_user(vq, desc, vq->desc + idx, sizeof(*desc)); } +static inline int vhost_get_desc_packed(struct vhost_virtqueue *vq, + struct vring_packed_desc *desc, int idx) +{ + int ret; +#if VHOST_ARCH_CAN_ACCEL_UACCESS + struct vring_packed_desc *d = vhost_get_meta_ptr(vq, VHOST_ADDR_DESC); + + if (likely(d)) { + d += idx; + + desc->flags = d->flags; + + /* Make sure flags is seen before the rest fields of + * descriptor. + */ + smp_rmb(); + + desc->addr = d->addr; + desc->len = d->len; + desc->id = d->id; + + vhost_put_meta_ptr(); + return 0; + } +#endif + + ret = vhost_copy_from_user(vq, &desc->flags, + &(vq->desc_packed + idx)->flags, + sizeof(desc->flags)); + if (unlikely(ret)) + return ret; + + /* Make sure flags is seen before the rest fields of + * descriptor. 
+ */ + smp_rmb(); + + return vhost_copy_from_user(vq, desc, vq->desc_packed + idx, + offsetof(struct vring_packed_desc, flags)); +} + +static inline int vhost_put_desc_packed(struct vhost_virtqueue *vq, + struct vring_packed_desc *desc, int idx) +{ +#if VHOST_ARCH_CAN_ACCEL_UACCESS + struct vring_packed_desc *d = vhost_get_meta_ptr(vq, VHOST_ADDR_DESC); + + if (likely(d)) { + d += idx; + d->id = desc->id; + d->flags = desc->flags; + vhost_put_meta_ptr(); + return 0; + } +#endif + + return vhost_copy_to_user(vq, vq->desc_packed + idx, desc, + sizeof(*desc)); +} + +static inline int vhost_get_desc_flags(struct vhost_virtqueue *vq, + __virtio16 *flags, int idx) +{ + struct vring_packed_desc __user *desc; +#if VHOST_ARCH_CAN_ACCEL_UACCESS + struct vring_packed_desc *d = vhost_get_meta_ptr(vq, VHOST_ADDR_DESC); + + if (likely(d)) { + d += idx; + *flags = d->flags; + vhost_put_meta_ptr(); + return 0; + } +#endif + + desc = vq->desc_packed + idx; + return vhost_get_user(vq, *flags, &desc->flags, VHOST_ADDR_DESC); +} + +static inline int vhost_put_desc_flags(struct vhost_virtqueue *vq, + __virtio16 *flags, int idx) +{ + struct vring_packed_desc __user *desc; +#if VHOST_ARCH_CAN_ACCEL_UACCESS + struct vring_packed_desc *d = vhost_get_meta_ptr(vq, VHOST_ADDR_DESC); + + if (likely(d)) { + d += idx; + d->flags = *flags; + vhost_put_meta_ptr(); + return 0; + } +#endif + + desc = vq->desc_packed + idx; + return vhost_put_user(vq, *flags, &desc->flags, VHOST_ADDR_DESC); +} + static int vhost_new_umem_range(struct vhost_umem *umem, u64 start, u64 size, u64 end, u64 userspace_addr, int perm) @@ -1701,17 +1803,44 @@ static int vhost_iotlb_miss(struct vhost_virtqueue *vq, u64 iova, int access) return 0; } -static bool vq_access_ok(struct vhost_virtqueue *vq, unsigned int num, - struct vring_desc __user *desc, - struct vring_avail __user *avail, - struct vring_used __user *used) +static int vq_access_ok_packed(struct vhost_virtqueue *vq, unsigned int num, + struct vring_desc __user *desc, + struct vring_avail __user *avail, + struct vring_used __user *used) +{ + struct vring_packed_desc *packed = (struct vring_packed_desc *)desc; + + /* TODO: check device area and driver area */ + return access_ok(packed, num * sizeof(*packed)) && + access_ok(packed, num * sizeof(*packed)); +} +static int vq_access_ok_split(struct vhost_virtqueue *vq, unsigned int num, + struct vring_desc __user *desc, + struct vring_avail __user *avail, + struct vring_used __user *used) { return access_ok(desc, vhost_get_desc_size(vq, num)) && access_ok(avail, vhost_get_avail_size(vq, num)) && access_ok(used, vhost_get_used_size(vq, num)); } +#define VHOST_ALTERNATIVE(func, vq, ...) 
+
+static int vq_access_ok(struct vhost_virtqueue *vq, unsigned int num,
+			struct vring_desc __user *desc,
+			struct vring_avail __user *avail,
+			struct vring_used __user *used)
+{
+	VHOST_ALTERNATIVE(vq_access_ok, vq, num, desc, avail, used);
+}
+
 static void vhost_vq_meta_update(struct vhost_virtqueue *vq,
				 const struct vhost_umem_node *node,
				 int type)
@@ -1786,13 +1915,18 @@ int vq_meta_prefetch(struct vhost_virtqueue *vq)
		return 1;
	}

-	return iotlb_access_ok(vq, VHOST_ACCESS_RO, (u64)(uintptr_t)vq->desc,
-			       vhost_get_desc_size(vq, num), VHOST_ADDR_DESC) &&
-	       iotlb_access_ok(vq, VHOST_ACCESS_RO, (u64)(uintptr_t)vq->avail,
+	return iotlb_access_ok(vq, VHOST_ACCESS_RO,
+			       (u64)(uintptr_t)vq->desc,
+			       vhost_get_desc_size(vq, num),
+			       VHOST_ADDR_DESC) &&
+	       iotlb_access_ok(vq, VHOST_ACCESS_RO,
+			       (u64)(uintptr_t)vq->avail,
			       vhost_get_avail_size(vq, num),
			       VHOST_ADDR_AVAIL) &&
-	       iotlb_access_ok(vq, VHOST_ACCESS_WO, (u64)(uintptr_t)vq->used,
-			       vhost_get_used_size(vq, num), VHOST_ADDR_USED);
+	       iotlb_access_ok(vq, VHOST_ACCESS_WO,
+			       (u64)(uintptr_t)vq->used,
+			       vhost_get_used_size(vq, num),
+			       VHOST_ADDR_USED);
}
EXPORT_SYMBOL_GPL(vq_meta_prefetch);

@@ -2067,18 +2201,33 @@ long vhost_vring_ioctl(struct vhost_dev *d, unsigned int ioctl, void __user *arg
			r = -EFAULT;
			break;
		}
-		if (s.num > 0xffff) {
-			r = -EINVAL;
-			break;
+		if (vhost_has_feature(vq, VIRTIO_F_RING_PACKED)) {
+			vq->last_used_wrap_counter = s.num & (1 << 31);
+			vq->last_used_idx = s.num >> 16 & ~(1 << 15);
+			vq->last_avail_wrap_counter = s.num & (1 << 15);
+			vq->last_avail_idx = s.num & ~(1 << 15);
+		} else {
+			if (s.num > 0xffff) {
+				r = -EINVAL;
+				break;
+			}
+			vq->last_avail_idx = s.num;
		}
-		vq->last_avail_idx = s.num;
		/* Forget the cached index value. */
		vq->avail_idx = vq->last_avail_idx;
+		if (vhost_has_feature(vq, VIRTIO_F_RING_PACKED))
+			vq->avail_wrap_counter = vq->last_avail_wrap_counter;
		break;
	case VHOST_GET_VRING_BASE:
		s.index = idx;
		s.num = vq->last_avail_idx;
-		if (copy_to_user(argp, &s, sizeof s))
+		if (vhost_has_feature(vq, VIRTIO_F_RING_PACKED)) {
+			s.num |= vq->last_avail_wrap_counter << 15;
+			s.num |= vq->last_used_idx << 16;
+			s.num |= vq->last_used_wrap_counter << 31;
+		}
+		if (copy_to_user(argp, &s, sizeof(s)))
			r = -EFAULT;
		break;
	case VHOST_SET_VRING_KICK:
@@ -2375,6 +2524,48 @@ static int log_used(struct vhost_virtqueue *vq, u64 used_offset, u64 len)
	return 0;
}

+static int __log_write(struct vhost_virtqueue *vq, void *addr, int size)
+{
+	struct iovec iov[64];
+	int ret, i;
+
+	if (!size)
+		return 0;
+
+	if (!vq->iotlb)
+		return log_write_hva(vq, (uintptr_t)addr, size);
+
+	ret = translate_desc(vq, (uintptr_t)addr, size, iov, 64,
+			     VHOST_ACCESS_WO);
+	if (ret < 0)
+		return ret;
+
+	for (i = 0; i < ret; i++) {
+		ret = log_write_hva(vq, (uintptr_t)iov[i].iov_base,
+				    iov[i].iov_len);
+		if (ret)
+			return ret;
+	}
+
+	return 0;
+}
+
+static int log_desc(struct vhost_virtqueue *vq, unsigned int idx,
+		    unsigned int count)
+{
+	int ret, c = min(count, vq->num - idx);
+
+	ret = __log_write(vq, vq->desc_packed + idx,
+			  c * sizeof(*vq->desc_packed));
+	if (ret < 0)
+		return ret;
+
+	ret = __log_write(vq, vq->desc_packed,
+			  (count - c) * sizeof(*vq->desc_packed));
+
+	return ret;
+}
+
 int vhost_log_write(struct vhost_virtqueue *vq, struct vhost_log *log,
		    unsigned int log_num, u64 len, struct iovec *iov, int count)
{
@@ -2458,6 +2649,9 @@ int vhost_vq_init_access(struct vhost_virtqueue *vq)

	vhost_init_is_le(vq);

+	if (vhost_has_feature(vq, VIRTIO_F_RING_PACKED))
+		return 0;
+
	r = vhost_update_used_flags(vq);
	if (r)
		goto err;
@@ -2531,7 +2725,8 @@ static int translate_desc(struct vhost_virtqueue *vq, u64 addr, u32 len,
 /* Each buffer in the virtqueues is actually a chain of descriptors.  This
  * function returns the next descriptor in the chain,
  * or -1U if we're at the end. */
-static unsigned next_desc(struct vhost_virtqueue *vq, struct vring_desc *desc)
+static unsigned next_desc_split(struct vhost_virtqueue *vq,
+				struct vring_desc *desc)
{
	unsigned int next;

@@ -2544,11 +2739,17 @@ static unsigned next_desc(struct vhost_virtqueue *vq, struct vring_desc *desc)
	return next;
}

-static int get_indirect(struct vhost_virtqueue *vq,
-			struct iovec iov[], unsigned int iov_size,
-			unsigned int *out_num, unsigned int *in_num,
-			struct vhost_log *log, unsigned int *log_num,
-			struct vring_desc *indirect)
+static unsigned int next_desc_packed(struct vhost_virtqueue *vq,
+				     struct vring_packed_desc *desc)
+{
+	return desc->flags & cpu_to_vhost16(vq, VRING_DESC_F_NEXT);
+}
+
+static int get_indirect_split(struct vhost_virtqueue *vq,
+			      struct iovec iov[], unsigned int iov_size,
+			      unsigned int *out_num, unsigned int *in_num,
+			      struct vhost_log *log, unsigned int *log_num,
+			      struct vring_desc *indirect)
{
	struct vring_desc desc;
	unsigned int i = 0, count, found = 0;
@@ -2638,23 +2839,291 @@ static int get_indirect(struct vhost_virtqueue *vq,
		}
		*out_num += ret;
	}
-	} while ((i = next_desc(vq, &desc)) != -1);
+	} while ((i = next_desc_split(vq, &desc)) != -1);

	return 0;
}

-/* This looks in the virtqueue and for the first available buffer, and converts
- * it to an iovec for convenient access.  Since descriptors consist of some
- * number of output then some number of input descriptors, it's actually two
- * iovecs, but we pack them into one and note how many of each there were.
- *
- * This function returns the descriptor number found, or vq->num (which is
- * never a valid descriptor number) if none was found.  A negative code is
- * returned on error. */
-int vhost_get_vq_desc(struct vhost_virtqueue *vq,
-		      struct vhost_used_elem *used,
-		      struct iovec iov[], unsigned int iov_size,
-		      unsigned int *out_num, unsigned int *in_num,
-		      struct vhost_log *log, unsigned int *log_num)
+static int get_indirect_packed(struct vhost_virtqueue *vq,
+			       struct iovec iov[], unsigned int iov_size,
+			       unsigned int *out_num, unsigned int *in_num,
+			       struct vhost_log *log, unsigned int *log_num,
+			       struct vring_packed_desc *indirect)
+{
+	struct vring_packed_desc desc;
+	unsigned int i, count, found = 0;
+	u32 len = vhost32_to_cpu(vq, indirect->len);
+	struct iov_iter from;
+	int ret, access;
+
+	/* Sanity check */
+	if (unlikely(len % sizeof(desc))) {
+		vq_err(vq, "Invalid length in indirect descriptor: len 0x%llx not multiple of 0x%zx\n",
+		       (unsigned long long)len,
+		       sizeof(desc));
+		return -EINVAL;
+	}
+
+	ret = translate_desc(vq, vhost64_to_cpu(vq, indirect->addr),
+			     len, vq->indirect,
+			     UIO_MAXIOV, VHOST_ACCESS_RO);
+	if (unlikely(ret < 0)) {
+		if (ret != -EAGAIN)
+			vq_err(vq, "Translation failure %d in indirect.\n",
+			       ret);
+		return ret;
+	}
+	iov_iter_init(&from, READ, vq->indirect, ret, len);
+
+	/* We will use the result as an address to read from, so most
+	 * architectures only need a compiler barrier here.
+	 */
+	read_barrier_depends();
+
+	count = len / sizeof(desc);
+	/* Buffers are chained via a 16 bit next field, so
+	 * we can have at most 2^16 of these.
+	 */
+	if (unlikely(count > USHRT_MAX + 1)) {
+		vq_err(vq, "Indirect buffer length too big: %d\n",
+		       indirect->len);
+		return -E2BIG;
+	}
+
+	for (i = 0; i < count; i++) {
+		unsigned int iov_count = *in_num + *out_num;
+
+		if (unlikely(++found > count)) {
+			vq_err(vq, "Loop detected: last one at %u indirect size %u\n",
+			       i, count);
+			return -EINVAL;
+		}
+		if (unlikely(!copy_from_iter_full(&desc, sizeof(desc),
+						  &from))) {
+			vq_err(vq, "Failed indirect descriptor: idx %d, %zx\n",
+			       i, (size_t)vhost64_to_cpu(vq, indirect->addr) +
+			       i * sizeof(desc));
+			return -EINVAL;
+		}
+		if (unlikely(desc.flags &
+			     cpu_to_vhost16(vq, VRING_DESC_F_INDIRECT))) {
+			vq_err(vq, "Nested indirect descriptor: idx %d, %zx\n",
+			       i, (size_t)vhost64_to_cpu(vq, indirect->addr) +
+			       i * sizeof(desc));
+			return -EINVAL;
+		}
+
+		if (desc.flags & cpu_to_vhost16(vq, VRING_DESC_F_WRITE))
+			access = VHOST_ACCESS_WO;
+		else
+			access = VHOST_ACCESS_RO;
+
+		ret = translate_desc(vq, vhost64_to_cpu(vq, desc.addr),
+				     vhost32_to_cpu(vq, desc.len),
+				     iov + iov_count,
+				     iov_size - iov_count, access);
+		if (unlikely(ret < 0)) {
+			if (ret != -EAGAIN)
+				vq_err(vq, "Translation failure %d indirect idx %d\n",
+				       ret, i);
+			return ret;
+		}
+		/* If this is an input descriptor, increment that count. */
+		if (access == VHOST_ACCESS_WO) {
+			*in_num += ret;
+			if (unlikely(log)) {
+				log[*log_num].addr =
+					vhost64_to_cpu(vq, desc.addr);
+				log[*log_num].len =
+					vhost32_to_cpu(vq, desc.len);
+				++*log_num;
+			}
+		} else {
+			/* If it's an output descriptor, they're all supposed
+			 * to come before any input descriptors.
+			 */
+			if (unlikely(*in_num)) {
+				vq_err(vq, "Indirect descriptor has out after in: idx %d\n",
+				       i);
+				return -EINVAL;
+			}
+			*out_num += ret;
+		}
+	}
+
+	return 0;
+}
+
+#define VRING_DESC_F_AVAIL	(1 << VRING_PACKED_DESC_F_AVAIL)
+#define VRING_DESC_F_USED	(1 << VRING_PACKED_DESC_F_USED)
+
+static bool desc_is_avail(struct vhost_virtqueue *vq, bool wrap_counter,
+			  __virtio16 flags)
+{
+	bool avail = flags & cpu_to_vhost16(vq, VRING_DESC_F_AVAIL);
+
+	return avail == wrap_counter;
+}
+
+static __virtio16 get_desc_flags(struct vhost_virtqueue *vq,
+				 bool wrap_counter, bool write)
+{
+	__virtio16 flags = 0;
+
+	if (wrap_counter) {
+		flags |= cpu_to_vhost16(vq, VRING_DESC_F_AVAIL);
+		flags |= cpu_to_vhost16(vq, VRING_DESC_F_USED);
+	} else {
+		flags &= ~cpu_to_vhost16(vq, VRING_DESC_F_AVAIL);
+		flags &= ~cpu_to_vhost16(vq, VRING_DESC_F_USED);
+	}
+
+	if (write)
+		flags |= cpu_to_vhost16(vq, VRING_DESC_F_WRITE);
+
+	return flags;
+}
+
+static bool vhost_vring_packed_need_event(struct vhost_virtqueue *vq,
+					  bool wrap, __u16 off_wrap, __u16 new,
+					  __u16 old)
+{
+	int off = off_wrap & ~(1 << 15);
+
+	if (wrap != off_wrap >> 15)
+		off -= vq->num;
+
+	return vring_need_event(off, new, old);
+}
+
+static int vhost_get_vq_desc_packed(struct vhost_virtqueue *vq,
+				    struct vhost_used_elem *used,
+				    struct iovec iov[], unsigned int iov_size,
+				    unsigned int *out_num, unsigned int *in_num,
+				    struct vhost_log *log,
+				    unsigned int *log_num)
+{
+	struct vring_packed_desc desc;
+	struct vring_packed_used_elem *elem = &used->packed_elem;
+	int ret, access;
+	u16 last_avail_idx = vq->last_avail_idx;
+	u16 off_wrap = vq->avail_idx | (vq->avail_wrap_counter << 15);
+
+	/* When we start there are none of either input nor output. */
+	*out_num = *in_num = 0;
+	if (unlikely(log))
+		*log_num = 0;
+
+	elem->count = 0;
+
+	do {
+		unsigned int iov_count = *in_num + *out_num;
+
+		ret = vhost_get_desc_packed(vq, &desc, vq->last_avail_idx);
+		if (unlikely(ret)) {
+			vq_err(vq, "Failed to get flags: idx %d\n",
+			       vq->last_avail_idx);
+			return -EFAULT;
+		}
+
+		if (!desc_is_avail(vq, vq->last_avail_wrap_counter,
+				   desc.flags)) {
+			/* If there's nothing new since last we looked, return
+			 * invalid.
+			 */
+			if (!elem->count)
+				return -ENOSPC;
+			vq_err(vq, "Unexpected unavail descriptor: idx %d\n",
+			       vq->last_avail_idx);
+			return -EFAULT;
+		}
+
+		elem->id = desc.id;
+
+		if (desc.flags & cpu_to_vhost16(vq, VRING_DESC_F_INDIRECT)) {
+			ret = get_indirect_packed(vq, iov, iov_size,
+						  out_num, in_num, log,
+						  log_num, &desc);
+			if (unlikely(ret < 0)) {
+				if (ret != -EAGAIN)
+					vq_err(vq, "Failure detected in indirect descriptor at idx %d\n",
+					       desc.id);
+				return ret;
+			}
+			goto next;
+		}
+
+		if (desc.flags & cpu_to_vhost16(vq, VRING_DESC_F_WRITE))
+			access = VHOST_ACCESS_WO;
+		else
+			access = VHOST_ACCESS_RO;
+		ret = translate_desc(vq, vhost64_to_cpu(vq, desc.addr),
+				     vhost32_to_cpu(vq, desc.len),
+				     iov + iov_count, iov_size - iov_count,
+				     access);
+		if (unlikely(ret < 0)) {
+			if (ret != -EAGAIN)
+				vq_err(vq, "Translation failure %d idx %d\n",
+				       ret, desc.id);
+			return ret;
+		}
+
+		if (access == VHOST_ACCESS_WO) {
+			/* If this is an input descriptor,
+			 * increment that count.
+			 */
+			*in_num += ret;
+			if (unlikely(log)) {
+				log[*log_num].addr =
+					vhost64_to_cpu(vq, desc.addr);
+				log[*log_num].len =
+					vhost32_to_cpu(vq, desc.len);
+				++*log_num;
+			}
+		} else {
+			/* If it's an output descriptor, they're all supposed
+			 * to come before any input descriptors.
+			 */
+			if (unlikely(*in_num)) {
+				vq_err(vq, "Desc out after in: idx %d\n",
+				       desc.id);
+				return -EINVAL;
+			}
+			*out_num += ret;
+		}
+
+next:
+		if (unlikely(++elem->count > vq->num)) {
+			vq_err(vq, "Loop detected: last one at %u vq size %u head %u\n",
+			       desc.id, vq->num, elem->id);
+			return -EINVAL;
+		}
+		if (unlikely(++vq->last_avail_idx >= vq->num)) {
+			vq->last_avail_idx = 0;
+			vq->last_avail_wrap_counter ^= 1;
+		}
+		/* If this descriptor says it doesn't chain, we're done. */
+	} while (next_desc_packed(vq, &desc));
+
+	/* The packed ring does not have an avail idx, which means we
+	 * need to track it on our own. The check here is to make sure
+	 * it grows monotonically.
+	 */
+	if (likely(vhost_vring_packed_need_event(vq,
+						 vq->last_avail_wrap_counter,
+						 off_wrap, vq->last_avail_idx,
+						 last_avail_idx))) {
+		vq->avail_idx = vq->last_avail_idx;
+		vq->avail_wrap_counter = vq->last_avail_wrap_counter;
+	}
+
+	return 0;
+}
+
+static int vhost_get_vq_desc_split(struct vhost_virtqueue *vq,
+				   struct vhost_used_elem *used,
+				   struct iovec iov[], unsigned int iov_size,
+				   unsigned int *out_num, unsigned int *in_num,
+				   struct vhost_log *log, unsigned int *log_num)
{
	struct vring_desc desc;
	unsigned int i, head, found = 0;
@@ -2730,16 +3199,16 @@ int vhost_get_vq_desc(struct vhost_virtqueue *vq,
			       i, vq->num, head);
			return -EINVAL;
		}
-		ret = vhost_get_desc(vq, &desc, i);
+		ret = vhost_get_desc_split(vq, &desc, i);
		if (unlikely(ret)) {
			vq_err(vq, "Failed to get descriptor: idx %d addr %p\n",
			       i, vq->desc + i);
			return -EFAULT;
		}
		if (desc.flags & cpu_to_vhost16(vq, VRING_DESC_F_INDIRECT)) {
-			ret = get_indirect(vq, iov, iov_size,
-					   out_num, in_num,
-					   log, log_num, &desc);
+			ret = get_indirect_split(vq, iov, iov_size,
+						 out_num, in_num,
+						 log, log_num, &desc);
			if (unlikely(ret < 0)) {
				if (ret != -EAGAIN)
					vq_err(vq, "Failure detected "
@@ -2781,7 +3250,7 @@ int vhost_get_vq_desc(struct vhost_virtqueue *vq,
		}
		*out_num += ret;
	}
-	} while ((i = next_desc(vq, &desc)) != -1);
+	} while ((i = next_desc_split(vq, &desc)) != -1);

	/* On success, increment avail index. */
	vq->last_avail_idx++;
@@ -2791,50 +3260,134 @@ int vhost_get_vq_desc(struct vhost_virtqueue *vq,
	BUG_ON(!(vq->used_flags & VRING_USED_F_NO_NOTIFY));
	return 0;
}
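The wrap counter logic above is the heart of the packed ring: a slot is available when the AVAIL bit in its flags matches the driver's wrap counter, and vhost flips its own counter each time last_avail_idx wraps past vq->num. A small standalone sketch of that invariant (assumed 4-entry ring, illustrative flag values only):

    #include <assert.h>
    #include <stdbool.h>
    #include <stdint.h>

    #define F_AVAIL (1 << 7) /* bit position of VRING_PACKED_DESC_F_AVAIL */

    /* A slot is available when its AVAIL bit equals the wrap counter. */
    static bool is_avail(uint16_t flags, bool wrap)
    {
            return !!(flags & F_AVAIL) == wrap;
    }

    int main(void)
    {
            uint16_t ring_flags[4];
            unsigned int num = 4, last_avail = 0, i;
            bool wrap = true; /* both sides start with the counter set */

            /* Driver's first lap: every slot written with AVAIL = 1. */
            for (i = 0; i < num; i++)
                    ring_flags[i] = F_AVAIL;

            for (i = 0; i < num; i++) {
                    assert(is_avail(ring_flags[last_avail], wrap));
                    if (++last_avail >= num) { /* index wraps: flip */
                            last_avail = 0;
                            wrap ^= 1;
                    }
            }
            /* Same flags on the second lap now read as not available. */
            assert(!is_avail(ring_flags[last_avail], wrap));
            return 0;
    }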
+
+/* This looks in the virtqueue and for the first available buffer, and converts
+ * it to an iovec for convenient access.  Since descriptors consist of some
+ * number of output then some number of input descriptors, it's actually two
+ * iovecs, but we pack them into one and note how many of each there were.
+ *
+ * This function returns the descriptor number found, or vq->num (which is
+ * never a valid descriptor number) if none was found.  A negative code is
+ * returned on error.
+ */
+int vhost_get_vq_desc(struct vhost_virtqueue *vq,
+		      struct vhost_used_elem *used,
+		      struct iovec iov[], unsigned int iov_size,
+		      unsigned int *out_num, unsigned int *in_num,
+		      struct vhost_log *log, unsigned int *log_num)
+{
+	VHOST_ALTERNATIVE(vhost_get_vq_desc, vq, used, iov,
+			  iov_size, out_num, in_num, log, log_num);
+}
 EXPORT_SYMBOL_GPL(vhost_get_vq_desc);

+static void vhost_set_used_len_split(struct vhost_virtqueue *vq,
+				     struct vhost_used_elem *used, int len)
+{
+	struct vring_used_elem *elem = &used->elem;
+
+	elem->len = cpu_to_vhost32(vq, len);
+}
+
+static void vhost_set_used_len_packed(struct vhost_virtqueue *vq,
+				      struct vhost_used_elem *used, int len)
+{
+	struct vring_packed_used_elem *elem = &used->packed_elem;
+
+	elem->len = cpu_to_vhost32(vq, len);
+}
+
 void vhost_set_used_len(struct vhost_virtqueue *vq,
			struct vhost_used_elem *used, int len)
{
-	used->elem.len = cpu_to_vhost32(vq, len);
+	VHOST_ALTERNATIVE(vhost_set_used_len, vq, used, len);
}
EXPORT_SYMBOL_GPL(vhost_set_used_len);

+__virtio32 vhost_get_used_len_split(struct vhost_virtqueue *vq,
+				    struct vhost_used_elem *used)
+{
+	return vhost32_to_cpu(vq, used->elem.len);
+}
+
+__virtio32 vhost_get_used_len_packed(struct vhost_virtqueue *vq,
+				     struct vhost_used_elem *used)
+{
+	return vhost32_to_cpu(vq, used->packed_elem.len);
+}
+
 __virtio32 vhost_get_used_len(struct vhost_virtqueue *vq,
			      struct vhost_used_elem *used)
{
-	return vhost32_to_cpu(vq, used->elem.len);
+	VHOST_ALTERNATIVE(vhost_get_used_len, vq, used);
}
EXPORT_SYMBOL_GPL(vhost_get_used_len);

-static void vhost_withdraw_shadow_used(struct vhost_virtqueue *vq, int count)
+static void vhost_withdraw_shadow_used(struct vhost_virtqueue *vq,
+				       int count)
{
	BUG_ON(count > vq->nheads);
	vq->nheads -= count;
}

+static void vhost_discard_vq_desc_packed(struct vhost_virtqueue *vq,
+					 struct vring_packed_used_elem *elem,
+					 int headcount)
+{
+	int i;
+
+	for (i = 0; i < headcount; i++) {
+		vq->last_avail_idx -= elem[i].count;
+		if (vq->last_avail_idx >= vq->num) {
+			vq->last_avail_wrap_counter ^= 1;
+			vq->last_avail_idx += vq->num;
+		}
+	}
+}
+
+static void vhost_discard_vq_desc_split(struct vhost_virtqueue *vq, int n)
+{
+	vq->last_avail_idx -= n;
+}
+
+static void vhost_discard_shadow_used_packed(struct vhost_virtqueue *vq, int n)
+{
+	struct vring_packed_used_elem *elem = vq->heads;
+
+	vhost_discard_vq_desc_packed(vq, elem + vq->nheads - n, n);
+	vhost_withdraw_shadow_used(vq, n);
+}
+
+static void vhost_discard_shadow_used_split(struct vhost_virtqueue *vq, int n)
+{
+	vhost_withdraw_shadow_used(vq, n);
+	vhost_discard_vq_desc_split(vq, n);
+}
+
 /* Reverse the effect of vhost_get_vq_desc and
  * vhost_add_shadow_used. Useful for error handling */
 void vhost_discard_shadow_used(struct vhost_virtqueue *vq, int n)
{
-	vhost_withdraw_shadow_used(vq, n);
-	vhost_discard_vq_desc(vq, n);
+	VHOST_ALTERNATIVE(vhost_discard_shadow_used, vq, n);
}
EXPORT_SYMBOL_GPL(vhost_discard_shadow_used);
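Note what vhost_discard_vq_desc_packed() relies on: since packed rings have no head index to rewind by, each shadow element records in count how many slots the chain consumed, and the discard walks last_avail_idx back with u16 arithmetic, flipping the wrap counter whenever the index crosses zero. A standalone sketch of that rewind step (mirroring the function above; field names abbreviated):

    #include <assert.h>
    #include <stdbool.h>
    #include <stdint.h>

    struct pvq {
            uint16_t last_avail_idx;
            bool last_avail_wrap_counter;
            uint16_t num;
    };

    /* Walk last_avail_idx back by the slots one chain consumed;
     * u16 underflow past 0 shows up as idx >= num. */
    static void rewind(struct pvq *vq, uint16_t count)
    {
            vq->last_avail_idx -= count;
            if (vq->last_avail_idx >= vq->num) {
                    vq->last_avail_wrap_counter ^= 1;
                    vq->last_avail_idx += vq->num;
            }
    }

    int main(void)
    {
            struct pvq vq = { .last_avail_idx = 1,
                              .last_avail_wrap_counter = true,
                              .num = 256 };

            /* Crosses slot 0: index wraps back, counter flips. */
            rewind(&vq, 3);
            assert(vq.last_avail_idx == 254 &&
                   !vq.last_avail_wrap_counter);
            return 0;
    }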

 /* Reverse the effect of vhost_get_vq_desc. Useful for error handling. */
-void vhost_discard_vq_desc(struct vhost_virtqueue *vq, int n)
+void vhost_discard_vq_desc(struct vhost_virtqueue *vq,
+			   struct vhost_used_elem *used)
{
-	vq->last_avail_idx -= n;
+	if (vhost_has_feature(vq, VIRTIO_F_RING_PACKED))
+		vhost_discard_vq_desc_packed(vq, &used->packed_elem, 1);
+	else
+		vhost_discard_vq_desc_split(vq, 1);
}
EXPORT_SYMBOL_GPL(vhost_discard_vq_desc);

-static int __vhost_add_used_n(struct vhost_virtqueue *vq,
-			      struct vhost_used_elem *shadow,
-			      unsigned count)
+static int __vhost_add_used_n_split(struct vhost_virtqueue *vq,
+				    struct vring_used_elem *heads,
+				    unsigned count)
{
-	struct vring_used_elem *heads = (struct vring_used_elem *)shadow;
	struct vring_used_elem __user *used;
	u16 old, new;
	int start;
@@ -2863,59 +3416,241 @@ static int __vhost_add_used_n(struct vhost_virtqueue *vq,
	return 0;
}

+static void vhost_set_zc_used_len_split(struct vhost_virtqueue *vq,
+					int idx, int len)
+{
+	struct vring_used_elem *heads = vq->heads;
+
+	heads[idx].len = len;
+}
+
+static void vhost_set_zc_used_len_packed(struct vhost_virtqueue *vq,
+					 int idx, int len)
+{
+	struct vring_packed_used_elem *elem = vq->heads;
+
+	elem[idx].len = len;
+}
+
 void vhost_set_zc_used_len(struct vhost_virtqueue *vq,
-			   int idx, int len)
+			   int idx, int len)
{
-	vq->heads[idx].elem.len = len;
+	VHOST_ALTERNATIVE(vhost_set_zc_used_len, vq, idx, len);
}
EXPORT_SYMBOL_GPL(vhost_set_zc_used_len);

-int vhost_get_zc_used_len(struct vhost_virtqueue *vq, int idx)
+void vhost_add_shadow_used_split(struct vhost_virtqueue *vq,
+				 struct vhost_used_elem *used, int len)
{
-	return vq->heads[idx].elem.len;
+	struct vring_used_elem *heads = vq->heads;
+
+	heads[vq->nheads].id = used->elem.id;
+	heads[vq->nheads].len = len;
+	++vq->nheads;
}
-EXPORT_SYMBOL_GPL(vhost_get_zc_used_len);

-void vhost_set_zc_used(struct vhost_virtqueue *vq, int idx,
-		       struct vhost_used_elem *elem, int len)
+void vhost_add_shadow_used_packed(struct vhost_virtqueue *vq,
+				  struct vhost_used_elem *used, int len)
{
-	vq->heads[idx] = *elem;
-	vhost_set_zc_used_len(vq, idx, len);
+	struct vring_packed_used_elem *elem = vq->heads;
+
+	elem[vq->nheads].id = used->packed_elem.id;
+	elem[vq->nheads].count = used->packed_elem.count;
+	elem[vq->nheads].len = len;
+	++vq->nheads;
}
-EXPORT_SYMBOL_GPL(vhost_set_zc_used);

 void vhost_add_shadow_used(struct vhost_virtqueue *vq,
-			   struct vhost_used_elem *elem, int len)
+			   struct vhost_used_elem *used, int len)
{
-	vhost_set_zc_used(vq, vq->nheads, elem, len);
-	++vq->nheads;
+	VHOST_ALTERNATIVE(vhost_add_shadow_used, vq, used, len);
}
EXPORT_SYMBOL_GPL(vhost_add_shadow_used);

+int vhost_get_zc_used_len_split(struct vhost_virtqueue *vq, int idx)
+{
+	struct vring_used_elem *heads = vq->heads;
+
+	return heads[idx].len;
+}
+
+int vhost_get_zc_used_len_packed(struct vhost_virtqueue *vq, int idx)
+{
+	struct vring_packed_used_elem *elem = vq->heads;
+
+	return elem[idx].len;
+}
+
+int vhost_get_zc_used_len(struct vhost_virtqueue *vq, int idx)
+{
+	VHOST_ALTERNATIVE(vhost_get_zc_used_len, vq, idx);
+}
+EXPORT_SYMBOL_GPL(vhost_get_zc_used_len);
+
+void vhost_set_zc_used_split(struct vhost_virtqueue *vq, int idx,
+			     struct vhost_used_elem *used, int len)
+{
+	struct vring_used_elem *heads = vq->heads;
+
+	heads[idx] = used->elem;
+	vhost_set_zc_used_len_split(vq, idx, len);
+}
+
+void vhost_set_zc_used_packed(struct vhost_virtqueue *vq, int idx,
+			      struct vhost_used_elem *used, int len)
+{
+	struct vring_packed_used_elem *elem = vq->heads;
+
+	elem[idx].id = used->packed_elem.id;
+	elem[idx].count = used->packed_elem.count;
+	vhost_set_zc_used_len_packed(vq, idx, len);
+}
+
+void vhost_set_zc_used(struct vhost_virtqueue *vq, int idx,
+		       struct vhost_used_elem *used, int len)
+{
+	VHOST_ALTERNATIVE(vhost_set_zc_used, vq, idx, used, len);
+}
+EXPORT_SYMBOL_GPL(vhost_set_zc_used);
+
 int vhost_get_shadow_used_count(struct vhost_virtqueue *vq)
{
	return vq->nheads;
}
EXPORT_SYMBOL_GPL(vhost_get_shadow_used_count);

+static int __vhost_add_used_n_packed(struct vhost_virtqueue *vq,
+				     struct vring_packed_used_elem *elem,
+				     unsigned int count)
+{
+	int i, ret;
+	__virtio16 head_flags = 0;
+	struct vring_packed_desc desc;
+	unsigned int used_idx = vq->last_used_idx;
+	bool wrap_counter = vq->last_used_wrap_counter;
+#if VHOST_ARCH_CAN_ACCEL_UACCESS
+	struct vring_packed_desc *d = vhost_get_meta_ptr(vq, VHOST_ADDR_DESC);
+
+	if (likely(d)) {
+		used_idx += elem[0].count;
+		if (unlikely(used_idx >= vq->num)) {
+			used_idx -= vq->num;
+			wrap_counter ^= 1;
+		}
+
+		for (i = 1; i < count; i++) {
+			d[used_idx].flags = get_desc_flags(vq, wrap_counter,
+							   elem[i].len);
+			d[used_idx].id = elem[i].id;
+			d[used_idx].len = elem[i].len;
+			used_idx += elem[i].count;
+			if (unlikely(used_idx >= vq->num)) {
+				used_idx -= vq->num;
+				wrap_counter ^= 1;
+			}
+		}
+
+		d[vq->last_used_idx].id = elem[0].id;
+		d[vq->last_used_idx].len = elem[0].len;
+		/* Make sure descriptor id and len are written before
+		 * flags for the first used buffer.
+		 */
+		smp_wmb();
+		head_flags = get_desc_flags(vq, vq->last_used_wrap_counter,
+					    elem[0].len);
+		d[vq->last_used_idx].flags = head_flags;
+
+		vq->last_used_idx = used_idx;
+		vq->last_used_wrap_counter = wrap_counter;
+
+		vhost_put_meta_ptr();
+
+		return 0;
+	}
+#endif
+
+	/* Update used elems other than the first to save unnecessary
+	 * memory barriers.
+	 */
+	for (i = 0; i < count; i++) {
+		__virtio16 flags = get_desc_flags(vq, wrap_counter,
+						  elem[i].len);
+		if (i == 0) {
+			head_flags = flags;
+			desc.flags = ~flags;
+		} else
+			desc.flags = flags;
+
+		desc.id = elem[i].id;
+		desc.len = elem[i].len;
+
+		ret = vhost_put_desc_packed(vq, &desc, used_idx);
+		if (unlikely(ret))
+			return ret;
+
+		used_idx += elem[i].count;
+		if (used_idx >= vq->num) {
+			used_idx -= vq->num;
+			wrap_counter ^= 1;
+		}
+	}
+
+	/* Make sure descriptor id and len are written before
+	 * flags for the first used buffer.
+	 */
+	smp_wmb();
+
+	if (vhost_put_desc_flags(vq, &head_flags, vq->last_used_idx)) {
+		vq_err(vq, "Failed to update flags: idx %d",
+		       vq->last_used_idx);
+		return -EFAULT;
+	}
+
+	vq->last_used_idx = used_idx;
+	vq->last_used_wrap_counter = wrap_counter;
+
+	return 0;
+}
+
+static int vhost_add_used_n_packed(struct vhost_virtqueue *vq,
+				   struct vring_packed_used_elem *elem,
+				   unsigned int count)
+{
+	u16 last_used_idx = vq->last_used_idx;
+	int ret;
+
+	ret = __vhost_add_used_n_packed(vq, elem, count);
+	if (unlikely(ret))
+		return ret;
+
+	if (unlikely(vq->log_used)) {
+		/* Make sure used descriptors are seen before log. */
+		smp_wmb();
+		ret = log_desc(vq, last_used_idx, count);
+	}
+
+	return ret;
+}
+
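The ordering in __vhost_add_used_n_packed() is deliberate: the id/len of every used element is written first, then smp_wmb(), and only then the head descriptor's flags, because that final store is what publishes the whole batch to the driver. A userspace sketch of the same publish/consume pairing, with C11 release/acquire standing in for smp_wmb()/smp_rmb() (an analogy to the kernel primitives, not a copy of them):

    #include <stdatomic.h>
    #include <stdint.h>

    struct desc {
            uint64_t addr;
            uint32_t len;
            uint16_t id;
            _Atomic uint16_t flags;
    };

    /* Producer: fill in the body of the used descriptor first, then
     * publish it with a release store to flags - the counterpart of
     * writing id/len, smp_wmb(), then storing the head flags. */
    static void publish_used(struct desc *d, uint16_t id, uint32_t len,
                             uint16_t flags)
    {
            d->id = id;
            d->len = len;
            atomic_store_explicit(&d->flags, flags, memory_order_release);
    }

    /* Consumer: an acquire load of flags pairs with the release store,
     * so id/len are stable once the descriptor reads as used. */
    static int consume_used(struct desc *d, uint16_t avail_used_mask,
                            uint16_t *id, uint32_t *len)
    {
            uint16_t f = atomic_load_explicit(&d->flags,
                                              memory_order_acquire);

            if ((f & avail_used_mask) != avail_used_mask)
                    return 0; /* not used yet */
            *id = d->id;
            *len = d->len;
            return 1;
    }

    int main(void)
    {
            struct desc d = {0};
            uint16_t id;
            uint32_t len;

            publish_used(&d, 3, 64, 0x8080); /* AVAIL|USED for this lap */
            return consume_used(&d, 0x8080, &id, &len) ? 0 : 1;
    }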
 /* After we've used one of their buffers, we tell them about it.  We'll then
  * want to notify the guest, using eventfd. */
-static int vhost_add_used_n(struct vhost_virtqueue *vq,
-			    struct vhost_used_elem *heads,
-			    unsigned count)
+static int vhost_add_used_n_split(struct vhost_virtqueue *vq,
+				  struct vring_used_elem *heads,
+				  unsigned count)
{
	int start, n, r;

	start = vq->last_used_idx & (vq->num - 1);
	n = vq->num - start;
	if (n < count) {
-		r = __vhost_add_used_n(vq, heads, n);
+		r = __vhost_add_used_n_split(vq, heads, n);
		if (r < 0)
			return r;
		heads += n;
		count -= n;
	}
-	r = __vhost_add_used_n(vq, heads, count);
+	r = __vhost_add_used_n_split(vq, heads, count);

	/* Make sure buffer is written before we update index. */
	smp_wmb();
@@ -2934,7 +3669,15 @@ static int vhost_add_used_n(struct vhost_virtqueue *vq,
	}
	return r;
}
-EXPORT_SYMBOL_GPL(vhost_add_used_n);
+
+/* After we've used one of their buffers, we tell them about it.  We'll then
+ * want to notify the guest, using eventfd.
+ */
+static int vhost_add_used_n(struct vhost_virtqueue *vq,
+			    void *heads, unsigned int count)
+{
+	VHOST_ALTERNATIVE(vhost_add_used_n, vq, heads, count);
+}

 /* After we've used one of their buffers, we tell them about it.  We'll then
  * want to notify the guest, using eventfd. */
@@ -2951,6 +3694,11 @@ static bool vhost_notify(struct vhost_dev *dev, struct vhost_virtqueue *vq)
	__u16 old, new;
	__virtio16 event;
	bool v;
+
+	/* TODO: check driver area */
+	if (vhost_has_feature(vq, VIRTIO_F_RING_PACKED))
+		return true;
+
	/* Flush out used index updates. This is paired
	 * with the barrier that the Guest executes when enabling
	 * interrupts. */
@@ -3023,14 +3771,32 @@ void vhost_flush_shadow_used_and_signal(struct vhost_virtqueue *vq)
}
EXPORT_SYMBOL_GPL(vhost_flush_shadow_used_and_signal);

+void vhost_flush_zc_used_and_signal_split(struct vhost_virtqueue *vq,
+					  int idx, int n)
+{
+	struct vring_used_elem *heads = vq->heads;
+
+	vhost_add_used_n_split(vq, &heads[idx], n);
+	vhost_signal(vq->dev, vq);
+}
+
+void vhost_flush_zc_used_and_signal_packed(struct vhost_virtqueue *vq,
+					   int idx, int n)
+{
+	struct vring_packed_used_elem *elem = vq->heads;
+
+	vhost_add_used_n_packed(vq, &elem[idx], n);
+	vhost_signal(vq->dev, vq);
+}
+
 void vhost_flush_zc_used_and_signal(struct vhost_virtqueue *vq,
				    int idx, int n)
{
-	vhost_add_used_and_signal_n(vq->dev, vq, &vq->heads[idx], n);
+	VHOST_ALTERNATIVE(vhost_flush_zc_used_and_signal, vq, idx, n);
}
EXPORT_SYMBOL_GPL(vhost_flush_zc_used_and_signal);

 /* return true if we're sure that available ring is empty */
-bool vhost_vq_avail_empty(struct vhost_virtqueue *vq)
+static bool vhost_vq_avail_empty_split(struct vhost_virtqueue *vq)
{
	__virtio16 avail_idx;
	int r;
@@ -3045,10 +3811,52 @@ bool vhost_vq_avail_empty(struct vhost_virtqueue *vq)

	return vq->avail_idx == vq->last_avail_idx;
}
+
+static bool vhost_vq_avail_empty_packed(struct vhost_virtqueue *vq)
+{
+	__virtio16 flags;
+	int ret;
+
+	ret = vhost_get_desc_flags(vq, &flags, vq->avail_idx);
+	if (unlikely(ret)) {
+		vq_err(vq, "Failed to get flags: idx %d\n",
+		       vq->avail_idx);
+		/* Treat the ring as empty on error. */
+		return true;
+	}
+
+	return !desc_is_avail(vq, vq->avail_wrap_counter, flags);
+}
+
+bool vhost_vq_avail_empty(struct vhost_virtqueue *vq)
+{
+	VHOST_ALTERNATIVE(vhost_vq_avail_empty, vq);
+}
EXPORT_SYMBOL_GPL(vhost_vq_avail_empty);

-/* OK, now we need to know about added descriptors. */
-bool vhost_enable_notify(struct vhost_virtqueue *vq)
+static bool vhost_enable_notify_packed(struct vhost_virtqueue *vq)
+{
+	struct vring_packed_desc *d = vq->desc_packed + vq->avail_idx;
+	__virtio16 flags;
+	int ret;
+
+	/* TODO: enable notification through device area */
+
+	/* They could have slipped one in as we were doing that: make
+	 * sure it's written, then check again.
+	 */
+	smp_mb();
+
+	ret = vhost_get_desc_flags(vq, &flags, vq->avail_idx);
+	if (unlikely(ret)) {
+		vq_err(vq, "Failed to get descriptor: idx %d addr %p\n",
+		       vq->avail_idx, &d->flags);
+		/* Match the split path: report no new buffers on error. */
+		return false;
+	}
+
+	return desc_is_avail(vq, vq->avail_wrap_counter, flags);
+}
+
+static bool vhost_enable_notify_split(struct vhost_virtqueue *vq)
{
	__virtio16 avail_idx;
	int r;
@@ -3083,10 +3891,20 @@ bool vhost_enable_notify(struct vhost_virtqueue *vq)

	return vhost16_to_cpu(vq, avail_idx) != vq->avail_idx;
}
+
+/* OK, now we need to know about added descriptors. */
+bool vhost_enable_notify(struct vhost_virtqueue *vq)
+{
+	VHOST_ALTERNATIVE(vhost_enable_notify, vq);
+}
EXPORT_SYMBOL_GPL(vhost_enable_notify);

-/* We don't need to be notified again. */
-void vhost_disable_notify(struct vhost_virtqueue *vq)
+static void vhost_disable_notify_packed(struct vhost_virtqueue *vq)
+{
+	/* TODO: disable notification through device area */
+}
+
+static void vhost_disable_notify_split(struct vhost_virtqueue *vq)
{
	int r;

@@ -3100,6 +3918,12 @@ void vhost_disable_notify(struct vhost_virtqueue *vq)
			       &vq->used->flags, r);
	}
}
+
+/* We don't need to be notified again. */
+void vhost_disable_notify(struct vhost_virtqueue *vq)
+{
+	VHOST_ALTERNATIVE(vhost_disable_notify, vq);
+}
EXPORT_SYMBOL_GPL(vhost_disable_notify);

 /* Create a new message. */
diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
index b8a5d1a2bed9..7f3a2dd1b628 100644
--- a/drivers/vhost/vhost.h
+++ b/drivers/vhost/vhost.h
@@ -37,8 +37,17 @@ struct vhost_poll {
	struct vhost_dev	 *dev;
};

+struct vring_packed_used_elem {
+	__u16 id;
+	__u16 count;
+	__u32 len;
+};
+
 struct vhost_used_elem {
-	struct vring_used_elem elem;
+	union {
+		struct vring_used_elem elem;
+		struct vring_packed_used_elem packed_elem;
+	};
};

 void vhost_work_init(struct vhost_work *work, vhost_work_fn_t fn);
@@ -112,7 +121,10 @@ struct vhost_virtqueue {
	/* The actual ring of buffers. */
	struct mutex mutex;
	unsigned int num;
-	struct vring_desc __user *desc;
+	union {
+		struct vring_desc __user *desc;
+		struct vring_packed_desc __user *desc_packed;
+	};
	struct vring_avail __user *avail;
	struct vring_used __user *used;

@@ -166,7 +178,7 @@ struct vhost_virtqueue {
	struct iovec iov[UIO_MAXIOV];
	struct iovec iotlb_iov[64];
	struct iovec *indirect;
-	struct vhost_used_elem *heads;
+	void *heads;
	int nheads;
	/* Protected by virtqueue mutex. */
	struct vhost_umem *umem;
@@ -188,6 +200,9 @@ struct vhost_virtqueue {
	u32 busyloop_timeout;
	spinlock_t mmu_lock;
	int invalidate_count;
+	bool last_used_wrap_counter;
+	bool avail_wrap_counter;
+	bool last_avail_wrap_counter;
};

 struct vhost_msg_node {
@@ -241,7 +256,8 @@ int vhost_get_vq_desc(struct vhost_virtqueue *,
		      struct iovec iov[], unsigned int iov_count,
		      unsigned int *out_num, unsigned int *in_num,
		      struct vhost_log *log, unsigned int *log_num);
-void vhost_discard_vq_desc(struct vhost_virtqueue *, int n);
+void vhost_discard_vq_desc(struct vhost_virtqueue *vq,
+			   struct vhost_used_elem *elem);
int vhost_vq_init_access(struct vhost_virtqueue *);
int vhost_add_used(struct vhost_virtqueue *, struct vhost_used_elem *, int);

From patchwork Wed Jul 17 10:52:54 2019
X-Patchwork-Submitter: Jason Wang
X-Patchwork-Id: 11047643
From: Jason Wang
To: mst@redhat.com, jasowang@redhat.com
Cc: kvm@vger.kernel.org, virtualization@lists.linux-foundation.org,
 netdev@vger.kernel.org, linux-kernel@vger.kernel.org, jfreimann@redhat.com,
 tiwei.bie@intel.com, maxime.coquelin@redhat.com
Subject: [PATCH V3 14/15] vhost: event suppression for packed ring
Date: Wed, 17 Jul 2019 06:52:54 -0400
Message-Id: <20190717105255.63488-15-jasowang@redhat.com>
In-Reply-To: <20190717105255.63488-1-jasowang@redhat.com>
References: <20190717105255.63488-1-jasowang@redhat.com>
X-Mailing-List: kvm@vger.kernel.org

This patch introduces support for event suppression. This is done by
having two areas: the device area and the driver area. One side can
then try to disable or enable (delayed) notifications from the other
side by using a boolean hint or the event index interface in those
areas. For more information, please refer to the Virtio spec.

Signed-off-by: Jason Wang
---
 drivers/vhost/vhost.c | 265 +++++++++++++++++++++++++++++++++++++++---
 drivers/vhost/vhost.h |  11 +-
 2 files changed, 255 insertions(+), 21 deletions(-)
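Concretely, each of the two areas is a single vring_packed_desc_event: flags selects enable/disable/descriptor-specific mode, and off_wrap packs a 15-bit ring offset with a wrap counter in bit 15. A sketch of the layout and the off_wrap encoding as defined by the virtio 1.1 spec (flag values 0/1/2 per the spec; simplified standalone types):

    #include <assert.h>
    #include <stdbool.h>
    #include <stdint.h>

    /* Virtio 1.1 packed-ring event suppression area. */
    struct vring_packed_desc_event {
            uint16_t off_wrap; /* bits 0-14: offset, bit 15: wrap */
            uint16_t flags;    /* 0 enable, 1 disable, 2 desc (idx) */
    };

    static uint16_t make_off_wrap(uint16_t off, bool wrap)
    {
            return (off & 0x7fff) | ((uint16_t)wrap << 15);
    }

    int main(void)
    {
            struct vring_packed_desc_event driver_area = {
                    /* notify me only once slot 100 has been used */
                    .off_wrap = make_off_wrap(100, true),
                    .flags = 2,
            };

            assert((driver_area.off_wrap & 0x7fff) == 100);
            assert(driver_area.off_wrap >> 15);
            return 0;
    }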
diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index a7d24b9d5204..a188e9af3b35 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -1524,6 +1524,76 @@ static inline int vhost_put_desc_flags(struct vhost_virtqueue *vq,
	return vhost_put_user(vq, *flags, &desc->flags, VHOST_ADDR_DESC);
}

+static int vhost_get_driver_off_wrap(struct vhost_virtqueue *vq,
+				     __virtio16 *off_wrap)
+{
+#if VHOST_ARCH_CAN_ACCEL_UACCESS
+	struct vring_packed_desc_event *event =
+		vhost_get_meta_ptr(vq, VHOST_ADDR_AVAIL);
+	if (likely(event)) {
+		*off_wrap = event->off_wrap;
+		vhost_put_meta_ptr();
+		return 0;
+	}
+#endif
+	return vhost_get_user(vq, *off_wrap,
+			      &vq->driver_event->off_wrap,
+			      VHOST_ADDR_AVAIL);
+}
+
+static int vhost_get_driver_flags(struct vhost_virtqueue *vq,
+				  __virtio16 *driver_flags)
+{
+#if VHOST_ARCH_CAN_ACCEL_UACCESS
+	struct vring_packed_desc_event *event =
+		vhost_get_meta_ptr(vq, VHOST_ADDR_AVAIL);
+
+	if (likely(event)) {
+		*driver_flags = event->flags;
+		vhost_put_meta_ptr();
+		return 0;
+	}
+#endif
+	return vhost_get_user(vq, *driver_flags, &vq->driver_event->flags,
+			      VHOST_ADDR_AVAIL);
+}
+
+static int vhost_put_device_off_wrap(struct vhost_virtqueue *vq,
+				     __virtio16 *off_wrap)
+{
+#if VHOST_ARCH_CAN_ACCEL_UACCESS
+	struct vring_packed_desc_event *event =
+		vhost_get_meta_ptr(vq, VHOST_ADDR_USED);
+
+	if (likely(event)) {
+		event->off_wrap = *off_wrap;
+		vhost_put_meta_ptr();
+		return 0;
+	}
+#endif
+	return vhost_put_user(vq, *off_wrap,
+			      &vq->device_event->off_wrap,
+			      VHOST_ADDR_USED);
+}
+
+static int vhost_put_device_flags(struct vhost_virtqueue *vq,
+				  __virtio16 *device_flags)
+{
+#if VHOST_ARCH_CAN_ACCEL_UACCESS
+	struct vring_packed_desc_event *event =
+		vhost_get_meta_ptr(vq, VHOST_ADDR_USED);
+
+	if (likely(event)) {
+		event->flags = *device_flags;
+		vhost_put_meta_ptr();
+		return 0;
+	}
+#endif
+	return vhost_put_user(vq, *device_flags,
+			      &vq->device_event->flags,
+			      VHOST_ADDR_USED);
+}
+
 static int vhost_new_umem_range(struct vhost_umem *umem,
				u64 start, u64 size, u64 end,
				u64 userspace_addr, int perm)
@@ -1809,10 +1879,15 @@ static int vq_access_ok_packed(struct vhost_virtqueue *vq, unsigned int num,
			       struct vring_used __user *used)
{
	struct vring_packed_desc *packed = (struct vring_packed_desc *)desc;
+	struct vring_packed_desc_event *driver_event =
+		(struct vring_packed_desc_event *)avail;
+	struct vring_packed_desc_event *device_event =
+		(struct vring_packed_desc_event *)used;

-	/* TODO: check device area and driver area */
-	return access_ok(packed, num * sizeof(*packed));
+	return access_ok(packed, num * sizeof(*packed)) &&
+	       access_ok(driver_event, sizeof(*driver_event)) &&
+	       access_ok(device_event, sizeof(*device_event));
}

 static int vq_access_ok_split(struct vhost_virtqueue *vq, unsigned int num,
@@ -1904,16 +1979,25 @@ static void vhost_vq_map_prefetch(struct vhost_virtqueue *vq)
}
#endif

-int vq_meta_prefetch(struct vhost_virtqueue *vq)
+static int vq_iotlb_prefetch_packed(struct vhost_virtqueue *vq)
{
-	unsigned int num = vq->num;
+	int num = vq->num;

-	if (!vq->iotlb) {
-#if VHOST_ARCH_CAN_ACCEL_UACCESS
-		vhost_vq_map_prefetch(vq);
-#endif
-		return 1;
-	}
+	return iotlb_access_ok(vq, VHOST_ACCESS_RO, (u64)(uintptr_t)vq->desc,
+			       num * sizeof(*vq->desc), VHOST_ADDR_DESC) &&
+	       iotlb_access_ok(vq, VHOST_ACCESS_WO, (u64)(uintptr_t)vq->desc,
+			       num * sizeof(*vq->desc), VHOST_ADDR_DESC) &&
+	       iotlb_access_ok(vq, VHOST_ACCESS_RO,
+			       (u64)(uintptr_t)vq->driver_event,
+			       sizeof(*vq->driver_event), VHOST_ADDR_AVAIL) &&
+	       iotlb_access_ok(vq, VHOST_ACCESS_WO,
+			       (u64)(uintptr_t)vq->device_event,
+			       sizeof(*vq->device_event), VHOST_ADDR_USED);
+}
+
+static int vq_iotlb_prefetch_split(struct vhost_virtqueue *vq)
+{
+	unsigned int num = vq->num;

	return iotlb_access_ok(vq, VHOST_ACCESS_RO,
			       (u64)(uintptr_t)vq->desc,
@@ -1928,6 +2012,21 @@ int vq_meta_prefetch(struct vhost_virtqueue *vq)
			       vhost_get_used_size(vq, num),
			       VHOST_ADDR_USED);
}
+
+int vq_meta_prefetch(struct vhost_virtqueue *vq)
+{
+	if (!vq->iotlb) {
+#if VHOST_ARCH_CAN_ACCEL_UACCESS
+		vhost_vq_map_prefetch(vq);
+#endif
+		return 1;
+	}
+
+	if (vhost_has_feature(vq, VIRTIO_F_RING_PACKED))
+		return vq_iotlb_prefetch_packed(vq);
+	else
+		return vq_iotlb_prefetch_split(vq);
+}
EXPORT_SYMBOL_GPL(vq_meta_prefetch);

 /* Can we log writes? */
@@ -2620,6 +2719,48 @@ static int vhost_update_used_flags(struct vhost_virtqueue *vq)
	return 0;
}

+static int vhost_update_device_flags(struct vhost_virtqueue *vq,
+				     __virtio16 device_flags)
+{
+	void __user *flags;
+
+	if (vhost_put_device_flags(vq, &device_flags))
+		return -EFAULT;
+	if (unlikely(vq->log_used)) {
+		/* Make sure the flag is seen before log. */
+		smp_wmb();
+		/* Log device flags write. */
+		flags = &vq->device_event->flags;
+		log_write(vq->log_base, vq->log_addr +
+			  (flags - (void __user *)vq->device_event),
+			  sizeof(vq->device_event->flags));
+		if (vq->log_ctx)
+			eventfd_signal(vq->log_ctx, 1);
+	}
+	return 0;
+}
+
+static int vhost_update_device_off_wrap(struct vhost_virtqueue *vq,
+					__virtio16 device_off_wrap)
+{
+	void __user *off_wrap;
+
+	if (vhost_put_device_off_wrap(vq, &device_off_wrap))
+		return -EFAULT;
+	if (unlikely(vq->log_used)) {
+		/* Make sure the off_wrap is seen before log. */
+		smp_wmb();
+		/* Log device off_wrap write. */
+		off_wrap = &vq->device_event->off_wrap;
+		log_write(vq->log_base, vq->log_addr +
+			  (off_wrap - (void __user *)vq->device_event),
+			  sizeof(vq->device_event->off_wrap));
+		if (vq->log_ctx)
+			eventfd_signal(vq->log_ctx, 1);
+	}
+	return 0;
+}
+
 static int vhost_update_avail_event(struct vhost_virtqueue *vq, u16 avail_event)
{
	if (vhost_put_avail_event(vq))
@@ -3689,16 +3830,13 @@ int vhost_add_used(struct vhost_virtqueue *vq, struct vhost_used_elem *used,
}
EXPORT_SYMBOL_GPL(vhost_add_used);

-static bool vhost_notify(struct vhost_dev *dev, struct vhost_virtqueue *vq)
+static bool vhost_notify_split(struct vhost_dev *dev,
+			       struct vhost_virtqueue *vq)
{
	__u16 old, new;
	__virtio16 event;
	bool v;

-	/* TODO: check driver area */
-	if (vhost_has_feature(vq, VIRTIO_F_RING_PACKED))
-		return true;
-
	/* Flush out used index updates. This is paired
	 * with the barrier that the Guest executes when enabling
	 * interrupts. */
@@ -3731,6 +3869,62 @@ static bool vhost_notify(struct vhost_dev *dev, struct vhost_virtqueue *vq)
	return vring_need_event(vhost16_to_cpu(vq, event), new, old);
}

+static bool vhost_notify_packed(struct vhost_dev *dev,
+				struct vhost_virtqueue *vq)
+{
+	__virtio16 event_off_wrap, event_flags;
+	__u16 old, new, off_wrap;
+	bool v;
+
+	/* Flush out used descriptor updates. This is paired
+	 * with the barrier that the Guest executes when enabling
+	 * interrupts.
+	 */
+	smp_mb();
+
+	if (vhost_get_driver_flags(vq, &event_flags)) {
+		vq_err(vq, "Failed to get driver desc_event_flags");
+		return true;
+	}
+
+	if (!vhost_has_feature(vq, VIRTIO_RING_F_EVENT_IDX))
+		return event_flags !=
+		       cpu_to_vhost16(vq, VRING_PACKED_EVENT_FLAG_DISABLE);
+
+	old = vq->signalled_used;
+	v = vq->signalled_used_valid;
+	new = vq->signalled_used = vq->last_used_idx;
+	vq->signalled_used_valid = true;
+
+	if (event_flags != cpu_to_vhost16(vq, VRING_PACKED_EVENT_FLAG_DESC))
+		return event_flags !=
+		       cpu_to_vhost16(vq, VRING_PACKED_EVENT_FLAG_DISABLE);
+
+	/* Read desc event flags before event_off and event_wrap */
+	smp_rmb();
+
+	if (vhost_get_driver_off_wrap(vq, &event_off_wrap) < 0) {
+		vq_err(vq, "Failed to get driver desc_event_off/wrap");
+		return true;
+	}
+
+	off_wrap = vhost16_to_cpu(vq, event_off_wrap);
+
+	if (unlikely(!v))
+		return true;
+
+	return vhost_vring_packed_need_event(vq, vq->last_used_wrap_counter,
+					     off_wrap, new, old);
+}
+
+static bool vhost_notify(struct vhost_dev *dev, struct vhost_virtqueue *vq)
+{
+	if (vhost_has_feature(vq, VIRTIO_F_RING_PACKED))
+		return vhost_notify_packed(dev, vq);
+	else
+		return vhost_notify_split(dev, vq);
+}
+
 /* This actually signals the guest, using eventfd. */
 void vhost_signal(struct vhost_dev *dev, struct vhost_virtqueue *vq)
{
@@ -3836,10 +4030,34 @@ EXPORT_SYMBOL_GPL(vhost_vq_avail_empty);
 static bool vhost_enable_notify_packed(struct vhost_virtqueue *vq)
{
	struct vring_packed_desc *d = vq->desc_packed + vq->avail_idx;
-	__virtio16 flags;
+	__virtio16 flags = cpu_to_vhost16(vq, VRING_PACKED_EVENT_FLAG_ENABLE);
	int ret;

-	/* TODO: enable notification through device area */
+	if (!(vq->used_flags & VRING_USED_F_NO_NOTIFY))
+		return false;
+	vq->used_flags &= ~VRING_USED_F_NO_NOTIFY;
+
+	if (vhost_has_feature(vq, VIRTIO_RING_F_EVENT_IDX)) {
+		__virtio16 off_wrap = cpu_to_vhost16(vq, vq->avail_idx |
+				      vq->avail_wrap_counter << 15);
+
+		ret = vhost_update_device_off_wrap(vq, off_wrap);
+		if (ret) {
+			vq_err(vq, "Failed to write to off wrap at %p: %d\n",
+			       &vq->device_event->off_wrap, ret);
+			return false;
+		}
+		/* Make sure off_wrap is written before flags */
+		smp_wmb();
+		flags = cpu_to_vhost16(vq, VRING_PACKED_EVENT_FLAG_DESC);
+	}
+
+	ret = vhost_update_device_flags(vq, flags);
+	if (ret) {
+		vq_err(vq, "Failed to enable notification at %p: %d\n",
+		       &vq->device_event->flags, ret);
+		return false;
+	}

	/* They could have slipped one in as we were doing that: make
	 * sure it's written, then check again.
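The off_wrap published by the driver feeds vhost_vring_packed_need_event(): when its wrap bit disagrees with the device's current wrap counter, the event slot belongs to the previous lap, so the offset is shifted down by the ring size before the classic event-index window test. A standalone sketch of that computation (need_event() below is a local copy of vring_need_event()):

    #include <assert.h>
    #include <stdbool.h>
    #include <stdint.h>

    /* Classic event-index test: did [old, new) cross event_idx? */
    static bool need_event(uint16_t event_idx, uint16_t new, uint16_t old)
    {
            return (uint16_t)(new - event_idx - 1) < (uint16_t)(new - old);
    }

    static bool packed_need_event(uint16_t num, bool wrap,
                                  uint16_t off_wrap, uint16_t new,
                                  uint16_t old)
    {
            int off = off_wrap & ~(1 << 15);

            /* The event slot was published for the previous lap. */
            if (wrap != off_wrap >> 15)
                    off -= num;

            return need_event(off, new, old);
    }

    int main(void)
    {
            /* Ring of 256; we moved from 10 to 20, event asked at 15. */
            assert(packed_need_event(256, true, 15 | (1 << 15), 20, 10));
            /* Event at 30 has not been reached yet: no notification. */
            assert(!packed_need_event(256, true, 30 | (1 << 15), 20, 10));
            return 0;
    }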
@@ -3901,7 +4119,18 @@ EXPORT_SYMBOL_GPL(vhost_enable_notify);

 static void vhost_disable_notify_packed(struct vhost_virtqueue *vq)
{
-	/* TODO: disable notification through device area */
+	__virtio16 flags;
+	int r;
+
+	if (vq->used_flags & VRING_USED_F_NO_NOTIFY)
+		return;
+	vq->used_flags |= VRING_USED_F_NO_NOTIFY;
+
+	flags = cpu_to_vhost16(vq, VRING_PACKED_EVENT_FLAG_DISABLE);
+	r = vhost_update_device_flags(vq, flags);
+	if (r)
+		vq_err(vq, "Failed to disable notification at %p: %d\n",
+		       &vq->device_event->flags, r);
}

 static void vhost_disable_notify_split(struct vhost_virtqueue *vq)
diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
index 7f3a2dd1b628..bb3f8bb763b9 100644
--- a/drivers/vhost/vhost.h
+++ b/drivers/vhost/vhost.h
@@ -125,9 +125,14 @@ struct vhost_virtqueue {
		struct vring_desc __user *desc;
		struct vring_packed_desc __user *desc_packed;
	};
-	struct vring_avail __user *avail;
-	struct vring_used __user *used;
-
+	union {
+		struct vring_avail __user *avail;
+		struct vring_packed_desc_event __user *driver_event;
+	};
+	union {
+		struct vring_used __user *used;
+		struct vring_packed_desc_event __user *device_event;
+	};
#if VHOST_ARCH_CAN_ACCEL_UACCESS
	/* Read by memory accessors, modified by meta data
	 * prefetching, MMU notifier and vring ioctl().

From patchwork Wed Jul 17 10:52:55 2019
X-Patchwork-Submitter: Jason Wang
X-Patchwork-Id: 11047647
From: Jason Wang
To: mst@redhat.com, jasowang@redhat.com
Cc: kvm@vger.kernel.org, virtualization@lists.linux-foundation.org,
 netdev@vger.kernel.org, linux-kernel@vger.kernel.org, jfreimann@redhat.com,
 tiwei.bie@intel.com, maxime.coquelin@redhat.com
Subject: [PATCH V3 15/15] vhost: enable packed virtqueues
Date: Wed, 17 Jul 2019 06:52:55 -0400
Message-Id: <20190717105255.63488-16-jasowang@redhat.com>
In-Reply-To: <20190717105255.63488-1-jasowang@redhat.com>
References: <20190717105255.63488-1-jasowang@redhat.com>
X-Mailing-List: kvm@vger.kernel.org

This patch enables packed virtqueue support for vhost.

Testpmd (virtio-user) + vhost_net shows about 2.6% improvement on
TX PPS:

Before: 5.75Mpps
After : 5.90Mpps

Signed-off-by: Jason Wang
---
 drivers/vhost/vhost.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
index bb3f8bb763b9..5483eea84a5c 100644
--- a/drivers/vhost/vhost.h
+++ b/drivers/vhost/vhost.h
@@ -326,7 +326,8 @@ enum {
			 (1ULL << VIRTIO_RING_F_EVENT_IDX) |
			 (1ULL << VHOST_F_LOG_ALL) |
			 (1ULL << VIRTIO_F_ANY_LAYOUT) |
-			 (1ULL << VIRTIO_F_VERSION_1)
+			 (1ULL << VIRTIO_F_VERSION_1) |
+			 (1ULL << VIRTIO_F_RING_PACKED)
};

 static inline bool vhost_has_feature(struct vhost_virtqueue *vq, int bit)
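With the feature bit advertised, everything else hinges on negotiation: the guest acks a subset of the offered bits, and vhost_has_feature() then steers every VHOST_ALTERNATIVE() call site. A simplified userspace sketch of that handshake (illustrative model; VIRTIO_F_RING_PACKED really is feature bit 34):

    #include <assert.h>
    #include <stdbool.h>
    #include <stdint.h>

    #define VIRTIO_F_VERSION_1   32
    #define VIRTIO_F_RING_PACKED 34

    /* Features the backend offers, as in the VHOST_FEATURES enum. */
    static const uint64_t backend_features =
            (1ULL << VIRTIO_F_VERSION_1) | (1ULL << VIRTIO_F_RING_PACKED);

    /* Negotiation keeps the intersection; layout helpers then test
     * the acked set, as vhost_has_feature() does per virtqueue. */
    static bool has_feature(uint64_t acked, int bit)
    {
            return acked & (1ULL << bit);
    }

    int main(void)
    {
            uint64_t guest = (1ULL << VIRTIO_F_VERSION_1) |
                             (1ULL << VIRTIO_F_RING_PACKED);
            uint64_t acked = guest & backend_features;

            assert(has_feature(acked, VIRTIO_F_RING_PACKED));
            return 0;
    }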