diff mbox

virtio(-scsi) vs. chained sg_lists (was Re: [PATCH] scsi: virtio-scsi: Fix address translation failure of HighMem pages used by sg list)

Message ID 874notoh02.fsf@rustcorp.com.au (mailing list archive)
State New, archived
Headers show

Commit Message

Rusty Russell July 27, 2012, 6:27 a.m. UTC
On Thu, 26 Jul 2012 15:05:39 +0200, Paolo Bonzini <pbonzini@redhat.com> wrote:
> Il 26/07/2012 09:58, Paolo Bonzini ha scritto:
> > 
> >> > Please CC me on the "convert to sg copy-less" patches, It looks interesting
> > Sure.
> 
> Well, here is the gist of it (note it won't apply on any public tree,
> hence no SoB yet).  It should be split in multiple changesets and you
> can make more simplifications on top of it, because
> virtio_scsi_target_state is not anymore variable-sized, but that's
> secondary.

ISTR starting on such a patch years ago, but the primitives to
manipulate a chained sg_list were nonexistent, so I dropped it,
waiting for it to be fully-baked or replaced.  That hasn't happened:

> +	/* Remove scatterlist terminator, we will tack more items soon.  */
> +	vblk->sg[num + out - 1].page_link &= ~0x2;

I hate this interface:

> +int virtqueue_add_buf_sg(struct virtqueue *_vq,
> +			 struct scatterlist *sg_out,
> +			 unsigned int out,
> +			 struct scatterlist *sg_in,
> +			 unsigned int in,
> +			 void *data,
> +			 gfp_t gfp)

The point of chained scatterlists is they're self-terminated, so the
in & out counts should be calculated.

Counting them is not *that* bad, since we're about to read them all
anyway.

(Yes, the chained scatterlist stuff is complete crack, but I lost that
debate years ago.)

Here's my variant.  Networking, console and block seem OK, at least
(ie. it booted!).

From: Rusty Russell <rusty@rustcorp.com.au>
Subject: virtio: use chained scatterlists.

Rather than handing a scatterlist[] and out and in numbers to
virtqueue_add_buf(), hand two separate ones which can be chained.

I shall refrain from ranting about what a disgusting hack chained
scatterlists are.  I'll just note that this doesn't make things
simpler (see diff).

The scatterlists we use can be too large for the stack, so we put them
in our device struct and reuse them.  But in many cases we don't want
to pay the cost of sg_init_table() as we don't know how many elements
we'll have and we'd have to initialize the entire table.

This means we have two choices: carefully reset the end markers after
we call virtqueue_add_buf(), which we do in virtio_net for the xmit
path where it's easy and we want to be optimal.  Elsewhere we
implement a helper to unset the end markers after we've filled the
array.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
---
 drivers/block/virtio_blk.c          |   37 +++++++++++++-----
 drivers/char/hw_random/virtio-rng.c |    2 -
 drivers/char/virtio_console.c       |    6 +--
 drivers/net/virtio_net.c            |   67 ++++++++++++++++++---------------
 drivers/virtio/virtio_balloon.c     |    6 +--
 drivers/virtio/virtio_ring.c        |   71 ++++++++++++++++++++++--------------
 include/linux/virtio.h              |    5 +-
 net/9p/trans_virtio.c               |   46 +++++++++++++++++++++--
 8 files changed, 159 insertions(+), 81 deletions(-)





--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Paolo Bonzini July 27, 2012, 8:11 a.m. UTC | #1
Il 27/07/2012 08:27, Rusty Russell ha scritto:
>> > +int virtqueue_add_buf_sg(struct virtqueue *_vq,
>> > +			 struct scatterlist *sg_out,
>> > +			 unsigned int out,
>> > +			 struct scatterlist *sg_in,
>> > +			 unsigned int in,
>> > +			 void *data,
>> > +			 gfp_t gfp)
> The point of chained scatterlists is they're self-terminated, so the
> in & out counts should be calculated.
> 
> Counting them is not *that* bad, since we're about to read them all
> anyway.
> 
> (Yes, the chained scatterlist stuff is complete crack, but I lost that
> debate years ago.)
> 
> Here's my variant.  Networking, console and block seem OK, at least
> (ie. it booted!).

I hate the for loops, even though we're about indeed to read all the
scatterlists anyway... all they do is lengthen critical sections.  Also,
being the first user of chained scatterlist doesn't exactly give me warm
fuzzies.

I think it's simpler if we provide an API to add individual buffers to
the virtqueue, so that you can do multiple virtqueue_add_buf_more
(whatever) before kicking the virtqueue.  The idea is that I can still
use indirect buffers for the scatterlists that come from the block layer
or from an skb, but I will use direct buffers for the request/response
descriptors.  The direct buffers are always a small number (usually 2),
so you can balance the effect by making the virtqueue bigger.  And for
small reads and writes, you save a kmalloc on a very hot path.

(BTW, scatterlists will have separate entries for each page; we do not
need this in virtio buffers.  Collapsing physically-adjacent entries
will speed up QEMU and will also help avoiding indirect buffers).

Paolo



--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
--- a/drivers/block/virtio_blk.c
+++ b/drivers/block/virtio_blk.c
@@ -100,6 +100,14 @@  static void blk_done(struct virtqueue *v
 	spin_unlock_irqrestore(vblk->disk->queue->queue_lock, flags);
 }
 
+static void sg_unset_end_markers(struct scatterlist *sg, unsigned int num)
+{
+	unsigned int i;
+
+	for (i = 0; i < num; i++)
+		sg[i].page_link &= ~0x02;
+}
+
 static bool do_req(struct request_queue *q, struct virtio_blk *vblk,
 		   struct request *req)
 {
@@ -140,6 +148,7 @@  static bool do_req(struct request_queue 
 		}
 	}
 
+	/* We layout out scatterlist in a single array, out then in. */
 	sg_set_buf(&vblk->sg[out++], &vbr->out_hdr, sizeof(vbr->out_hdr));
 
 	/*
@@ -151,17 +160,8 @@  static bool do_req(struct request_queue 
 	if (vbr->req->cmd_type == REQ_TYPE_BLOCK_PC)
 		sg_set_buf(&vblk->sg[out++], vbr->req->cmd, vbr->req->cmd_len);
 
+	/* This marks the end of the sg list at vblk->sg[out]. */
 	num = blk_rq_map_sg(q, vbr->req, vblk->sg + out);
-
-	if (vbr->req->cmd_type == REQ_TYPE_BLOCK_PC) {
-		sg_set_buf(&vblk->sg[num + out + in++], vbr->req->sense, SCSI_SENSE_BUFFERSIZE);
-		sg_set_buf(&vblk->sg[num + out + in++], &vbr->in_hdr,
-			   sizeof(vbr->in_hdr));
-	}
-
-	sg_set_buf(&vblk->sg[num + out + in++], &vbr->status,
-		   sizeof(vbr->status));
-
 	if (num) {
 		if (rq_data_dir(vbr->req) == WRITE) {
 			vbr->out_hdr.type |= VIRTIO_BLK_T_OUT;
@@ -172,7 +172,22 @@  static bool do_req(struct request_queue 
 		}
 	}
 
-	if (virtqueue_add_buf(vblk->vq, vblk->sg, out, in, vbr, GFP_ATOMIC)<0) {
+	if (vbr->req->cmd_type == REQ_TYPE_BLOCK_PC) {
+		sg_set_buf(&vblk->sg[out + in++], vbr->req->sense,
+			   SCSI_SENSE_BUFFERSIZE);
+		sg_set_buf(&vblk->sg[out + in++], &vbr->in_hdr,
+			   sizeof(vbr->in_hdr));
+	}
+
+	sg_set_buf(&vblk->sg[out + in++], &vbr->status,
+		   sizeof(vbr->status));
+
+	sg_unset_end_markers(vblk->sg, out+in);
+	sg_mark_end(&vblk->sg[out-1]);
+	sg_mark_end(&vblk->sg[out+in-1]);
+
+	if (virtqueue_add_buf(vblk->vq, vblk->sg, vblk->sg+out, vbr, GFP_ATOMIC)
+	    < 0) {
 		mempool_free(vbr, vblk->pool);
 		return false;
 	}
diff --git a/drivers/char/hw_random/virtio-rng.c b/drivers/char/hw_random/virtio-rng.c
--- a/drivers/char/hw_random/virtio-rng.c
+++ b/drivers/char/hw_random/virtio-rng.c
@@ -47,7 +47,7 @@  static void register_buffer(u8 *buf, siz
 	sg_init_one(&sg, buf, size);
 
 	/* There should always be room for one buffer. */
-	if (virtqueue_add_buf(vq, &sg, 0, 1, buf, GFP_KERNEL) < 0)
+	if (virtqueue_add_buf(vq, NULL, &sg, buf, GFP_KERNEL) < 0)
 		BUG();
 
 	virtqueue_kick(vq);
diff --git a/drivers/char/virtio_console.c b/drivers/char/virtio_console.c
--- a/drivers/char/virtio_console.c
+++ b/drivers/char/virtio_console.c
@@ -392,7 +392,7 @@  static int add_inbuf(struct virtqueue *v
 
 	sg_init_one(sg, buf->buf, buf->size);
 
-	ret = virtqueue_add_buf(vq, sg, 0, 1, buf, GFP_ATOMIC);
+	ret = virtqueue_add_buf(vq, NULL, sg, buf, GFP_ATOMIC);
 	virtqueue_kick(vq);
 	return ret;
 }
@@ -457,7 +457,7 @@  static ssize_t __send_control_msg(struct
 	vq = portdev->c_ovq;
 
 	sg_init_one(sg, &cpkt, sizeof(cpkt));
-	if (virtqueue_add_buf(vq, sg, 1, 0, &cpkt, GFP_ATOMIC) >= 0) {
+	if (virtqueue_add_buf(vq, sg, NULL, &cpkt, GFP_ATOMIC) >= 0) {
 		virtqueue_kick(vq);
 		while (!virtqueue_get_buf(vq, &len))
 			cpu_relax();
@@ -506,7 +506,7 @@  static ssize_t send_buf(struct port *por
 	reclaim_consumed_buffers(port);
 
 	sg_init_one(sg, in_buf, in_count);
-	ret = virtqueue_add_buf(out_vq, sg, 1, 0, in_buf, GFP_ATOMIC);
+	ret = virtqueue_add_buf(out_vq, sg, NULL, in_buf, GFP_ATOMIC);
 
 	/* Tell Host to go! */
 	virtqueue_kick(out_vq);
diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -376,11 +376,11 @@  static int add_recvbuf_small(struct virt
 	skb_put(skb, MAX_PACKET_LEN);
 
 	hdr = skb_vnet_hdr(skb);
+	sg_init_table(vi->rx_sg, 2);
 	sg_set_buf(vi->rx_sg, &hdr->hdr, sizeof hdr->hdr);
-
 	skb_to_sgvec(skb, vi->rx_sg + 1, 0, skb->len);
 
-	err = virtqueue_add_buf(vi->rvq, vi->rx_sg, 0, 2, skb, gfp);
+	err = virtqueue_add_buf(vi->rvq, NULL, vi->rx_sg, skb, gfp);
 	if (err < 0)
 		dev_kfree_skb(skb);
 
@@ -393,6 +393,8 @@  static int add_recvbuf_big(struct virtne
 	char *p;
 	int i, err, offset;
 
+	sg_init_table(vi->rx_sg, MAX_SKB_FRAGS + 1);
+
 	/* page in vi->rx_sg[MAX_SKB_FRAGS + 1] is list tail */
 	for (i = MAX_SKB_FRAGS + 1; i > 1; --i) {
 		first = get_a_page(vi, gfp);
@@ -425,8 +427,8 @@  static int add_recvbuf_big(struct virtne
 
 	/* chain first in list head */
 	first->private = (unsigned long)list;
-	err = virtqueue_add_buf(vi->rvq, vi->rx_sg, 0, MAX_SKB_FRAGS + 2,
-				first, gfp);
+
+	err = virtqueue_add_buf(vi->rvq, NULL, vi->rx_sg, first, gfp);
 	if (err < 0)
 		give_pages(vi, first);
 
@@ -444,7 +446,7 @@  static int add_recvbuf_mergeable(struct 
 
 	sg_init_one(vi->rx_sg, page_address(page), PAGE_SIZE);
 
-	err = virtqueue_add_buf(vi->rvq, vi->rx_sg, 0, 1, page, gfp);
+	err = virtqueue_add_buf(vi->rvq, NULL, vi->rx_sg, page, gfp);
 	if (err < 0)
 		give_pages(vi, page);
 
@@ -581,6 +583,7 @@  static int xmit_skb(struct virtnet_info 
 {
 	struct skb_vnet_hdr *hdr = skb_vnet_hdr(skb);
 	const unsigned char *dest = ((struct ethhdr *)skb->data)->h_dest;
+	int ret;
 
 	pr_debug("%s: xmit %p %pM\n", vi->dev->name, skb, dest);
 
@@ -620,8 +623,16 @@  static int xmit_skb(struct virtnet_info 
 		sg_set_buf(vi->tx_sg, &hdr->hdr, sizeof hdr->hdr);
 
 	hdr->num_sg = skb_to_sgvec(skb, vi->tx_sg + 1, 0, skb->len) + 1;
-	return virtqueue_add_buf(vi->svq, vi->tx_sg, hdr->num_sg,
-				 0, skb, GFP_ATOMIC);
+
+	ret = virtqueue_add_buf(vi->svq, vi->tx_sg, NULL, skb, GFP_ATOMIC);
+
+	/*
+	 * An optimization: clear the end bit set by skb_to_sgvec, so
+	 * we can simply re-use vi->tx_sg[] next time.
+	 */
+	vi->tx_sg[hdr->num_sg-1].page_link &= ~0x02;
+
+	return ret;
 }
 
 static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
@@ -757,32 +768,31 @@  static int virtnet_open(struct net_devic
  * never fail unless improperly formated.
  */
 static bool virtnet_send_command(struct virtnet_info *vi, u8 class, u8 cmd,
-				 struct scatterlist *data, int out, int in)
+				 struct scatterlist *cmdsg)
 {
-	struct scatterlist *s, sg[VIRTNET_SEND_COMMAND_SG_MAX + 2];
+	struct scatterlist in[1], out[2];
 	struct virtio_net_ctrl_hdr ctrl;
 	virtio_net_ctrl_ack status = ~0;
 	unsigned int tmp;
-	int i;
 
 	/* Caller should know better */
-	BUG_ON(!virtio_has_feature(vi->vdev, VIRTIO_NET_F_CTRL_VQ) ||
-		(out + in > VIRTNET_SEND_COMMAND_SG_MAX));
+	BUG_ON(!virtio_has_feature(vi->vdev, VIRTIO_NET_F_CTRL_VQ));
 
-	out++; /* Add header */
-	in++; /* Add return status */
-
+	/* Prepend header to output. */
+	sg_init_table(out, 2);
 	ctrl.class = class;
 	ctrl.cmd = cmd;
+	sg_set_buf(&out[0], &ctrl, sizeof(ctrl));
+	if (cmdsg)
+		sg_chain(out, 2, cmdsg);
+	else
+		sg_mark_end(&out[0]);
 
-	sg_init_table(sg, out + in);
+	/* Status response. */
+	sg_init_one(in, &status, sizeof(status));
 
-	sg_set_buf(&sg[0], &ctrl, sizeof(ctrl));
-	for_each_sg(data, s, out + in - 2, i)
-		sg_set_buf(&sg[i + 1], sg_virt(s), s->length);
-	sg_set_buf(&sg[out + in - 1], &status, sizeof(status));
 
-	BUG_ON(virtqueue_add_buf(vi->cvq, sg, out, in, vi, GFP_ATOMIC) < 0);
+	BUG_ON(virtqueue_add_buf(vi->cvq, out, in, vi, GFP_ATOMIC) < 0);
 
 	virtqueue_kick(vi->cvq);
 
@@ -800,8 +810,7 @@  static void virtnet_ack_link_announce(st
 {
 	rtnl_lock();
 	if (!virtnet_send_command(vi, VIRTIO_NET_CTRL_ANNOUNCE,
-				  VIRTIO_NET_CTRL_ANNOUNCE_ACK, NULL,
-				  0, 0))
+				  VIRTIO_NET_CTRL_ANNOUNCE_ACK, NULL))
 		dev_warn(&vi->dev->dev, "Failed to ack link announce.\n");
 	rtnl_unlock();
 }
@@ -839,16 +848,14 @@  static void virtnet_set_rx_mode(struct n
 	sg_init_one(sg, &promisc, sizeof(promisc));
 
 	if (!virtnet_send_command(vi, VIRTIO_NET_CTRL_RX,
-				  VIRTIO_NET_CTRL_RX_PROMISC,
-				  sg, 1, 0))
+				  VIRTIO_NET_CTRL_RX_PROMISC, sg))
 		dev_warn(&dev->dev, "Failed to %sable promisc mode.\n",
 			 promisc ? "en" : "dis");
 
 	sg_init_one(sg, &allmulti, sizeof(allmulti));
 
 	if (!virtnet_send_command(vi, VIRTIO_NET_CTRL_RX,
-				  VIRTIO_NET_CTRL_RX_ALLMULTI,
-				  sg, 1, 0))
+				  VIRTIO_NET_CTRL_RX_ALLMULTI, sg))
 		dev_warn(&dev->dev, "Failed to %sable allmulti mode.\n",
 			 allmulti ? "en" : "dis");
 
@@ -887,7 +894,7 @@  static void virtnet_set_rx_mode(struct n
 
 	if (!virtnet_send_command(vi, VIRTIO_NET_CTRL_MAC,
 				  VIRTIO_NET_CTRL_MAC_TABLE_SET,
-				  sg, 2, 0))
+				  sg))
 		dev_warn(&dev->dev, "Failed to set MAC fitler table.\n");
 
 	kfree(buf);
@@ -901,7 +908,7 @@  static int virtnet_vlan_rx_add_vid(struc
 	sg_init_one(&sg, &vid, sizeof(vid));
 
 	if (!virtnet_send_command(vi, VIRTIO_NET_CTRL_VLAN,
-				  VIRTIO_NET_CTRL_VLAN_ADD, &sg, 1, 0))
+				  VIRTIO_NET_CTRL_VLAN_ADD, &sg))
 		dev_warn(&dev->dev, "Failed to add VLAN ID %d.\n", vid);
 	return 0;
 }
@@ -914,7 +921,7 @@  static int virtnet_vlan_rx_kill_vid(stru
 	sg_init_one(&sg, &vid, sizeof(vid));
 
 	if (!virtnet_send_command(vi, VIRTIO_NET_CTRL_VLAN,
-				  VIRTIO_NET_CTRL_VLAN_DEL, &sg, 1, 0))
+				  VIRTIO_NET_CTRL_VLAN_DEL, &sg))
 		dev_warn(&dev->dev, "Failed to kill VLAN ID %d.\n", vid);
 	return 0;
 }
diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -102,7 +102,7 @@  static void tell_host(struct virtio_ball
 	sg_init_one(&sg, vb->pfns, sizeof(vb->pfns[0]) * vb->num_pfns);
 
 	/* We should always be able to add one buffer to an empty queue. */
-	if (virtqueue_add_buf(vq, &sg, 1, 0, vb, GFP_KERNEL) < 0)
+	if (virtqueue_add_buf(vq, &sg, NULL, vb, GFP_KERNEL) < 0)
 		BUG();
 	virtqueue_kick(vq);
 
@@ -246,7 +246,7 @@  static void stats_handle_request(struct 
 	if (!virtqueue_get_buf(vq, &len))
 		return;
 	sg_init_one(&sg, vb->stats, sizeof(vb->stats));
-	if (virtqueue_add_buf(vq, &sg, 1, 0, vb, GFP_KERNEL) < 0)
+	if (virtqueue_add_buf(vq, &sg, NULL, vb, GFP_KERNEL) < 0)
 		BUG();
 	virtqueue_kick(vq);
 }
@@ -331,7 +331,7 @@  static int init_vqs(struct virtio_balloo
 		 * use it to signal us later.
 		 */
 		sg_init_one(&sg, vb->stats, sizeof vb->stats);
-		if (virtqueue_add_buf(vb->stats_vq, &sg, 1, 0, vb, GFP_KERNEL)
+		if (virtqueue_add_buf(vb->stats_vq, &sg, NULL, vb, GFP_KERNEL)
 		    < 0)
 			BUG();
 		virtqueue_kick(vb->stats_vq);
diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -121,35 +121,41 @@  struct vring_virtqueue
 
 #define to_vvq(_vq) container_of(_vq, struct vring_virtqueue, vq)
 
+/* This doesn't have the counter that for_each_sg() has */
+#define foreach_sg(sglist, i)			\
+	for (i = (sglist); i; i = sg_next(i))
+
 /* Set up an indirect table of descriptors and add it to the queue. */
 static int vring_add_indirect(struct vring_virtqueue *vq,
-			      struct scatterlist sg[],
-			      unsigned int out,
-			      unsigned int in,
+			      unsigned int num,
+			      const struct scatterlist *out,
+			      const struct scatterlist *in,
 			      gfp_t gfp)
 {
+	const struct scatterlist *sg;
 	struct vring_desc *desc;
 	unsigned head;
 	int i;
 
-	desc = kmalloc((out + in) * sizeof(struct vring_desc), gfp);
+	desc = kmalloc(num * sizeof(struct vring_desc), gfp);
 	if (!desc)
 		return -ENOMEM;
 
 	/* Transfer entries from the sg list into the indirect page */
-	for (i = 0; i < out; i++) {
+	i = 0;
+	foreach_sg(out, sg) {
 		desc[i].flags = VRING_DESC_F_NEXT;
 		desc[i].addr = sg_phys(sg);
 		desc[i].len = sg->length;
 		desc[i].next = i+1;
-		sg++;
+		i++;
 	}
-	for (; i < (out + in); i++) {
+	foreach_sg(in, sg) {
 		desc[i].flags = VRING_DESC_F_NEXT|VRING_DESC_F_WRITE;
 		desc[i].addr = sg_phys(sg);
 		desc[i].len = sg->length;
 		desc[i].next = i+1;
-		sg++;
+		i++;
 	}
 
 	/* Last one doesn't continue. */
@@ -171,12 +177,21 @@  static int vring_add_indirect(struct vri
 	return head;
 }
 
+static unsigned int count_sg(const struct scatterlist *sg)
+{
+	unsigned int count = 0;
+	const struct scatterlist *i;
+
+	foreach_sg(sg, i)
+		count++;
+	return count;
+}
+
 /**
  * virtqueue_add_buf - expose buffer to other end
  * @vq: the struct virtqueue we're talking about.
- * @sg: the description of the buffer(s).
- * @out_num: the number of sg readable by other side
- * @in_num: the number of sg which are writable (after readable ones)
+ * @out: the description of the output buffer(s).
+ * @in: the description of the input buffer(s).
  * @data: the token identifying the buffer.
  * @gfp: how to do memory allocations (if necessary).
  *
@@ -189,20 +204,23 @@  static int vring_add_indirect(struct vri
  * we can put an entire sg[] array inside a single queue entry.
  */
 int virtqueue_add_buf(struct virtqueue *_vq,
-		      struct scatterlist sg[],
-		      unsigned int out,
-		      unsigned int in,
+		      const struct scatterlist *out,
+		      const struct scatterlist *in,
 		      void *data,
 		      gfp_t gfp)
 {
 	struct vring_virtqueue *vq = to_vvq(_vq);
-	unsigned int i, avail, uninitialized_var(prev);
+	unsigned int i, avail, uninitialized_var(prev), num;
+	const struct scatterlist *sg;
 	int head;
 
 	START_USE(vq);
 
 	BUG_ON(data == NULL);
 
+	num = count_sg(out) + count_sg(in);
+	BUG_ON(num == 0);
+
 #ifdef DEBUG
 	{
 		ktime_t now = ktime_get();
@@ -218,18 +236,17 @@  int virtqueue_add_buf(struct virtqueue *
 
 	/* If the host supports indirect descriptor tables, and we have multiple
 	 * buffers, then go indirect. FIXME: tune this threshold */
-	if (vq->indirect && (out + in) > 1 && vq->num_free) {
-		head = vring_add_indirect(vq, sg, out, in, gfp);
+	if (vq->indirect && num > 1 && vq->num_free) {
+		head = vring_add_indirect(vq, num, out, in, gfp);
 		if (likely(head >= 0))
 			goto add_head;
 	}
 
-	BUG_ON(out + in > vq->vring.num);
-	BUG_ON(out + in == 0);
+	BUG_ON(num > vq->vring.num);
 
-	if (vq->num_free < out + in) {
+	if (vq->num_free < num) {
 		pr_debug("Can't add buf len %i - avail = %i\n",
-			 out + in, vq->num_free);
+			 num, vq->num_free);
 		/* FIXME: for historical reasons, we force a notify here if
 		 * there are outgoing parts to the buffer.  Presumably the
 		 * host should service the ring ASAP. */
@@ -240,22 +257,24 @@  int virtqueue_add_buf(struct virtqueue *
 	}
 
 	/* We're about to use some buffers from the free list. */
-	vq->num_free -= out + in;
+	vq->num_free -= num;
 
 	head = vq->free_head;
-	for (i = vq->free_head; out; i = vq->vring.desc[i].next, out--) {
+
+	i = vq->free_head;
+	foreach_sg(out, sg) {
 		vq->vring.desc[i].flags = VRING_DESC_F_NEXT;
 		vq->vring.desc[i].addr = sg_phys(sg);
 		vq->vring.desc[i].len = sg->length;
 		prev = i;
-		sg++;
+		i = vq->vring.desc[i].next;
 	}
-	for (; in; i = vq->vring.desc[i].next, in--) {
+	foreach_sg(in, sg) {
 		vq->vring.desc[i].flags = VRING_DESC_F_NEXT|VRING_DESC_F_WRITE;
 		vq->vring.desc[i].addr = sg_phys(sg);
 		vq->vring.desc[i].len = sg->length;
 		prev = i;
-		sg++;
+		i = vq->vring.desc[i].next;
 	}
 	/* Last one doesn't continue. */
 	vq->vring.desc[prev].flags &= ~VRING_DESC_F_NEXT;
diff --git a/include/linux/virtio.h b/include/linux/virtio.h
--- a/include/linux/virtio.h
+++ b/include/linux/virtio.h
@@ -26,9 +26,8 @@  struct virtqueue {
 };
 
 int virtqueue_add_buf(struct virtqueue *vq,
-		      struct scatterlist sg[],
-		      unsigned int out_num,
-		      unsigned int in_num,
+		      const struct scatterlist *out,
+		      const struct scatterlist *in,
 		      void *data,
 		      gfp_t gfp);
 
diff --git a/net/9p/trans_virtio.c b/net/9p/trans_virtio.c
--- a/net/9p/trans_virtio.c
+++ b/net/9p/trans_virtio.c
@@ -244,6 +244,14 @@  pack_sg_list_p(struct scatterlist *sg, i
 	return index - start;
 }
 
+static void sg_unset_end_markers(struct scatterlist *sg, unsigned int num)
+{
+	unsigned int i;
+
+	for (i = 0; i < num; i++)
+		sg[i].page_link &= ~0x02;
+}
+
 /**
  * p9_virtio_request - issue a request
  * @client: client instance issuing the request
@@ -258,6 +266,7 @@  p9_virtio_request(struct p9_client *clie
 	int in, out;
 	unsigned long flags;
 	struct virtio_chan *chan = client->trans;
+	const struct scatterlist *outsg = NULL, *insg = NULL;
 
 	p9_debug(P9_DEBUG_TRANS, "9p debug: virtio request\n");
 
@@ -268,12 +277,21 @@  req_retry:
 	/* Handle out VirtIO ring buffers */
 	out = pack_sg_list(chan->sg, 0,
 			   VIRTQUEUE_NUM, req->tc->sdata, req->tc->size);
+	if (out) {
+		sg_unset_end_markers(chan->sg, out-1);
+		sg_mark_end(&chan->sg[out-1]);
+		outsg = chan->sg;
+	}
 
 	in = pack_sg_list(chan->sg, out,
 			  VIRTQUEUE_NUM, req->rc->sdata, req->rc->capacity);
+	if (in) {
+		sg_unset_end_markers(chan->sg+out, in-1);
+		sg_mark_end(&chan->sg[out+in-1]);
+		insg = chan->sg+out;
+	}
 
-	err = virtqueue_add_buf(chan->vq, chan->sg, out, in, req->tc,
-				GFP_ATOMIC);
+	err = virtqueue_add_buf(chan->vq, outsg, insg, req->tc, GFP_ATOMIC);
 	if (err < 0) {
 		if (err == -ENOSPC) {
 			chan->ring_bufs_avail = 0;
@@ -355,6 +377,7 @@  p9_virtio_zc_request(struct p9_client *c
 	int in_nr_pages = 0, out_nr_pages = 0;
 	struct page **in_pages = NULL, **out_pages = NULL;
 	struct virtio_chan *chan = client->trans;
+	struct scatterlist *insg = NULL, *outsg = NULL;
 
 	p9_debug(P9_DEBUG_TRANS, "virtio request\n");
 
@@ -402,6 +425,13 @@  req_retry_pinned:
 	if (out_pages)
 		out += pack_sg_list_p(chan->sg, out, VIRTQUEUE_NUM,
 				      out_pages, out_nr_pages, uodata, outlen);
+
+	if (out) {
+		sg_unset_end_markers(chan->sg, out-1);
+		sg_mark_end(&chan->sg[out-1]);
+		outsg = chan->sg;
+	}
+
 	/*
 	 * Take care of in data
 	 * For example TREAD have 11.
@@ -414,9 +446,13 @@  req_retry_pinned:
 	if (in_pages)
 		in += pack_sg_list_p(chan->sg, out + in, VIRTQUEUE_NUM,
 				     in_pages, in_nr_pages, uidata, inlen);
+	if (in) {
+		sg_unset_end_markers(chan->sg+out, in-1);
+		sg_mark_end(&chan->sg[out+in-1]);
+		insg = chan->sg + out;
+	}
 
-	err = virtqueue_add_buf(chan->vq, chan->sg, out, in, req->tc,
-				GFP_ATOMIC);
+	err = virtqueue_add_buf(chan->vq, outsg, insg, req->tc, GFP_ATOMIC);
 	if (err < 0) {
 		if (err == -ENOSPC) {
 			chan->ring_bufs_avail = 0;