[0/3] virtio-net: inline header support
diff mbox

Message ID 87vces2gxq.fsf@rustcorp.com.au
State New, archived
Headers show

Commit Message

Rusty Russell Oct. 3, 2012, 6:44 a.m. UTC
"Michael S. Tsirkin" <mst@redhat.com> writes:

> Thinking about Sasha's patches, we can reduce ring usage
> for virtio net small packets dramatically if we put
> virtio net header inline with the data.
> This can be done for free in case guest net stack allocated
> extra head room for the packet, and I don't see
> why would this have any downsides.

I've been wanting to do this for the longest time... but...

> Even though with my recent patches qemu
> no longer requires header to be the first s/g element,
> we need a new feature bit to detect this.
> A trivial qemu patch will be sent separately.

There's a reason I haven't done this.  I really, really dislike "my
implemention isn't broken" feature bits.  We could have an infinite
number of them, for each bug in each device.

So my plan was to tie this assumption to the new PCI layout.  And have a
stress-testing patch like the one below in the kernel (see my virtio-wip
branch for stuff like this).  Turn it on at boot with
"virtio_ring.torture" on the kernel commandline.

BTW, I've fixed lguest, but my kvm here (Ubuntu precise, kvm-qemu 1.0)
is too old.  Building the latest git now...

Cheers,
Rusty.

Subject: virtio: CONFIG_VIRTIO_DEVICE_TORTURE

Virtio devices are not supposed to depend on the framing of the scatter-gather
lists, but various implementations did.  Safeguard this in future by adding
an option to deliberately create perverse descriptors.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Rusty Russell Oct. 3, 2012, 7:10 a.m. UTC | #1
Rusty Russell <rusty@rustcorp.com.au> writes:

> "Michael S. Tsirkin" <mst@redhat.com> writes:
>
>> Thinking about Sasha's patches, we can reduce ring usage
>> for virtio net small packets dramatically if we put
>> virtio net header inline with the data.
>> This can be done for free in case guest net stack allocated
>> extra head room for the packet, and I don't see
>> why would this have any downsides.
>
> I've been wanting to do this for the longest time... but...
>
>> Even though with my recent patches qemu
>> no longer requires header to be the first s/g element,

Breaks for me; see why I hate bug features?  Now we'd need another
one...

qemu-system-i386: virtio: trying to map MMIO memory

Please try my patch.

Cheers,
Rusty.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Anthony Liguori Oct. 4, 2012, 1:24 a.m. UTC | #2
Rusty Russell <rusty@rustcorp.com.au> writes:

> "Michael S. Tsirkin" <mst@redhat.com> writes:
>
>> Thinking about Sasha's patches, we can reduce ring usage
>> for virtio net small packets dramatically if we put
>> virtio net header inline with the data.
>> This can be done for free in case guest net stack allocated
>> extra head room for the packet, and I don't see
>> why would this have any downsides.
>
> I've been wanting to do this for the longest time... but...
>
>> Even though with my recent patches qemu
>> no longer requires header to be the first s/g element,
>> we need a new feature bit to detect this.
>> A trivial qemu patch will be sent separately.
>
> There's a reason I haven't done this.  I really, really dislike "my
> implemention isn't broken" feature bits.  We could have an infinite
> number of them, for each bug in each device.

This is a bug in the specification.

The QEMU implementation pre-dates the specification.  All of the actual
implementations of virtio relied on the semantics of s/g elements and
still do.

What's in the specification really doesn't matter when it doesn't agree
with all of the existing implementations.

Users use implementations, not specifications.  The specification really
ought to be changed here.

Regards,

Anthony Liguori
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Anthony Liguori Oct. 4, 2012, 1:35 a.m. UTC | #3
Rusty Russell <rusty@rustcorp.com.au> writes:

> "Michael S. Tsirkin" <mst@redhat.com> writes:
>
> There's a reason I haven't done this.  I really, really dislike "my
> implemention isn't broken" feature bits.  We could have an infinite
> number of them, for each bug in each device.
>
> So my plan was to tie this assumption to the new PCI layout.  And have a
> stress-testing patch like the one below in the kernel (see my virtio-wip
> branch for stuff like this).  Turn it on at boot with
> "virtio_ring.torture" on the kernel commandline.
>
> BTW, I've fixed lguest, but my kvm here (Ubuntu precise, kvm-qemu 1.0)
> is too old.  Building the latest git now...
>
> Cheers,
> Rusty.
>
> Subject: virtio: CONFIG_VIRTIO_DEVICE_TORTURE
>
> Virtio devices are not supposed to depend on the framing of the scatter-gather
> lists, but various implementations did.  Safeguard this in future by adding
> an option to deliberately create perverse descriptors.
>
> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

Ignore framing is really a bad idea.  You want backends to enforce
reasonable framing because guest's shouldn't do silly things with framing.

For instance, with virtio-blk, if you want decent performance, you
absolutely want to avoid bouncing the data.  If you're using O_DIRECT in
the host to submit I/O requests, then it's critical that all of the s/g
elements are aligned to a sector boundary and sized to a sector
boundary.

Yes, QEMU can handle if that's not the case, but it would be insanely
stupid for a guest not to do this.  This is the sort of thing that ought
to be enforced in the specification because a guest cannot perform well
if it doesn't follow these rules.

A spec isn't terribly useful if the result is guest drivers that are
slow.  There's very little to gain by not enforcing rules around framing
and there's a lot to lose if a guest frames incorrectly.

In the rare case where we want to make a framing change, we should use
feature bits like Michael is proposing.

In this case, we should simply say that with the feature bit, the vnet
header can be in the same element as the data but not allow the header
to be spread across multiple elements.

Regards,

Anthony Liguori

>
> diff --git a/drivers/virtio/Kconfig b/drivers/virtio/Kconfig
> index 8d5bddb..930a4ea 100644
> --- a/drivers/virtio/Kconfig
> +++ b/drivers/virtio/Kconfig
> @@ -5,6 +5,15 @@ config VIRTIO
>  	  bus, such as CONFIG_VIRTIO_PCI, CONFIG_VIRTIO_MMIO, CONFIG_LGUEST,
>  	  CONFIG_RPMSG or CONFIG_S390_GUEST.
>  
> +config VIRTIO_DEVICE_TORTURE
> +	bool "Virtio device torture tests"
> +	depends on VIRTIO && DEBUG_KERNEL
> +	help
> +	  This makes the virtio_ring implementation creatively change
> +	  the format of requests to make sure that devices are
> +	  properly implemented.  This will make your virtual machine
> +	  slow *and* unreliable!  Say N.
> +
>  menu "Virtio drivers"
>  
>  config VIRTIO_PCI
> diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> index e639584..8893753 100644
> --- a/drivers/virtio/virtio_ring.c
> +++ b/drivers/virtio/virtio_ring.c
> @@ -124,6 +124,149 @@ struct vring_virtqueue
>  
>  #define to_vvq(_vq) container_of(_vq, struct vring_virtqueue, vq)
>  
> +#ifdef CONFIG_VIRTIO_DEVICE_TORTURE
> +static bool torture;
> +module_param(torture, bool, 0644);
> +
> +struct torture {
> +	unsigned int orig_out, orig_in;
> +	void *orig_data;
> +	struct scatterlist sg[4];
> +	struct scatterlist orig_sg[];
> +};
> +
> +static size_t tot_len(struct scatterlist sg[], unsigned num)
> +{
> +	size_t len, i;
> +
> +	for (len = 0, i = 0; i < num; i++)
> +		len += sg[i].length;
> +
> +	return len;
> +}
> +
> +static void copy_sg_data(const struct scatterlist *dst, unsigned dnum,
> +			 const struct scatterlist *src, unsigned snum)
> +{
> +	unsigned len;
> +	struct scatterlist s, d;
> +
> +	s = *src;
> +	d = *dst;
> +
> +	while (snum && dnum) {
> +		len = min(s.length, d.length);
> +		memcpy(sg_virt(&d), sg_virt(&s), len);
> +		d.offset += len;
> +		d.length -= len;
> +		s.offset += len;
> +		s.length -= len;
> +		if (!s.length) {
> +			BUG_ON(snum == 0);
> +			src++;
> +			snum--;
> +			s = *src;
> +		}
> +		if (!d.length) {
> +			BUG_ON(dnum == 0);
> +			dst++;
> +			dnum--;
> +			d = *dst;
> +		}
> +	}
> +}
> +
> +static bool torture_replace(struct scatterlist **sg,
> +			     unsigned int *out,
> +			     unsigned int *in,
> +			     void **data,
> +			     gfp_t gfp)
> +{
> +	static size_t seed;
> +	struct torture *t;
> +	size_t outlen, inlen, ourseed, len1;
> +	void *buf;
> +
> +	if (!torture)
> +		return true;
> +
> +	outlen = tot_len(*sg, *out);
> +	inlen = tot_len(*sg + *out, *in);
> +
> +	/* This will break horribly on large block requests. */
> +	t = kmalloc(sizeof(*t) + (*out + *in) * sizeof(t->orig_sg[1])
> +		    + outlen + 1 + inlen + 1, gfp);
> +	if (!t)
> +		return false;
> +
> +	sg_init_table(t->sg, 4);
> +	buf = &t->orig_sg[*out + *in];
> +
> +	memcpy(t->orig_sg, *sg, sizeof(**sg) * (*out + *in));
> +	t->orig_out = *out;
> +	t->orig_in = *in;
> +	t->orig_data = *data;
> +	*data = t;
> +
> +	ourseed = ACCESS_ONCE(seed);
> +	seed++;
> +
> +	*sg = t->sg;
> +	if (outlen) {
> +		/* Split outbuf into two parts, one byte apart. */
> +		*out = 2;
> +		len1 = ourseed % (outlen + 1);
> +		sg_set_buf(&t->sg[0], buf, len1);
> +		buf += len1 + 1;
> +		sg_set_buf(&t->sg[1], buf, outlen - len1);
> +		buf += outlen - len1;
> +		copy_sg_data(t->sg, *out, t->orig_sg, t->orig_out);
> +	}
> +
> +	if (inlen) {
> +		/* Split inbuf into two parts, one byte apart. */
> +		*in = 2;
> +		len1 = ourseed % (inlen + 1);
> +		sg_set_buf(&t->sg[*out], buf, len1);
> +		buf += len1 + 1;
> +		sg_set_buf(&t->sg[*out + 1], buf, inlen - len1);
> +		buf += inlen - len1;
> +	}
> +	return true;
> +}
> +
> +static void *torture_done(struct torture *t)
> +{
> +	void *data;
> +
> +	if (!torture)
> +		return t;
> +
> +	if (t->orig_in)
> +		copy_sg_data(t->orig_sg + t->orig_out, t->orig_in,
> +			     t->sg + (t->orig_out ? 2 : 0), 2);
> +
> +	data = t->orig_data;
> +	kfree(t);
> +	return data;
> +}
> +
> +#else
> +static bool torture_replace(struct scatterlist **sg,
> +			     unsigned int *out,
> +			     unsigned int *in,
> +			     void **data,
> +			     gfp_t gfp)
> +{
> +	return true;
> +}
> +
> +static void *torture_done(void *data)
> +{
> +	return data;
> +}
> +#endif /* CONFIG_VIRTIO_DEVICE_TORTURE */
> +
>  /* Set up an indirect table of descriptors and add it to the queue. */
>  static int vring_add_indirect(struct vring_virtqueue *vq,
>  			      struct scatterlist sg[],
> @@ -213,6 +356,9 @@ int virtqueue_add_buf(struct virtqueue *_vq,
>  
>  	BUG_ON(data == NULL);
>  
> +	if (!torture_replace(&sg, &out, &in, &data, gfp))
> +		return -ENOMEM;
> +
>  #ifdef DEBUG
>  	{
>  		ktime_t now = ktime_get();
> @@ -246,6 +392,7 @@ int virtqueue_add_buf(struct virtqueue *_vq,
>  		if (out)
>  			vq->notify(&vq->vq);
>  		END_USE(vq);
> +		torture_done(data);
>  		return -ENOSPC;
>  	}
>  
> @@ -476,7 +623,7 @@ void *virtqueue_get_buf(struct virtqueue *_vq, unsigned int *len)
>  #endif
>  
>  	END_USE(vq);
> -	return ret;
> +	return torture_done(ret);
>  }
>  EXPORT_SYMBOL_GPL(virtqueue_get_buf);
>  
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Rusty Russell Oct. 4, 2012, 3:34 a.m. UTC | #4
Anthony Liguori <anthony@codemonkey.ws> writes:
> Rusty Russell <rusty@rustcorp.com.au> writes:
>
>> "Michael S. Tsirkin" <mst@redhat.com> writes:
>>
>>> Thinking about Sasha's patches, we can reduce ring usage
>>> for virtio net small packets dramatically if we put
>>> virtio net header inline with the data.
>>> This can be done for free in case guest net stack allocated
>>> extra head room for the packet, and I don't see
>>> why would this have any downsides.
>>
>> I've been wanting to do this for the longest time... but...
>>
>>> Even though with my recent patches qemu
>>> no longer requires header to be the first s/g element,
>>> we need a new feature bit to detect this.
>>> A trivial qemu patch will be sent separately.
>>
>> There's a reason I haven't done this.  I really, really dislike "my
>> implemention isn't broken" feature bits.  We could have an infinite
>> number of them, for each bug in each device.
>
> This is a bug in the specification.
>
> The QEMU implementation pre-dates the specification.  All of the actual
> implementations of virtio relied on the semantics of s/g elements and
> still do.

lguest fix is pending in my queue.  lkvm and qemu are broken; lkvm isn't
ever going to be merged, so I'm not sure what its status is?  But I'm
determined to fix qemu, and hence my torture patch to make sure this
doesn't creep in again.

> What's in the specification really doesn't matter when it doesn't agree
> with all of the existing implementations.
>
> Users use implementations, not specifications.  The specification really
> ought to be changed here.

I'm sorely tempted, except that we're losing a real optimization because
of this :(

The specification has long contained the footnote:

        The current qemu device implementations mistakenly insist that
        the first descriptor cover the header in these cases exactly, so
        a cautious driver should arrange it so.

I'd like to tie this caveat to the PCI capability change, so this note
will move to the appendix with the old PCI layout.

Cheers,
Rusty.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Anthony Liguori Oct. 4, 2012, 4:29 a.m. UTC | #5
Rusty Russell <rusty@rustcorp.com.au> writes:

> Anthony Liguori <anthony@codemonkey.ws> writes:
>> Rusty Russell <rusty@rustcorp.com.au> writes:
>>
>>> "Michael S. Tsirkin" <mst@redhat.com> writes:
>>>
>>>> Thinking about Sasha's patches, we can reduce ring usage
>>>> for virtio net small packets dramatically if we put
>>>> virtio net header inline with the data.
>>>> This can be done for free in case guest net stack allocated
>>>> extra head room for the packet, and I don't see
>>>> why would this have any downsides.
>>>
>>> I've been wanting to do this for the longest time... but...
>>>
>>>> Even though with my recent patches qemu
>>>> no longer requires header to be the first s/g element,
>>>> we need a new feature bit to detect this.
>>>> A trivial qemu patch will be sent separately.
>>>
>>> There's a reason I haven't done this.  I really, really dislike "my
>>> implemention isn't broken" feature bits.  We could have an infinite
>>> number of them, for each bug in each device.
>>
>> This is a bug in the specification.
>>
>> The QEMU implementation pre-dates the specification.  All of the actual
>> implementations of virtio relied on the semantics of s/g elements and
>> still do.
>
> lguest fix is pending in my queue.  lkvm and qemu are broken; lkvm isn't
> ever going to be merged, so I'm not sure what its status is?  But I'm
> determined to fix qemu, and hence my torture patch to make sure this
> doesn't creep in again.

There are even more implementations out there and I'd wager they all
rely on framing.

>> What's in the specification really doesn't matter when it doesn't agree
>> with all of the existing implementations.
>>
>> Users use implementations, not specifications.  The specification really
>> ought to be changed here.
>
> I'm sorely tempted, except that we're losing a real optimization because
> of this :(

What optimizations?  What Michael is proposing is still achievable with
a device feature.  Are there other optimizations that can be achieved by
changing framing that we can't achieve with feature bits?

As I mentioned in another note, bad framing decisions can cause
performance issues too...

> The specification has long contained the footnote:
>
>         The current qemu device implementations mistakenly insist that
>         the first descriptor cover the header in these cases exactly, so
>         a cautious driver should arrange it so.

I seem to recall this being a compromise between you and I..  I think
I objected strongly to this back when you first wrote the spec and you
added this to appease me ;-)

Regards,

Anthony Liguori

>
> I'd like to tie this caveat to the PCI capability change, so this note
> will move to the appendix with the old PCI layout.
>
> Cheers,
> Rusty.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Rusty Russell Oct. 4, 2012, 5:17 a.m. UTC | #6
Anthony Liguori <anthony@codemonkey.ws> writes:

> Rusty Russell <rusty@rustcorp.com.au> writes:
>
>> "Michael S. Tsirkin" <mst@redhat.com> writes:
>>
>> There's a reason I haven't done this.  I really, really dislike "my
>> implemention isn't broken" feature bits.  We could have an infinite
>> number of them, for each bug in each device.
>>
>> So my plan was to tie this assumption to the new PCI layout.  And have a
>> stress-testing patch like the one below in the kernel (see my virtio-wip
>> branch for stuff like this).  Turn it on at boot with
>> "virtio_ring.torture" on the kernel commandline.
>>
>> BTW, I've fixed lguest, but my kvm here (Ubuntu precise, kvm-qemu 1.0)
>> is too old.  Building the latest git now...
>>
>> Cheers,
>> Rusty.
>>
>> Subject: virtio: CONFIG_VIRTIO_DEVICE_TORTURE
>>
>> Virtio devices are not supposed to depend on the framing of the scatter-gather
>> lists, but various implementations did.  Safeguard this in future by adding
>> an option to deliberately create perverse descriptors.
>>
>> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
>
> Ignore framing is really a bad idea.  You want backends to enforce
> reasonable framing because guest's shouldn't do silly things with framing.
>
> For instance, with virtio-blk, if you want decent performance, you
> absolutely want to avoid bouncing the data.  If you're using O_DIRECT in
> the host to submit I/O requests, then it's critical that all of the s/g
> elements are aligned to a sector boundary and sized to a sector
> boundary.
>
> Yes, QEMU can handle if that's not the case, but it would be insanely
> stupid for a guest not to do this.  This is the sort of thing that ought
> to be enforced in the specification because a guest cannot perform well
> if it doesn't follow these rules.

Lack of imagination is what got us into trouble in the first place; when
presented with one counter-example, it's useful to look for others.

That's our job, not to dismiss them a "insanely stupid".

For example:
1) Perhaps the guest isn't trying to perform well, it's trying to be a
   tiny bootloader?
2) Perhaps the guest is the direct consumer, and aligning buffers is
   redundant.

> A spec isn't terribly useful if the result is guest drivers that are
> slow.  There's very little to gain by not enforcing rules around framing
> and there's a lot to lose if a guest frames incorrectly.

The guest has the flexibility, and gets to decide.  The spec is not
forcing them to perform badly.

> In the rare case where we want to make a framing change, we should use
> feature bits like Michael is proposing.
>
> In this case, we should simply say that with the feature bit, the vnet
> header can be in the same element as the data but not allow the header
> to be spread across multiple elements.

I'd love to split struct virtio_net_hdr_mrg_rxbuf, so the num_buffers
ends up somewhere else.

The simplest rules are "never" or "always".

Cheers,
Rusty.
PS.  Inserting zero-length buffers is something I'd be prepared to rule
     out, my current patch does it just for yuks...
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Michael S. Tsirkin Oct. 8, 2012, 8:41 p.m. UTC | #7
On Wed, Oct 03, 2012 at 04:14:17PM +0930, Rusty Russell wrote:
> "Michael S. Tsirkin" <mst@redhat.com> writes:
> 
> > Thinking about Sasha's patches, we can reduce ring usage
> > for virtio net small packets dramatically if we put
> > virtio net header inline with the data.
> > This can be done for free in case guest net stack allocated
> > extra head room for the packet, and I don't see
> > why would this have any downsides.
> 
> I've been wanting to do this for the longest time... but...
> 
> > Even though with my recent patches qemu
> > no longer requires header to be the first s/g element,
> > we need a new feature bit to detect this.
> > A trivial qemu patch will be sent separately.
> 
> There's a reason I haven't done this.  I really, really dislike "my
> implemention isn't broken" feature bits.  We could have an infinite
> number of them, for each bug in each device.
> 
> So my plan was to tie this assumption to the new PCI layout.

I don't object but old qemu has this limitation for s390 as well,
and that's not using PCI, right? So how do we detect
new hypervisor there?
Michael S. Tsirkin Oct. 8, 2012, 9:31 p.m. UTC | #8
On Thu, Oct 04, 2012 at 01:04:56PM +0930, Rusty Russell wrote:
> Anthony Liguori <anthony@codemonkey.ws> writes:
> > Rusty Russell <rusty@rustcorp.com.au> writes:
> >
> >> "Michael S. Tsirkin" <mst@redhat.com> writes:
> >>
> >>> Thinking about Sasha's patches, we can reduce ring usage
> >>> for virtio net small packets dramatically if we put
> >>> virtio net header inline with the data.
> >>> This can be done for free in case guest net stack allocated
> >>> extra head room for the packet, and I don't see
> >>> why would this have any downsides.
> >>
> >> I've been wanting to do this for the longest time... but...
> >>
> >>> Even though with my recent patches qemu
> >>> no longer requires header to be the first s/g element,
> >>> we need a new feature bit to detect this.
> >>> A trivial qemu patch will be sent separately.
> >>
> >> There's a reason I haven't done this.  I really, really dislike "my
> >> implemention isn't broken" feature bits.  We could have an infinite
> >> number of them, for each bug in each device.
> >
> > This is a bug in the specification.
> >
> > The QEMU implementation pre-dates the specification.  All of the actual
> > implementations of virtio relied on the semantics of s/g elements and
> > still do.
> 
> lguest fix is pending in my queue.  lkvm and qemu are broken; lkvm isn't
> ever going to be merged, so I'm not sure what its status is?  But I'm
> determined to fix qemu, and hence my torture patch to make sure this
> doesn't creep in again.

If you look at my patch you'll notice there's also a
comment in virtio_net.h that seems to be broken in this respect:

/* This is the first element of the scatter-gather list.  If you don't
 * specify GSO or CSUM features, you can simply ignore the header. */

There is a similar comment in virtio-blk.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Patch
diff mbox

diff --git a/drivers/virtio/Kconfig b/drivers/virtio/Kconfig
index 8d5bddb..930a4ea 100644
--- a/drivers/virtio/Kconfig
+++ b/drivers/virtio/Kconfig
@@ -5,6 +5,15 @@  config VIRTIO
 	  bus, such as CONFIG_VIRTIO_PCI, CONFIG_VIRTIO_MMIO, CONFIG_LGUEST,
 	  CONFIG_RPMSG or CONFIG_S390_GUEST.
 
+config VIRTIO_DEVICE_TORTURE
+	bool "Virtio device torture tests"
+	depends on VIRTIO && DEBUG_KERNEL
+	help
+	  This makes the virtio_ring implementation creatively change
+	  the format of requests to make sure that devices are
+	  properly implemented.  This will make your virtual machine
+	  slow *and* unreliable!  Say N.
+
 menu "Virtio drivers"
 
 config VIRTIO_PCI
diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index e639584..8893753 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -124,6 +124,149 @@  struct vring_virtqueue
 
 #define to_vvq(_vq) container_of(_vq, struct vring_virtqueue, vq)
 
+#ifdef CONFIG_VIRTIO_DEVICE_TORTURE
+static bool torture;
+module_param(torture, bool, 0644);
+
+struct torture {
+	unsigned int orig_out, orig_in;
+	void *orig_data;
+	struct scatterlist sg[4];
+	struct scatterlist orig_sg[];
+};
+
+static size_t tot_len(struct scatterlist sg[], unsigned num)
+{
+	size_t len, i;
+
+	for (len = 0, i = 0; i < num; i++)
+		len += sg[i].length;
+
+	return len;
+}
+
+static void copy_sg_data(const struct scatterlist *dst, unsigned dnum,
+			 const struct scatterlist *src, unsigned snum)
+{
+	unsigned len;
+	struct scatterlist s, d;
+
+	s = *src;
+	d = *dst;
+
+	while (snum && dnum) {
+		len = min(s.length, d.length);
+		memcpy(sg_virt(&d), sg_virt(&s), len);
+		d.offset += len;
+		d.length -= len;
+		s.offset += len;
+		s.length -= len;
+		if (!s.length) {
+			BUG_ON(snum == 0);
+			src++;
+			snum--;
+			s = *src;
+		}
+		if (!d.length) {
+			BUG_ON(dnum == 0);
+			dst++;
+			dnum--;
+			d = *dst;
+		}
+	}
+}
+
+static bool torture_replace(struct scatterlist **sg,
+			     unsigned int *out,
+			     unsigned int *in,
+			     void **data,
+			     gfp_t gfp)
+{
+	static size_t seed;
+	struct torture *t;
+	size_t outlen, inlen, ourseed, len1;
+	void *buf;
+
+	if (!torture)
+		return true;
+
+	outlen = tot_len(*sg, *out);
+	inlen = tot_len(*sg + *out, *in);
+
+	/* This will break horribly on large block requests. */
+	t = kmalloc(sizeof(*t) + (*out + *in) * sizeof(t->orig_sg[1])
+		    + outlen + 1 + inlen + 1, gfp);
+	if (!t)
+		return false;
+
+	sg_init_table(t->sg, 4);
+	buf = &t->orig_sg[*out + *in];
+
+	memcpy(t->orig_sg, *sg, sizeof(**sg) * (*out + *in));
+	t->orig_out = *out;
+	t->orig_in = *in;
+	t->orig_data = *data;
+	*data = t;
+
+	ourseed = ACCESS_ONCE(seed);
+	seed++;
+
+	*sg = t->sg;
+	if (outlen) {
+		/* Split outbuf into two parts, one byte apart. */
+		*out = 2;
+		len1 = ourseed % (outlen + 1);
+		sg_set_buf(&t->sg[0], buf, len1);
+		buf += len1 + 1;
+		sg_set_buf(&t->sg[1], buf, outlen - len1);
+		buf += outlen - len1;
+		copy_sg_data(t->sg, *out, t->orig_sg, t->orig_out);
+	}
+
+	if (inlen) {
+		/* Split inbuf into two parts, one byte apart. */
+		*in = 2;
+		len1 = ourseed % (inlen + 1);
+		sg_set_buf(&t->sg[*out], buf, len1);
+		buf += len1 + 1;
+		sg_set_buf(&t->sg[*out + 1], buf, inlen - len1);
+		buf += inlen - len1;
+	}
+	return true;
+}
+
+static void *torture_done(struct torture *t)
+{
+	void *data;
+
+	if (!torture)
+		return t;
+
+	if (t->orig_in)
+		copy_sg_data(t->orig_sg + t->orig_out, t->orig_in,
+			     t->sg + (t->orig_out ? 2 : 0), 2);
+
+	data = t->orig_data;
+	kfree(t);
+	return data;
+}
+
+#else
+static bool torture_replace(struct scatterlist **sg,
+			     unsigned int *out,
+			     unsigned int *in,
+			     void **data,
+			     gfp_t gfp)
+{
+	return true;
+}
+
+static void *torture_done(void *data)
+{
+	return data;
+}
+#endif /* CONFIG_VIRTIO_DEVICE_TORTURE */
+
 /* Set up an indirect table of descriptors and add it to the queue. */
 static int vring_add_indirect(struct vring_virtqueue *vq,
 			      struct scatterlist sg[],
@@ -213,6 +356,9 @@  int virtqueue_add_buf(struct virtqueue *_vq,
 
 	BUG_ON(data == NULL);
 
+	if (!torture_replace(&sg, &out, &in, &data, gfp))
+		return -ENOMEM;
+
 #ifdef DEBUG
 	{
 		ktime_t now = ktime_get();
@@ -246,6 +392,7 @@  int virtqueue_add_buf(struct virtqueue *_vq,
 		if (out)
 			vq->notify(&vq->vq);
 		END_USE(vq);
+		torture_done(data);
 		return -ENOSPC;
 	}
 
@@ -476,7 +623,7 @@  void *virtqueue_get_buf(struct virtqueue *_vq, unsigned int *len)
 #endif
 
 	END_USE(vq);
-	return ret;
+	return torture_done(ret);
 }
 EXPORT_SYMBOL_GPL(virtqueue_get_buf);