diff mbox

[2/5] virtio_net: Add a virtqueue for outbound control commands

Message ID 20090116211323.22836.40477.stgit@debian.lart (mailing list archive)
State Accepted, archived
Headers show

Commit Message

Alex Williamson Jan. 16, 2009, 9:13 p.m. UTC
This will be used for RX mode, MAC filter table, VLAN filtering, etc...

The control transaction consists of one or more "out" sg entries and
one or more "in" sg entries.  The first out entry contains a header
defining the class and command.  Additional out entries may provide
data for the command.  The last in entry provides a status response
back from the command.

Virtqueues typically run asynchronous, running a callback function
when there's data in the channel.  We can't readily make use of this
in the command paths where we need to use this.  Instead, we kick
the virtqueue and spin.  The kick causes an I/O write, triggering an
immediate trap into the hypervisor.

Signed-off-by: Alex Williamson <alex.williamson@hp.com>
---

 drivers/net/virtio_net.c   |   49 +++++++++++++++++++++++++++++++++++++++++++-
 include/linux/virtio_net.h |   17 +++++++++++++++
 2 files changed, 65 insertions(+), 1 deletions(-)


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Mark McLoughlin Jan. 19, 2009, 9:32 a.m. UTC | #1
On Fri, 2009-01-16 at 14:13 -0700, Alex Williamson wrote:
> This will be used for RX mode, MAC filter table, VLAN filtering, etc...
> 
> The control transaction consists of one or more "out" sg entries and
> one or more "in" sg entries.  The first out entry contains a header
> defining the class and command.  Additional out entries may provide
> data for the command.  The last in entry provides a status response
> back from the command.
> 
> Virtqueues typically run asynchronous, running a callback function
> when there's data in the channel.  We can't readily make use of this
> in the command paths where we need to use this.  Instead, we kick
> the virtqueue and spin.  The kick causes an I/O write, triggering an
> immediate trap into the hypervisor.
> 
> Signed-off-by: Alex Williamson <alex.williamson@hp.com>

Acked-by: Mark McLoughlin <markmc@redhat.com>

Cheers,
Mark.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Alex Williamson Jan. 27, 2009, 4 a.m. UTC | #2
Hi Rusty,

On Tue, 2009-01-27 at 13:52 +1030, Rusty Russell wrote:
> On Saturday 17 January 2009 07:43:23 Alex Williamson wrote:
> > + return status ? -EFAULT : 0;
> 
> This is wrong. Currently this can't happen, right? But you put it in
> so someone in future may want some kind of return from the commands.
> In which case, they'll want to see the value.
> 
> If we're sure they never want to see the value, then we don't need to
> be synchronous at all: just spin if add_buf() fails.

In my qemu series of patches it can happen if the command isn't properly
defined, something bad happens, or the command is unrecognized.  As I
was hashing this out, I first had more errnos, but I wasn't sure how
extensively to fill out the error returns, and eventually defaulted to
ok/fail.  Should I expand on these some?  Suggestions for a reasonable
small yet complete set?  Should we use Linux errno values and let other
OS virtio-net implementations create a switch table?

I would like to keep this interface synchronous, particularly I'm
wondering if there's anything we might want to do for ethtool like
statistics.  In that case, the backend might fill a buffer of data along
with returning a status code.  I could imagine other similar uses as
well.

> 
> > +struct virtio_net_ctrl_hdr {
> 
> > + __u8 class;
> 
> > + __u8 cmd;
> 
> > +};
> 
> This would need to be __attribute__((packed)). On ARM, that struct
> would be 4 bytes long.

Thanks, I'll fix that.

> > +
> 
> > +typedef __u8 virtio_net_ctrl_ack;
> 
> > +
> 
> > +#define VIRTIO_NET_OK 0
> 
> > +#define VIRTIO_NET_ERR 1
> 
> Hmm, we define it and don't use it. And we never expect it to actually
> error (did your qemu implementation ever actually return non-zero?).

Yup, good point.  These are mainly here to stay in sync with the qemu
backend, which does make use of them.  Should I remove them here, or
should we make a more worthwhile set of return values?  I have tried
manually returning non-zero status from qemu to verify it's reflected in
the response.  Thanks for the comments,

Alex
Rusty Russell Jan. 28, 2009, 1:05 p.m. UTC | #3
On Tuesday 27 January 2009 14:30:06 Alex Williamson wrote:
> Hi Rusty,
> 
> On Tue, 2009-01-27 at 13:52 +1030, Rusty Russell wrote:
> > On Saturday 17 January 2009 07:43:23 Alex Williamson wrote:
> > > + return status ? -EFAULT : 0;
> > 
> > This is wrong. Currently this can't happen, right? But you put it in
> > so someone in future may want some kind of return from the commands.
> > In which case, they'll want to see the value.
> > 
> > If we're sure they never want to see the value, then we don't need to
> > be synchronous at all: just spin if add_buf() fails.
> 
> In my qemu series of patches it can happen if the command isn't properly
> defined

ie. guest bug.

> , something bad happens

???  You're going to tell me to read the code, aren't you? :)

> , or the command is unrecognized.

If we go for feature bits, this is also a guest bug.  And I think we should, since that's what feature bits are for.

> As I
> was hashing this out, I first had more errnos, but I wasn't sure how
> extensively to fill out the error returns, and eventually defaulted to
> ok/fail.  Should I expand on these some?  Suggestions for a reasonable
> small yet complete set?  Should we use Linux errno values and let other
> OS virtio-net implementations create a switch table?

Not error codes.  In future we may have a command where return codes are meaningful, but I'm pretty sure they'll be command specific so we don't know how to define them now anyway.  AFAICT that's the only real justification for defining this interface as sync.

So I think just defining 0 on success is fine.  Defining that to always happen to well-formed commands is even better :)

Thanks,
Rusty.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Alex Williamson Jan. 28, 2009, 7:02 p.m. UTC | #4
On Wed, 2009-01-28 at 23:35 +1030, Rusty Russell wrote:
> On Tuesday 27 January 2009 14:30:06 Alex Williamson wrote:
> > Hi Rusty,
> > 
> > On Tue, 2009-01-27 at 13:52 +1030, Rusty Russell wrote:
> > > On Saturday 17 January 2009 07:43:23 Alex Williamson wrote:
> > > > + return status ? -EFAULT : 0;
> > > 
> > > This is wrong. Currently this can't happen, right? But you put it in
> > > so someone in future may want some kind of return from the commands.
> > > In which case, they'll want to see the value.
> >
> > > If we're sure they never want to see the value, then we don't need to
> > > be synchronous at all: just spin if add_buf() fails.
> > 
> > In my qemu series of patches it can happen if the command isn't properly
> > defined
> 
> ie. guest bug.
> 
> > , something bad happens
> 
> ???  You're going to tell me to read the code, aren't you? :)

The only one of these that stands out is if the qemu_mallocz() for the
MAC filter table fails for a size we think is reasonable.

> > , or the command is unrecognized.
> 
> If we go for feature bits, this is also a guest bug.  And I think we
> should, since that's what feature bits are for.

One of the reasons I had avoided using a feature bit is that it's only a
32bit field, and we've already used up to bit 16 (though I'm not sure
what to do about the sparse-ness of it).  The virtqueue interface I've
designed supports up to 255 command classes, each with 255 commands.  We
can't add a feature bit for every one, or even much of a subset.  I'd be
happy to add a feature bit indicating the virtqueue command channel is
supported, but beyond that, I think we need to fall back to enable or
initialization commands failing on various classes.  We could also
define that command 0 for each class as a probe if we want to make it
more explicit.  Thanks,

Alex
Rusty Russell Jan. 29, 2009, 1:35 a.m. UTC | #5
On Thursday 29 January 2009 05:32:21 Alex Williamson wrote:
> On Wed, 2009-01-28 at 23:35 +1030, Rusty Russell wrote:
> > On Tuesday 27 January 2009 14:30:06 Alex Williamson wrote:
> > > On Tue, 2009-01-27 at 13:52 +1030, Rusty Russell wrote:
> > > > If we're sure they never want to see the value, then we don't need to
> > > > be synchronous at all: just spin if add_buf() fails.
...
> The only one of these that stands out is if the qemu_mallocz() for the
> MAC filter table fails for a size we think is reasonable.

Yes, that seems fair.  Of course it's quite probable that the malloc will "succeed" then we'll segv later on if we're really low on mem.

> > > , or the command is unrecognized.
> > 
> > If we go for feature bits, this is also a guest bug.  And I think we
> > should, since that's what feature bits are for.
> 
> One of the reasons I had avoided using a feature bit is that it's only a
> 32bit field, and we've already used up to bit 16 (though I'm not sure
> what to do about the sparse-ness of it).  The virtqueue interface I've
> designed supports up to 255 command classes, each with 255 commands.  We
> can't add a feature bit for every one, or even much of a subset.

We will need to figure out how to figure out how to expand the feature bits for virtio_pci anyway.  I think the plan is to use feature bit 31 to mean "look here for more feature bits".  Anthony would know more...

s390 and lguest didn't fall into this trap; we have infinite feature bits.

> I'd be
> happy to add a feature bit indicating the virtqueue command channel is
> supported

We probably should, for good measure.  If someone added another virtqueue for some other purpose in future and didn't want to support the command channel it would get icky.

If we have 255 features, the problem is that we have 255 features, not that it's a lot of bits :)

Thanks,
Rusty.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index e7700de..d4be0a2 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -39,7 +39,7 @@  module_param(gso, bool, 0444);
 struct virtnet_info
 {
 	struct virtio_device *vdev;
-	struct virtqueue *rvq, *svq;
+	struct virtqueue *rvq, *svq, *cvq;
 	struct net_device *dev;
 	struct napi_struct napi;
 
@@ -603,6 +603,41 @@  static int virtnet_open(struct net_device *dev)
 	return 0;
 }
 
+static int virtnet_send_command(struct virtnet_info *vi, u8 class, u8 cmd,
+				void *data, unsigned int len)
+{
+	struct scatterlist sg[3];
+	struct virtio_net_ctrl_hdr ctrl;
+	virtio_net_ctrl_ack status;
+	unsigned int tmp;
+	int i = 0;
+
+	if (!vi->cvq)
+		return -EIO;
+
+	sg_init_table(sg, len ? 3 : 2);
+
+	sg_set_buf(&sg[i++], &ctrl, sizeof(ctrl));
+	if (len)
+		sg_set_buf(&sg[i++], data, len);
+	sg_set_buf(&sg[i], &status, sizeof(status));
+
+	ctrl.class = class;
+	ctrl.cmd = cmd;
+
+	status = ~0;
+
+	if (vi->cvq->vq_ops->add_buf(vi->cvq, sg, i, 1, vi) != 0)
+		BUG();
+
+	vi->cvq->vq_ops->kick(vi->cvq);
+
+	while (!vi->cvq->vq_ops->get_buf(vi->cvq, &tmp))
+		cpu_relax();
+
+	return status ? -EFAULT : 0;
+}
+
 static int virtnet_close(struct net_device *dev)
 {
 	struct virtnet_info *vi = netdev_priv(dev);
@@ -733,6 +768,14 @@  static int virtnet_probe(struct virtio_device *vdev)
 		goto free_recv;
 	}
 
+	/*
+	 * Outbound control channel virtqueue.  We can live without it,
+	 * so don't go fatal if it's not there.
+	 */
+	vi->cvq = vdev->config->find_vq(vdev, 2, NULL);
+	if (IS_ERR(vi->cvq))
+		vi->cvq = NULL;
+
 	/* Initialize our empty receive and send queues. */
 	skb_queue_head_init(&vi->recv);
 	skb_queue_head_init(&vi->send);
@@ -763,6 +806,8 @@  static int virtnet_probe(struct virtio_device *vdev)
 unregister:
 	unregister_netdev(dev);
 free_send:
+	if (vi->cvq)
+		vdev->config->del_vq(vi->cvq);
 	vdev->config->del_vq(vi->svq);
 free_recv:
 	vdev->config->del_vq(vi->rvq);
@@ -793,6 +838,8 @@  static void virtnet_remove(struct virtio_device *vdev)
 
 	vdev->config->del_vq(vi->svq);
 	vdev->config->del_vq(vi->rvq);
+	if (vi->cvq)
+		vdev->config->del_vq(vi->cvq);
 	unregister_netdev(vi->dev);
 
 	while (vi->pages)
diff --git a/include/linux/virtio_net.h b/include/linux/virtio_net.h
index 5cdd0aa..e2c1d81 100644
--- a/include/linux/virtio_net.h
+++ b/include/linux/virtio_net.h
@@ -53,4 +53,21 @@  struct virtio_net_hdr_mrg_rxbuf {
 	__u16 num_buffers;	/* Number of merged rx buffers */
 };
 
+/*
+ * Control virtqueue data structures
+ *
+ * The control virtqueue expects a header in the first sg entry
+ * and an ack/status response in the last entry.  Data for the
+ * command goes in between.
+ */
+struct virtio_net_ctrl_hdr {
+	__u8 class;
+	__u8 cmd;
+};
+
+typedef __u8 virtio_net_ctrl_ack;
+
+#define VIRTIO_NET_OK     0
+#define VIRTIO_NET_ERR    1
+
 #endif /* _LINUX_VIRTIO_NET_H */