diff mbox

[3/4] qemu:virtio-net: Add support for qemu_vlan_rxfilter

Message ID 20090210212857.9760.31288.stgit@kvm.aw (mailing list archive)
State Accepted, archived
Headers show

Commit Message

Alex Williamson Feb. 10, 2009, 9:28 p.m. UTC
Make use of qemu_vlan_rxfilter so that we can filter at a lower
level.  We implement callbacks for devices being added and removed
so that we can fall back to our own filtering or make another attempt
to push filtering off to someone else.

Signed-off-by: Alex Williamson <alex.williamson@hp.com>
---

 hw/virtio-net.c |   56 +++++++++++++++++++++++++++++++++++++++++++++++++++++--
 1 files changed, 54 insertions(+), 2 deletions(-)


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Paul Brook Feb. 12, 2009, 4:26 p.m. UTC | #1
> +static void virtio_net_vlan_client_added(void *opaque)
>...
> +static void virtio_net_vlan_client_removed(void *opaque)

Why are these two different?

It looks like what you really want is a callback for "Something changed,
and you need to reset your MAC filter."

Paul
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Alex Williamson Feb. 12, 2009, 4:36 p.m. UTC | #2
On Thu, 2009-02-12 at 16:26 +0000, Paul Brook wrote:
> > +static void virtio_net_vlan_client_added(void *opaque)
> >...
> > +static void virtio_net_vlan_client_removed(void *opaque)
> 
> Why are these two different?
> 
> It looks like what you really want is a callback for "Something changed,
> and you need to reset your MAC filter."

I think we'd end up with a race if we only had one callback.  For
instance if "change" was the result of a vlan client being removed, the
tap would clear the filter and the nic would try to install a filter.
The results would be different based on the calling order.

Alex

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Paul Brook Feb. 12, 2009, 5:05 p.m. UTC | #3
On Thursday 12 February 2009, Alex Williamson wrote:
> On Thu, 2009-02-12 at 16:26 +0000, Paul Brook wrote:
> > > +static void virtio_net_vlan_client_added(void *opaque)
> > >...
> > > +static void virtio_net_vlan_client_removed(void *opaque)
> >
> > Why are these two different?
> >
> > It looks like what you really want is a callback for "Something changed,
> > and you need to reset your MAC filter."
>
> I think we'd end up with a race if we only had one callback.  For
> instance if "change" was the result of a vlan client being removed, the
> tap would clear the filter and the nic would try to install a filter.
> The results would be different based on the calling order.

In that case either your implementation or your abstraction is wrong. 
Devices shouldn't need to know or care why a filter failed to apply.

I'm not sure what you mean by "the tap would clear the filter".

You have three options:

- Devices request a filter, and that request may fail. qemu notifies the 
device when it needs to retry the filter. It doesn't make any difference 
whether we think we just broke the old filter, or we think a previously 
failed filter may succeed now, the device response is the same: Try to set a 
filter and see if it works. This is what I assume you're trying to implement.
- Implement reliable filters. The device requests a filter once[1]. qemu makes 
sure that filter is always honored.
- Devices request a filter once. qemu remembers what that filter is, and 
notifies the device if/when that filter becomes active/inactive.

Paul

[1] once == every time the filter set changes.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Alex Williamson Feb. 12, 2009, 6:21 p.m. UTC | #4
On Thu, 2009-02-12 at 17:05 +0000, Paul Brook wrote:
> On Thursday 12 February 2009, Alex Williamson wrote:
> > On Thu, 2009-02-12 at 16:26 +0000, Paul Brook wrote:
> > > > +static void virtio_net_vlan_client_added(void *opaque)
> > > >...
> > > > +static void virtio_net_vlan_client_removed(void *opaque)
> > >
> > > Why are these two different?
> > >
> > > It looks like what you really want is a callback for "Something changed,
> > > and you need to reset your MAC filter."
> >
> > I think we'd end up with a race if we only had one callback.  For
> > instance if "change" was the result of a vlan client being removed, the
> > tap would clear the filter and the nic would try to install a filter.
> > The results would be different based on the calling order.
> 
> In that case either your implementation or your abstraction is wrong. 
> Devices shouldn't need to know or care why a filter failed to apply.

They don't, but they do need to know whether a filter they previously
applied successfully is no longer in place.  If they don't know this,
they have to assume the filter on the other side of the vlan is
transient and always double check it with their own filtering, which
seems like a waste of cache and cycles.

Is your objection that the NIC needs to associate the change in filter
state with a vlan event?  If so, maybe we can re-architect the callbacks
to something more appropriate.  We would need both a filter_cleared()
and a filter_retry() to achieve the same mechanics.  filter_cleared()
feels like something the tap device (the "filterer") would call on the
NIC (the "filteree").  But then the "filterer" needs some callback to
know when to clear and notify, which gets us back to somebody needs to
associate a vlan event with a change in the filter.  Suggestions?

> I'm not sure what you mean by "the tap would clear the filter".

By clear, I mean disable the filter, bringing the vlan back to an
unfiltered state.

> You have three options:
> 
> - Devices request a filter, and that request may fail. qemu notifies the 
> device when it needs to retry the filter. It doesn't make any difference 
> whether we think we just broke the old filter, or we think a previously 
> failed filter may succeed now, the device response is the same: Try to set a 
> filter and see if it works. This is what I assume you're trying to implement.
> - Implement reliable filters. The device requests a filter once[1]. qemu makes 
> sure that filter is always honored.
> - Devices request a filter once. qemu remembers what that filter is, and 
> notifies the device if/when that filter becomes active/inactive.

None of these are complete solutions.  My code behaves like this:

- A device requests a filter and is told if the request is successful
  - On success the device may skip it's own filtering
- If another vlan client is added, the following _must_ occur:
  - The "filterer" must clear (remove) the filter
  - The "filteree" must revert to using their own filtering
- If a vlan client is removed, the following _may_ occur:
  - vlan clients are notified that they may retry their filter

The added()/removed() interface feels like the right solution to this.
We could use a changed() function, but it would need to know the
direction of the change, which leads back to the same mechanics.  If
there's a better way, please suggest it.  Thanks,

Alex

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jamie Lokier Feb. 12, 2009, 8:26 p.m. UTC | #5
Alex Williamson wrote:
> They don't, but they do need to know whether a filter they previously
> applied successfully is no longer in place.  If they don't know this,
> they have to assume the filter on the other side of the vlan is
> transient and always double check it with their own filtering, which
> seems like a waste of cache and cycles.

Double checking is not a waste of cycles if the host filtering is good
but not guaranteed, as it still means fewer packets cross the host
kernel/user boundary.

How reliable are the host filter interfaces anyway?

    - Are they 100% reliable?  For example, I remember at one time on
      Linux, when application A listened for broadcast IP packets, you
      wouldn't receive any which has a unicast MAC address for a
      different host, except for the times when tcpdump was run in
      parallel.  That's an example of when the app sees the host filtering
      behave differently depending on other apps run at the same time.
      Multicast hash filtering is similar, but also depends on the host NIC.

    - When updated, do the host filters take effect from the instant
      the host syscall returned, or can there be already queued
      packets which bypass the new filter?

> - A device requests a filter and is told if the request is successful
>   - On success the device may skip it's own filtering
> - If another vlan client is added, the following _must_ occur:
>   - The "filterer" must clear (remove) the filter
>   - The "filteree" must revert to using their own filtering
> - If a vlan client is removed, the following _may_ occur:
>   - vlan clients are notified that they may retry their filter
> 
> The added()/removed() interface feels like the right solution to this.
> We could use a changed() function, but it would need to know the
> direction of the change, which leads back to the same mechanics.  If
> there's a better way, please suggest it.  Thanks,

If there are two vlan clients, you don't necessarily need them to use
their own filters, if they are both the same, or compatible in some way.

How about this:

   - On any change, notify all clients by walking the list twice.
   - Phase 1.  For each client:
     - The client requests a filter.
   - Phase 2.  For each client:
     - The client enquires whether its filter is in place.
       If yes, it relies on it.  If no or unreliable, it filters itself.

During this procedure, don't do any packet I/O.

During phase 1, don't send a request to the host kernel until just one
at the end (if the filter has changed), otherwise there will be a
brief time window with no host filter which could leak packets despite
no real change wanted and no packet I/O performed.

The same procedure is done for any configuration change, and together
with that change.  Actually: Stop packet I/O, then phase 1, then
update host kernel taps if they've changed, then phase 2, then restart
packet I/O.

-- Jamie
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Paul Brook Feb. 13, 2009, 12:40 p.m. UTC | #6
> - A device requests a filter and is told if the request is successful
>   - On success the device may skip it's own filtering
> - If another vlan client is added, the following _must_ occur:
>   - The "filterer" must clear (remove) the filter
>   - The "filteree" must revert to using their own filtering
> - If a vlan client is removed, the following _may_ occur:
>   - vlan clients are notified that they may retry their filter
>..
> The added()/removed() interface feels like the right solution to this.

I think your analysis is fundamentally flawed. Firstly I'm not sure the above 
holds when going between 1 and 2 clients on a vlan.  Even ignoring that, you 
are making implicit assumptions about when a filter will succeed. If these 
assumptions are broken (which seems likely if we ever implement filtering 
with more than 2 devices on a vlan) they you'll get subtle breakage in every 
single user of the API.

> We could use a changed() function, but it would need to know the
> direction of the change, which leads back to the same mechanics.  If
> there's a better way, please suggest it.  Thanks,

I still don't see why the device needs to know what's changed. The response 
should always be the same: Request a filter, and see if it works.

Paul
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jamie Lokier Feb. 13, 2009, 4 p.m. UTC | #7
Paul Brook wrote:
> > We could use a changed() function, but it would need to know the
> > direction of the change, which leads back to the same mechanics.  If
> > there's a better way, please suggest it.  Thanks,
> 
> I still don't see why the device needs to know what's changed. The response 
> should always be the same: Request a filter, and see if it works.

Well, you do need some way to notify a client that the filter which
used to work has been removed because it's no longer available, for
example when migrating between host kernels.

Or implement reliable filtering in the infrastructure which relays
packets internally in QEMU, so that each client can request a filter
and it always works, and the tap driver can tell the QEMU
infrastructure when kernel filtering is working and not, but each
client doesn't need to know that.

-- Jamie
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Paul Brook Feb. 13, 2009, 4:17 p.m. UTC | #8
On Friday 13 February 2009, Jamie Lokier wrote:
> Paul Brook wrote:
> > > We could use a changed() function, but it would need to know the
> > > direction of the change, which leads back to the same mechanics.  If
> > > there's a better way, please suggest it.  Thanks,
> >
> > I still don't see why the device needs to know what's changed. The
> > response should always be the same: Request a filter, and see if it
> > works.
>
> Well, you do need some way to notify a client that the filter which
> used to work has been removed because it's no longer available, for
> example when migrating between host kernels.

This is actually more evidence that the "add" and "remove" callbacks are 
bogus.

My point is that we're allowing the filter to fail for arbitrary reasons. Once 
you assume this, trying to be clever is just going to catch you out when we 
encounter a case you didn't think of.

A simple "Something changed, please try your filter again" callback 
automatically covers all these cases.

It may be that in some cases qemu already knows the filter is going to fail. 
However these events are rare (i.e. not performance critical) so it's far 
simpler to just use the same callback and have the device try anyway.

Paul
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jamie Lokier Feb. 13, 2009, 4:46 p.m. UTC | #9
Paul Brook wrote:
> > Well, you do need some way to notify a client that the filter which
> > used to work has been removed because it's no longer available, for
> > example when migrating between host kernels.
> 
> This is actually more evidence that the "add" and "remove" callbacks are 
> bogus.

I agree, "add" and "remove" are misnamed among other bogosities.

> My point is that we're allowing the filter to fail for arbitrary
> reasons. Once you assume this, trying to be clever is just going to
> catch you out when we encounter a case you didn't think of.

Yes.

> A simple "Something changed, please try your filter again" callback 
> automatically covers all these cases.

It doesn't, if tap has no memory of how many clients require a filter.

If tap just answers "YES and installs the kernel filter" or "NO and
doesn't install the kernel filter" and doesn't remember how many
clients need a filter, then:

    1. Client A requests filter A
         -> tap says YES, installs kernel filter.
    2. Client B requests filter B
         -> tap can't do both, so deinstalls kernel filter.
         -> tap says NO to client B.
         -> tap notifies client A "there's been a change".
    3. Client A requests filter A
         -> tap doesn't remember that client B wants a filter, so
         -> tap says YES.
         -> tap notifies client B "there's been a change".
    4. Goto 2
         -> endless loop.

In other words, tap needs to distinguish three states:

     "1 filter requested and installed in the kernel"
     ">1 filter requested, none installed in the kernel"
     "0 filters requested, none installed in the kernel"

The interface to clients needs to keep that state updated in tap
somehow.  Just requesting a filter and reverting to software filtering
when the request failed doesn't do that.

Note this has nothing to do with the software filtering fallback.
Even if qemu internal filtering is always done, you still need the
above to keep the kernel filter correct.

Imho, since there needs to be software filter code for fallback
_anyway_, a better place to put that fallback is in the
client-independent plumbing which relays packets to each client.  And
if it's there, you can have a different software fallback filter for
each client anyway, so clients can assume filters are always installed
and working perfectly.

The interface between the plumbing and the tap driver still needs to
handle the issues described above.

-- Jamie
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Paul Brook Feb. 13, 2009, 5:04 p.m. UTC | #10
> > A simple "Something changed, please try your filter again" callback
> > automatically covers all these cases.
>
> It doesn't, if tap has no memory of how many clients require a filter.

I'm talking about a callback for devices requesting an inbound filter. Devices 
implementing outgoing (device->vlan) filters just do as they're told.

> If tap just answers "YES and installs the kernel filter" or "NO and
> doesn't install the kernel filter" and doesn't remember how many
> clients need a filter, then:
>...
> In other words, tap needs to distinguish three states:
>
>      "1 filter requested and installed in the kernel"
>      ">1 filter requested, none installed in the kernel"
>      "0 filters requested, none installed in the kernel"

Absolutely not. This is the reason we have separate the "request an incoming 
filter" API from the "provide an outgoing filter" callback. It allows the 
vlan code to arbitrate in the middle. A vlan is a bus network, not a set of 
point-point connections. I haven't checked whether the proposed patch gets 
this right. I suspect it probably doesn't.

This is why the initial patch that had clients talking to each other directly 
was completely wrong.

Paul
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jamie Lokier Feb. 13, 2009, 8:38 p.m. UTC | #11
Paul Brook wrote:
> > > A simple "Something changed, please try your filter again" callback
> > > automatically covers all these cases.
> >
> > It doesn't, if tap has no memory of how many clients require a filter.
> 
> I'm talking about a callback for devices requesting an inbound
> filter. Devices implementing outgoing (device->vlan) filters just do
> as they're told.

So am I.

> > If tap just answers "YES and installs the kernel filter" or "NO and
> > doesn't install the kernel filter" and doesn't remember how many
> > clients need a filter, then:
> >...
> > In other words, tap needs to distinguish three states:
> >
> >      "1 filter requested and installed in the kernel"
> >      ">1 filter requested, none installed in the kernel"
> >      "0 filters requested, none installed in the kernel"
> 
> Absolutely not. This is the reason we have separate the "request an incoming 
> filter" API from the "provide an outgoing filter" callback. It allows the 
> vlan code to arbitrate in the middle.

I'm guessing that "vlan code to arbitrate in the middle" is exactly
what I've suggested.

Maybe the description was messed up.  Substitute something else for
"tap" in the above: "The _vlan arbitration code which talks to the tap
device_ needs to distinguish three states...".

> A vlan is a bus network, not a set of point-point connections.

Yes, this is what I've assumed.

The callback you suggest for devices requesting an inbound filter will
infinite-loop when there's two such devices on the same vlan bus,
because each time the callback is called, that device will re-issue
its filter request which triggers the callback on the other similar
device.  Back and forth.

To avoid the infinite loop, the vlan code in the middle (if that's
where you want it, and I agree) has to distinguish between no inbound
filters requested by attached devices, and multiple incompatible
inbound filters requested by attached devices.

-- Jamie
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Paul Brook Feb. 15, 2009, 4:25 p.m. UTC | #12
> The callback you suggest for devices requesting an inbound filter will
> infinite-loop when there's two such devices on the same vlan bus,
> because each time the callback is called, that device will re-issue
> its filter request which triggers the callback on the other similar
> device.  Back and forth.
>
> To avoid the infinite loop, the vlan code in the middle (if that's
> where you want it, and I agree) has to distinguish between no inbound
> filters requested by attached devices, and multiple incompatible
> inbound filters requested by attached devices.

Of course.

As I've said repeatedly, the only sane way to implement this is if you isolate 
individual devices from this kind of implementation detail. If you're API 
doesn't do that then it's wrong.

Paul
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/hw/virtio-net.c b/hw/virtio-net.c
index 62153e9..3e4f526 100644
--- a/hw/virtio-net.c
+++ b/hw/virtio-net.c
@@ -35,6 +35,7 @@  typedef struct VirtIONet
     int mergeable_rx_bufs;
     int promisc;
     int allmulti;
+    int vlan_rxfilter;
     struct {
         int in_use;
         uint8_t *macs;
@@ -51,6 +52,43 @@  static VirtIONet *to_virtio_net(VirtIODevice *vdev)
     return (VirtIONet *)vdev;
 }
 
+static void virtio_net_set_vlan_rxfilter(VirtIONet *n)
+{
+    static const uint8_t bcast[] = {0xff, 0xff, 0xff, 0xff, 0xff, 0xff};
+    uint8_t *buf;
+    int flags = 0;
+
+    if (n->promisc)
+        flags |= QEMU_NET_PROMISC;
+    if (n->allmulti)
+        flags |= QEMU_NET_ALLMULTI;
+
+    buf = qemu_mallocz((n->mac_table.in_use + 2) * ETH_ALEN);
+
+    memcpy(&buf[ETH_ALEN*0], n->mac, ETH_ALEN);
+    memcpy(&buf[ETH_ALEN*1], n->mac_table.macs, n->mac_table.in_use * ETH_ALEN);
+    /* broadcast is a multicast addr, so add it at the end */
+    memcpy(&buf[ETH_ALEN*(n->mac_table.in_use + 1)], bcast, ETH_ALEN);
+
+    n->vlan_rxfilter = qemu_vlan_rxfilter(n->vc, flags,
+                                          n->mac_table.in_use + 2, buf);
+    qemu_free(buf);
+}
+
+static void virtio_net_vlan_client_added(void *opaque)
+{
+    VirtIONet *n = opaque;
+
+    n->vlan_rxfilter = 0;
+}
+
+static void virtio_net_vlan_client_removed(void *opaque)
+{
+    VirtIONet *n = opaque;
+
+    virtio_net_set_vlan_rxfilter(n);
+}
+
 static void virtio_net_get_config(VirtIODevice *vdev, uint8_t *config)
 {
     VirtIONet *n = to_virtio_net(vdev);
@@ -70,6 +108,7 @@  static void virtio_net_set_config(VirtIODevice *vdev, const uint8_t *config)
 
     if (memcmp(netcfg.mac, n->mac, ETH_ALEN)) {
         memcpy(n->mac, netcfg.mac, ETH_ALEN);
+        virtio_net_set_vlan_rxfilter(n);
         qemu_format_nic_info_str(n->vc, n->mac);
     }
 }
@@ -99,6 +138,7 @@  static void virtio_net_reset(VirtIODevice *vdev)
     /* Flush any MAC and VLAN filter table state */
     n->mac_table.in_use = 0;
     memset(n->mac_table.macs, 0, MAC_TABLE_ENTRIES * ETH_ALEN);
+    virtio_net_set_vlan_rxfilter(n);
     memset(n->vlans, 0, MAX_VLAN >> 3);
 }
 
@@ -247,6 +287,10 @@  static void virtio_net_handle_ctrl(VirtIODevice *vdev, VirtQueue *vq)
 
         virtqueue_push(vq, &elem, sizeof(status));
         virtio_notify(vdev, vq);
+
+        if (ctrl.class == VIRTIO_NET_CTRL_RX_MODE ||
+            ctrl.class == VIRTIO_NET_CTRL_MAC)
+            virtio_net_set_vlan_rxfilter(n);
     }
 }
 
@@ -334,15 +378,19 @@  static int receive_filter(VirtIONet *n, const uint8_t *buf, int size)
             return 0;
     }
 
-    if ((ptr[0] & 1) && n->allmulti)
+    /* If hw filtering is complete, no need to continue */
+    if (n->vlan_rxfilter)
         return 1;
 
-    if (!memcmp(ptr, bcast, sizeof(bcast)))
+    if ((ptr[0] & 1) && n->allmulti)
         return 1;
 
     if (!memcmp(ptr, n->mac, ETH_ALEN))
         return 1;
 
+    if (!memcmp(ptr, bcast, sizeof(bcast)))
+        return 1;
+
     for (i = 0; i < n->mac_table.in_use; i++) {
         if (!memcmp(ptr, &n->mac_table.macs[i * ETH_ALEN], ETH_ALEN))
             return 1;
@@ -552,6 +600,8 @@  static int virtio_net_load(QEMUFile *f, void *opaque, int version_id)
     if (version_id >= 6)
         qemu_get_buffer(f, (uint8_t *)n->vlans, MAX_VLAN >> 3);
 
+    virtio_net_set_vlan_rxfilter(n);
+
     if (n->tx_timer_active) {
         qemu_mod_timer(n->tx_timer,
                        qemu_get_clock(vm_clock) + TX_TIMER_INTERVAL);
@@ -589,6 +639,8 @@  void virtio_net_init(PCIBus *bus, NICInfo *nd, int devfn)
     n->vc = qemu_new_vlan_client(nd->vlan, nd->model, nd->name,
                                  virtio_net_receive, virtio_net_can_receive, n);
     n->vc->link_status_changed = virtio_net_set_link_status;
+    n->vc->vlan_client_added = virtio_net_vlan_client_added;
+    n->vc->vlan_client_removed = virtio_net_vlan_client_removed;
 
     qemu_format_nic_info_str(n->vc, n->mac);