diff mbox

[v3,7/9] vfio: Use driver_override to avert binding to compromising drivers

Message ID 20170620154830.17487.1861.stgit@gimli.home (mailing list archive)
State New, archived
Headers show

Commit Message

Alex Williamson June 20, 2017, 3:48 p.m. UTC
If a device is bound to a non-vfio, non-whitelisted driver while a
group is in use, then the integrity of the group is compromised and
will result in hitting a BUG_ON.  This code tries to avoid this case
by mangling driver_override to force a no-match for the driver.  The
driver-core will either follow-up with a DRIVER_NOT_BOUND (preferred)
or BOUND_DRIVER, at which point we can remove the driver_override
mangling.

A complication here is that even though the window between these
notifications is expected to be extremely small, the vfio group could
be removed, which would prevent us from finding the group again to
remove the driver_override.  We therefore take a group reference when
adding to driver_override and release it when removed.  A second
complication is that driver_override can be modified by the system
admin through sysfs.  To avoid trivial interference, we add a non-
user-visible UUID to the group and use this as part of the mangle
string.

The above blocks binding to a driver that would compromise the host,
but we'd also like to avoid reaching that step if possible.  For this
we add a wait_event_timeout() with a short, 1 second timeout, which is
highly effective in allowing empty groups to finish cleanup.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Tested-by: Eric Auger <eric.auger@redhat.com>
---
 drivers/vfio/vfio.c |  145 +++++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 138 insertions(+), 7 deletions(-)

Comments

Russell King (Oracle) June 26, 2017, 9:08 a.m. UTC | #1
On Tue, Jun 20, 2017 at 09:48:31AM -0600, Alex Williamson wrote:
> If a device is bound to a non-vfio, non-whitelisted driver while a
> group is in use, then the integrity of the group is compromised and
> will result in hitting a BUG_ON.  This code tries to avoid this case
> by mangling driver_override to force a no-match for the driver.  The
> driver-core will either follow-up with a DRIVER_NOT_BOUND (preferred)
> or BOUND_DRIVER, at which point we can remove the driver_override
> mangling.

Rather than mangling the driver override string to prevent driver binding,
I wonder if it would make more sense to allow the BUS_NOTIFY_BIND_DRIVER
notifier to fail the device probe?

The driver override strings are, after all, exposed to userspace, and
it strikes me that this kind of mangling is racy - userspace can read
or change the override string at any time.
Alex Williamson June 26, 2017, 7:39 p.m. UTC | #2
On Mon, 26 Jun 2017 10:08:55 +0100
Russell King - ARM Linux <linux@armlinux.org.uk> wrote:

> On Tue, Jun 20, 2017 at 09:48:31AM -0600, Alex Williamson wrote:
> > If a device is bound to a non-vfio, non-whitelisted driver while a
> > group is in use, then the integrity of the group is compromised and
> > will result in hitting a BUG_ON.  This code tries to avoid this case
> > by mangling driver_override to force a no-match for the driver.  The
> > driver-core will either follow-up with a DRIVER_NOT_BOUND (preferred)
> > or BOUND_DRIVER, at which point we can remove the driver_override
> > mangling.  
> 
> Rather than mangling the driver override string to prevent driver binding,
> I wonder if it would make more sense to allow the BUS_NOTIFY_BIND_DRIVER
> notifier to fail the device probe?
> 
> The driver override strings are, after all, exposed to userspace, and
> it strikes me that this kind of mangling is racy - userspace can read
> or change the override string at any time.

Indeed, that looks easier.  I sent and RFC, let's see what Greg has to
say.  Thanks,

Alex
Alex Williamson July 10, 2017, 9:34 p.m. UTC | #3
On Mon, 26 Jun 2017 10:08:55 +0100
Russell King - ARM Linux <linux@armlinux.org.uk> wrote:

> On Tue, Jun 20, 2017 at 09:48:31AM -0600, Alex Williamson wrote:
> > If a device is bound to a non-vfio, non-whitelisted driver while a
> > group is in use, then the integrity of the group is compromised and
> > will result in hitting a BUG_ON.  This code tries to avoid this case
> > by mangling driver_override to force a no-match for the driver.  The
> > driver-core will either follow-up with a DRIVER_NOT_BOUND (preferred)
> > or BOUND_DRIVER, at which point we can remove the driver_override
> > mangling.  
> 
> Rather than mangling the driver override string to prevent driver binding,
> I wonder if it would make more sense to allow the BUS_NOTIFY_BIND_DRIVER
> notifier to fail the device probe?

Well, it seemed like a good idea, but I don't think we're getting any
traction here, the thread has gone cold:

https://lkml.org/lkml/2017/6/27/1002

Greg, any further comments?
 
> The driver override strings are, after all, exposed to userspace, and
> it strikes me that this kind of mangling is racy - userspace can read
> or change the override string at any time.

As an alternative, I think we can make this not racy.  BIND_DRIVER is
notified through device_bind_driver() which specifies that the device
lock is held.  This covers not only BIND_DRIVER, but also BOUND_DRIVER
and DRIVER_NOT_BOUND.  So if the user entry points in sysfs were to
require the device lock, we could easily mangle and de-mangle without
interference from a user.  It also seems like a rather good idea in
general to exclude the user from changing driver_override while we're
evaluating a match using it.  Do you still have an objection to
mangling driver_override if we can avoid the user race?  Thanks,

Alex
Greg KH July 11, 2017, 9:46 a.m. UTC | #4
On Mon, Jul 10, 2017 at 03:34:12PM -0600, Alex Williamson wrote:
> On Mon, 26 Jun 2017 10:08:55 +0100
> Russell King - ARM Linux <linux@armlinux.org.uk> wrote:
> 
> > On Tue, Jun 20, 2017 at 09:48:31AM -0600, Alex Williamson wrote:
> > > If a device is bound to a non-vfio, non-whitelisted driver while a
> > > group is in use, then the integrity of the group is compromised and
> > > will result in hitting a BUG_ON.  This code tries to avoid this case
> > > by mangling driver_override to force a no-match for the driver.  The
> > > driver-core will either follow-up with a DRIVER_NOT_BOUND (preferred)
> > > or BOUND_DRIVER, at which point we can remove the driver_override
> > > mangling.  
> > 
> > Rather than mangling the driver override string to prevent driver binding,
> > I wonder if it would make more sense to allow the BUS_NOTIFY_BIND_DRIVER
> > notifier to fail the device probe?
> 
> Well, it seemed like a good idea, but I don't think we're getting any
> traction here, the thread has gone cold:
> 
> https://lkml.org/lkml/2017/6/27/1002
> 
> Greg, any further comments?

I still think your drivers should be fixed, adding
yet-another-odd-interaction with the driver core is ripe for added
complexity...

And, as there's no real patch for me to do anything with (hint, I can't
apply RFC patches), I don't know what I can do here...

thanks,

greg k-h
Alex Williamson July 11, 2017, 4:41 p.m. UTC | #5
On Tue, 11 Jul 2017 11:46:27 +0200
Greg KH <greg@kroah.com> wrote:

> On Mon, Jul 10, 2017 at 03:34:12PM -0600, Alex Williamson wrote:
> > On Mon, 26 Jun 2017 10:08:55 +0100
> > Russell King - ARM Linux <linux@armlinux.org.uk> wrote:
> >   
> > > On Tue, Jun 20, 2017 at 09:48:31AM -0600, Alex Williamson wrote:  
> > > > If a device is bound to a non-vfio, non-whitelisted driver while a
> > > > group is in use, then the integrity of the group is compromised and
> > > > will result in hitting a BUG_ON.  This code tries to avoid this case
> > > > by mangling driver_override to force a no-match for the driver.  The
> > > > driver-core will either follow-up with a DRIVER_NOT_BOUND (preferred)
> > > > or BOUND_DRIVER, at which point we can remove the driver_override
> > > > mangling.    
> > > 
> > > Rather than mangling the driver override string to prevent driver binding,
> > > I wonder if it would make more sense to allow the BUS_NOTIFY_BIND_DRIVER
> > > notifier to fail the device probe?  
> > 
> > Well, it seemed like a good idea, but I don't think we're getting any
> > traction here, the thread has gone cold:
> > 
> > https://lkml.org/lkml/2017/6/27/1002
> > 
> > Greg, any further comments?  
> 
> I still think your drivers should be fixed, adding
> yet-another-odd-interaction with the driver core is ripe for added
> complexity...

Hi Greg,

Let me give a concrete scenario, I have a dual-port conventional PCI
e1000 NIC.  The IOMMU operates on PCIe requester IDs and therefore both
NIC functions are masked behind the requester ID of a PCIe-to-PCI
bridge.  We cannot have the e1000 driver managing one function and a
user managing the other (via vfio-pci).  In this case, not only is the
DMA not isolated but the functions share the same IOMMU context.
Therefore in order to allow the user access to one function via
vfio-pci, the other function needs to be in a known state, either also
bound to vfio-pci, bound to an innocuous driver like pci-stub, or
unbound from any driver.  Given this state, user now has access to one
function of the device, but how can we fix our driver to manage the
other function?

If the other function is also bound to vfio-pci, the driver core does
not allow us to refuse a driver remove request, the best we can do is
block for a while, but we best not do that too long so we end up in the
device unbound state.

Likewise, if the other function was bound to pci-stub, this driver won't
block remove, so the device for the other port can transition to an
unbound state.

Once in an unbound state, how would fixing either the vfio-pci or the
core vfio driver prevent the scenario which can now happen of the
unbound device being bound to the host e1000 driver?  This can happen
in pure PCIe topologies as well where perhaps the IOMMU context is not
shared, but the devices still lack DMA isolation within the group.

The only tool we currently have to manage this scenario is that the
vfio core driver can pull BUG_ON after the fact of the other device
being bound to a host driver.  Understandably, users aren't so keen on
this, which is why I'm trying to allow vfio to block binding of that
other device before it happens.  None of this really seems to fall
within the capabilities of the existing driver core, so simply fixing
my driver doesn't seem to be a well defined option.  Is there a simple
solution I'm missing?  We're not concerned only with auto-probing, we
need to protect against explicit bind attempts as well.
 
> And, as there's no real patch for me to do anything with (hint, I can't
> apply RFC patches), I don't know what I can do here...

Certainly continuing the discussion is all I'm asking for at this
point.  The RFC didn't tickle your fancy, but the reply also didn't
convey an appreciation of the circumstances.  I hope that perhaps this
gets us a step closer so we can decide which way to go.  Thanks,

Alex
Greg KH July 13, 2017, 8:23 a.m. UTC | #6
On Tue, Jul 11, 2017 at 10:41:16AM -0600, Alex Williamson wrote:
> Let me give a concrete scenario, I have a dual-port conventional PCI
> e1000 NIC.  The IOMMU operates on PCIe requester IDs and therefore both
> NIC functions are masked behind the requester ID of a PCIe-to-PCI
> bridge.  We cannot have the e1000 driver managing one function and a
> user managing the other (via vfio-pci).

Agreed, but really, if a user asks to do such a thing, they deserve the
pieces the kernel ends up in, right?

> In this case, not only is the
> DMA not isolated but the functions share the same IOMMU context.
> Therefore in order to allow the user access to one function via
> vfio-pci, the other function needs to be in a known state, either also
> bound to vfio-pci, bound to an innocuous driver like pci-stub, or
> unbound from any driver.  Given this state, user now has access to one
> function of the device, but how can we fix our driver to manage the
> other function?

We have USB drivers that do this all the time, due to crazy USB specs
that required it.  The cdc-acm driver is one example, and I think there
are a number of USB sound devices that also have this issue.  Just have
the driver of the "first" device grab the second one as well and "know"
about the resources involved, as you are doing today.

But, then you somehow seem to have the requirement to prevent userspace
from mucking around in your driver bindings, and really, you shouldn't
care about this, because again, if it messes up here, all bets are off.

> If the other function is also bound to vfio-pci, the driver core does
> not allow us to refuse a driver remove request, the best we can do is
> block for a while, but we best not do that too long so we end up in the
> device unbound state.
> 
> Likewise, if the other function was bound to pci-stub, this driver won't
> block remove, so the device for the other port can transition to an
> unbound state.
> 
> Once in an unbound state, how would fixing either the vfio-pci or the
> core vfio driver prevent the scenario which can now happen of the
> unbound device being bound to the host e1000 driver?  This can happen
> in pure PCIe topologies as well where perhaps the IOMMU context is not
> shared, but the devices still lack DMA isolation within the group.
> 
> The only tool we currently have to manage this scenario is that the
> vfio core driver can pull BUG_ON after the fact of the other device
> being bound to a host driver.

Well, how about just locking the device down, don't crash the kernel.
That at least gives userspace a chance to figure out what they did was
wrong.

> Understandably, users aren't so keen on
> this, which is why I'm trying to allow vfio to block binding of that
> other device before it happens.  None of this really seems to fall
> within the capabilities of the existing driver core, so simply fixing
> my driver doesn't seem to be a well defined option.  Is there a simple
> solution I'm missing?  We're not concerned only with auto-probing, we
> need to protect against explicit bind attempts as well.

Again, if userspace does an explicit bind/unbind attempt, it _HAS_ to
know what it is doing, we can't protect ourselves from that, that's
always been the case.  The bind/unbind was done as a way for people to
say "I know more about what is going on than the kernel does right now,
so I'm going to override it." and we have to trust that they do know
that.  Don't spend a lot of time and energy trying to protect yourself
from that please.

thanks,

greg k-h
Alex Williamson July 14, 2017, 4:03 p.m. UTC | #7
Hi Greg,

On Thu, 13 Jul 2017 10:23:14 +0200
Greg KH <greg@kroah.com> wrote:

> On Tue, Jul 11, 2017 at 10:41:16AM -0600, Alex Williamson wrote:
> > Let me give a concrete scenario, I have a dual-port conventional PCI
> > e1000 NIC.  The IOMMU operates on PCIe requester IDs and therefore both
> > NIC functions are masked behind the requester ID of a PCIe-to-PCI
> > bridge.  We cannot have the e1000 driver managing one function and a
> > user managing the other (via vfio-pci).  
> 
> Agreed, but really, if a user asks to do such a thing, they deserve the
> pieces the kernel ends up in, right?

I think that's asking a fair bit from users to understand these
nuances; at one point in time they can bind the device to e1000 and it
works, at another point in time the same operation crashes the system.
Perhaps the user is not even using manual binding, maybe the e1000
driver is freshly loaded and probes any devices that are not bound.
Maybe the device is passed through /sys/bus/pci/drivers_probe.

If the user attempts to bind a device to the wrong driver we don't
intentionally run the system into the ground to make that work.  Of
course if the user is overriding a match with dynamic IDs or
driver_override, then I fully agree, the user should be responsible for
the action they've dictated.  I tend to think of this more towards the
former than the latter.

> > In this case, not only is the
> > DMA not isolated but the functions share the same IOMMU context.
> > Therefore in order to allow the user access to one function via
> > vfio-pci, the other function needs to be in a known state, either also
> > bound to vfio-pci, bound to an innocuous driver like pci-stub, or
> > unbound from any driver.  Given this state, user now has access to one
> > function of the device, but how can we fix our driver to manage the
> > other function?  
> 
> We have USB drivers that do this all the time, due to crazy USB specs
> that required it.  The cdc-acm driver is one example, and I think there
> are a number of USB sound devices that also have this issue.  Just have
> the driver of the "first" device grab the second one as well and "know"
> about the resources involved, as you are doing today.
> 
> But, then you somehow seem to have the requirement to prevent userspace
> from mucking around in your driver bindings, and really, you shouldn't
> care about this, because again, if it messes up here, all bets are off.
> 
> > If the other function is also bound to vfio-pci, the driver core does
> > not allow us to refuse a driver remove request, the best we can do is
> > block for a while, but we best not do that too long so we end up in the
> > device unbound state.
> > 
> > Likewise, if the other function was bound to pci-stub, this driver won't
> > block remove, so the device for the other port can transition to an
> > unbound state.
> > 
> > Once in an unbound state, how would fixing either the vfio-pci or the
> > core vfio driver prevent the scenario which can now happen of the
> > unbound device being bound to the host e1000 driver?  This can happen
> > in pure PCIe topologies as well where perhaps the IOMMU context is not
> > shared, but the devices still lack DMA isolation within the group.
> > 
> > The only tool we currently have to manage this scenario is that the
> > vfio core driver can pull BUG_ON after the fact of the other device
> > being bound to a host driver.  
> 
> Well, how about just locking the device down, don't crash the kernel.
> That at least gives userspace a chance to figure out what they did was
> wrong.

Locking down the device is an interesting strategy, I'll look into this
more.  I think we'd be locking down the user/vfio owned device,
locking down the device for non-vfio use is essentially what I've been
trying to do with blocking driver binding.  With PCI devices we have the
bus-master bit that we could clear and prevent the user from changing to
theoretically stop all DMA for the device.  We'd also need to destroy
the IOMMU domain or else the IOMMU driver is going to break because the
wrong domain type is setup for the host owned device.  However, if we
manage to do all of this perfectly, the result would be that the user
owned device quietly stops working, perhaps only a dmesg log entry to
identify what happened.  I can imagine many bugs filed and much time
wasted looking for that magic breadcrumb.  Perhaps the better approach
is to try to request the conflicting device(s) back from the user, and
failing that kill the user process, ultimately falling back to the
existing BUG_ON if we haven't resolved the compromising scenario.  A
harsher approach, but there's certainly the argument that we're trying
to follow the user command of binding to the native host driver.
 
> > Understandably, users aren't so keen on
> > this, which is why I'm trying to allow vfio to block binding of that
> > other device before it happens.  None of this really seems to fall
> > within the capabilities of the existing driver core, so simply fixing
> > my driver doesn't seem to be a well defined option.  Is there a simple
> > solution I'm missing?  We're not concerned only with auto-probing, we
> > need to protect against explicit bind attempts as well.  
> 
> Again, if userspace does an explicit bind/unbind attempt, it _HAS_ to
> know what it is doing, we can't protect ourselves from that, that's
> always been the case.  The bind/unbind was done as a way for people to
> say "I know more about what is going on than the kernel does right now,
> so I'm going to override it." and we have to trust that they do know
> that.  Don't spend a lot of time and energy trying to protect yourself
> from that please.

Thanks for the advice, I appreciate your time on this.  I'd still like
to improve the current behavior, but I'll see what I can do on the side
of imposing user consequences before taking the head shot.  Thanks,

Alex
Greg KH July 14, 2017, 8:09 p.m. UTC | #8
On Fri, Jul 14, 2017 at 10:03:27AM -0600, Alex Williamson wrote:
> Hi Greg,
> 
> On Thu, 13 Jul 2017 10:23:14 +0200
> Greg KH <greg@kroah.com> wrote:
> 
> > On Tue, Jul 11, 2017 at 10:41:16AM -0600, Alex Williamson wrote:
> > > Let me give a concrete scenario, I have a dual-port conventional PCI
> > > e1000 NIC.  The IOMMU operates on PCIe requester IDs and therefore both
> > > NIC functions are masked behind the requester ID of a PCIe-to-PCI
> > > bridge.  We cannot have the e1000 driver managing one function and a
> > > user managing the other (via vfio-pci).  
> > 
> > Agreed, but really, if a user asks to do such a thing, they deserve the
> > pieces the kernel ends up in, right?
> 
> I think that's asking a fair bit from users to understand these
> nuances;

There is no "nuance" here.

> at one point in time they can bind the device to e1000 and it
> works,

Yeah, they got lucky!

> at another point in time the same operation crashes the system.

And now they didn't!  What did they do differently?  Oh look, one other
device needed to be unbound/bound/whatever to get this to work properly,
let's do that instead next time.

> Perhaps the user is not even using manual binding, maybe the e1000
> driver is freshly loaded and probes any devices that are not bound.

No, that's not the issue here at all, that should always work as the
kernel driver is telling us that it can support this device just fine.
Nothing "grey" or "nuanced" here at all.

> Maybe the device is passed through /sys/bus/pci/drivers_probe.

Then the user gets to keep the pieces of the kernel that might get spit
out at them.

Again, doing manual binding is a risk that if a user takes, it might or
might not work.  It's always been that way.

> If the user attempts to bind a device to the wrong driver we don't
> intentionally run the system into the ground to make that work.

Are you kidding?  That happens all the time, try to do it yourself and
bind a device to a driver that doesn't expect to be handling it.  If you
are lucky your kernel will crash, if unlucky, it will limp along and do
odd things to the device.  It's been this way since these sysfs files
were added over a decade ago.

> Of course if the user is overriding a match with dynamic IDs or
> driver_override, then I fully agree, the user should be responsible
> for the action they've dictated.  I tend to think of this more towards
> the former than the latter.

The user is always responsible if they are using sysfs to bind/unbind
devices from drivers.  It's not complex or nuanced at all.

thanks,

greg k-h
diff mbox

Patch

diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
index f40d1508d368..20e57fecf652 100644
--- a/drivers/vfio/vfio.c
+++ b/drivers/vfio/vfio.c
@@ -25,6 +25,7 @@ 
 #include <linux/miscdevice.h>
 #include <linux/module.h>
 #include <linux/mutex.h>
+#include <linux/platform_device.h>
 #include <linux/pci.h>
 #include <linux/rwsem.h>
 #include <linux/sched.h>
@@ -32,6 +33,7 @@ 
 #include <linux/stat.h>
 #include <linux/string.h>
 #include <linux/uaccess.h>
+#include <linux/uuid.h>
 #include <linux/vfio.h>
 #include <linux/wait.h>
 
@@ -95,6 +97,7 @@  struct vfio_group {
 	bool				noiommu;
 	struct kvm			*kvm;
 	struct blocking_notifier_head	notifier;
+	unsigned char			uuid[16];
 };
 
 struct vfio_device {
@@ -352,6 +355,8 @@  static struct vfio_group *vfio_create_group(struct iommu_group *iommu_group)
 
 	group->nb.notifier_call = vfio_iommu_group_notifier;
 
+	generate_random_uuid(group->uuid);
+
 	/*
 	 * blocking notifiers acquire a rwsem around registering and hold
 	 * it around callback.  Therefore, need to register outside of
@@ -728,6 +733,111 @@  static int vfio_group_nb_verify(struct vfio_group *group, struct device *dev)
 	return vfio_dev_viable(dev, group);
 }
 
+#define VFIO_TAG_PREFIX "#vfio_group:"
+
+static char **vfio_find_driver_override(struct device *dev)
+{
+	if (dev_is_pci(dev)) {
+		struct pci_dev *pdev = to_pci_dev(dev);
+		return &pdev->driver_override;
+	} else if (dev->bus == &platform_bus_type) {
+		struct platform_device *pdev = to_platform_device(dev);
+		return &pdev->driver_override;
+	}
+
+	return NULL;
+}
+
+/*
+ * If we're about to bind to something other than a known whitelisted driver
+ * or known vfio bus driver, try to avert it with driver_override.
+ */
+static void vfio_group_nb_pre_bind(struct vfio_group *group, struct device *dev)
+{
+	struct vfio_bus_driver *driver;
+	struct device_driver *drv = ACCESS_ONCE(dev->driver);
+	char **driver_override;
+
+	if (vfio_dev_whitelisted(dev, drv))
+		return; /* Binding to known "innocuous" device/driver */
+
+	mutex_lock(&vfio.bus_drivers_lock);
+	list_for_each_entry(driver, &vfio.bus_drivers_list, vfio_next) {
+		if (driver->drv == drv) {
+			mutex_unlock(&vfio.bus_drivers_lock);
+			return; /* Binding to known vfio bus driver, ok */
+		}
+	}
+	mutex_unlock(&vfio.bus_drivers_lock);
+
+	/* Can we stall slightly to let users fall off? */
+	if (list_empty(&group->device_list)) {
+		if (wait_event_timeout(vfio.release_q,
+				!atomic_read(&group->container_users), HZ))
+			return;
+	}
+
+	driver_override = vfio_find_driver_override(dev);
+	if (driver_override) {
+		char tag[50], *new = NULL, *old = *driver_override;
+
+		snprintf(tag, sizeof(tag), "%s%pU",
+			 VFIO_TAG_PREFIX, group->uuid);
+
+		if (old && strstr(old, tag))
+			return; /* block already in place */
+
+		new = kasprintf(GFP_KERNEL, "%s%s", old ? old : "", tag);
+		if (new) {
+			*driver_override = new;
+			kfree(old);
+			vfio_group_get(group);
+			dev_warn(dev, "vfio: Blocking unsafe driver bind\n");
+			return;
+		}
+	}
+
+	dev_warn(dev, "vfio: Unsafe driver binding to in-use group!\n");
+}
+
+/* If we've mangled driver_override, remove it */
+static void vfio_group_nb_post_bind(struct vfio_group *group,
+				    struct device *dev)
+{
+	char **driver_override = vfio_find_driver_override(dev);
+
+	if (driver_override && *driver_override) {
+		char tag[50], *new, *start, *end, *old = *driver_override;
+
+		snprintf(tag, sizeof(tag), "%s%pU",
+			 VFIO_TAG_PREFIX, group->uuid);
+
+		start = strstr(old, tag);
+		if (start) {
+			end = start + strlen(tag);
+
+			if (old + strlen(old) > end)
+				memmove(start, end,
+					strlen(old) - (end - old) + 1);
+			else
+				*start = 0;
+
+			if (strlen(old)) {
+				new = kasprintf(GFP_KERNEL, "%s", old);
+				if (new) {
+					*driver_override = new;
+					kfree(old);
+				} /* else, in-place terminated, ok */
+			} else {
+				*driver_override = NULL;
+				kfree(old);
+			}
+
+			vfio_group_put(group);
+		}
+	}
+}
+
 static int vfio_iommu_group_notifier(struct notifier_block *nb,
 				     unsigned long action, void *data)
 {
@@ -757,14 +867,23 @@  static int vfio_iommu_group_notifier(struct notifier_block *nb,
 		 */
 		break;
 	case IOMMU_GROUP_NOTIFY_BIND_DRIVER:
-		pr_debug("%s: Device %s, group %d binding to driver\n",
+		pr_debug("%s: Device %s, group %d binding to driver %s\n",
 			 __func__, dev_name(dev),
-			 iommu_group_id(group->iommu_group));
+			 iommu_group_id(group->iommu_group), dev->driver->name);
+		if (vfio_group_nb_verify(group, dev))
+			vfio_group_nb_pre_bind(group, dev);
+		break;
+	case IOMMU_GROUP_NOTIFY_DRIVER_NOT_BOUND:
+		pr_debug("%s: Device %s, group %d binding fail to driver %s\n",
+			 __func__, dev_name(dev),
+			 iommu_group_id(group->iommu_group), dev->driver->name);
+		vfio_group_nb_post_bind(group, dev);
 		break;
 	case IOMMU_GROUP_NOTIFY_BOUND_DRIVER:
 		pr_debug("%s: Device %s, group %d bound to driver %s\n",
 			 __func__, dev_name(dev),
 			 iommu_group_id(group->iommu_group), dev->driver->name);
+		vfio_group_nb_post_bind(group, dev);
 		BUG_ON(vfio_group_nb_verify(group, dev));
 		break;
 	case IOMMU_GROUP_NOTIFY_UNBIND_DRIVER:
@@ -1351,6 +1470,7 @@  static int vfio_group_unset_container(struct vfio_group *group)
 	if (users != 1)
 		return -EBUSY;
 
+	wake_up(&vfio.release_q);
 	__vfio_group_unset_container(group);
 
 	return 0;
@@ -1364,7 +1484,11 @@  static int vfio_group_unset_container(struct vfio_group *group)
  */
 static void vfio_group_try_dissolve_container(struct vfio_group *group)
 {
-	if (0 == atomic_dec_if_positive(&group->container_users))
+	int users = atomic_dec_if_positive(&group->container_users);
+
+	wake_up(&vfio.release_q);
+
+	if (!users)
 		__vfio_group_unset_container(group);
 }
 
@@ -1433,19 +1557,26 @@  static bool vfio_group_viable(struct vfio_group *group)
 
 static int vfio_group_add_container_user(struct vfio_group *group)
 {
+	int ret;
+
 	if (!atomic_inc_not_zero(&group->container_users))
 		return -EINVAL;
 
 	if (group->noiommu) {
-		atomic_dec(&group->container_users);
-		return -EPERM;
+		ret = -EPERM;
+		goto out;
 	}
 	if (!group->container->iommu_driver || !vfio_group_viable(group)) {
-		atomic_dec(&group->container_users);
-		return -EINVAL;
+		ret = -EINVAL;
+		goto out;
 	}
 
 	return 0;
+
+out:
+	atomic_dec(&group->container_users);
+	wake_up(&vfio.release_q);
+	return ret;
 }
 
 static const struct file_operations vfio_device_fops;