Message ID | 20200407192015.19887-4-akrowiak@linux.ibm.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | s390/vfio-ap: dynamic configuration support | expand |
On Tue, 7 Apr 2020 15:20:03 -0400 Tony Krowiak <akrowiak@linux.ibm.com> wrote: > Introduces a new driver callback to prevent a root user from unbinding > an AP queue from its device driver if the queue is in use. The intent of > this callback is to provide a driver with the means to prevent a root user > from inadvertently taking a queue away from a guest and giving it to the > host while the guest is still using it. The callback will > be invoked whenever a change to the AP bus's sysfs apmask or aqmask > attributes would result in one or more AP queues being removed from its > driver. If the callback responds in the affirmative for any driver > queried, the change to the apmask or aqmask will be rejected with a device > in use error. > > For this patch, only non-default drivers will be queried. Currently, > there is only one non-default driver, the vfio_ap device driver. The > vfio_ap device driver manages AP queues passed through to one or more > guests and we don't want to unexpectedly take AP resources away from > guests which are most likely independently administered. > > Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com> > --- > drivers/s390/crypto/ap_bus.c | 144 +++++++++++++++++++++++++++++++++-- > drivers/s390/crypto/ap_bus.h | 4 + > 2 files changed, 142 insertions(+), 6 deletions(-) > > diff --git a/drivers/s390/crypto/ap_bus.c b/drivers/s390/crypto/ap_bus.c > index 5256e3ce84e5..af15c095e76a 100644 > --- a/drivers/s390/crypto/ap_bus.c > +++ b/drivers/s390/crypto/ap_bus.c > @@ -35,6 +35,7 @@ > #include <linux/mod_devicetable.h> > #include <linux/debugfs.h> > #include <linux/ctype.h> > +#include <linux/module.h> > > #include "ap_bus.h" > #include "ap_debug.h" > @@ -995,9 +996,11 @@ int ap_parse_mask_str(const char *str, > newmap = kmalloc(size, GFP_KERNEL); > if (!newmap) > return -ENOMEM; > - if (mutex_lock_interruptible(lock)) { > - kfree(newmap); > - return -ERESTARTSYS; > + if (lock) { > + if (mutex_lock_interruptible(lock)) { > + kfree(newmap); > + return -ERESTARTSYS; > + } This whole function is a bit odd. It seems all masks we want to manipulate are always guarded by the ap_perms_mutex, and the need for allowing lock == NULL comes from wanting to call this function with the ap_perms_mutex already held. That would argue for a locked/unlocked version of this function... but looking at it, why do we lock the way we do? The one thing this function (prior to this patch) does outside of the holding of the mutex is the allocation and freeing of newmap. But with this patch, we do the allocation and freeing of newmap while holding the mutex. Something seems a bit weird here. > } > > if (*str == '+' || *str == '-') {
On Tue, 7 Apr 2020 15:20:03 -0400 Tony Krowiak <akrowiak@linux.ibm.com> wrote: > Introduces a new driver callback to prevent a root user from unbinding > an AP queue from its device driver if the queue is in use. The intent of > this callback is to provide a driver with the means to prevent a root user > from inadvertently taking a queue away from a guest and giving it to the > host while the guest is still using it. The callback will > be invoked whenever a change to the AP bus's sysfs apmask or aqmask > attributes would result in one or more AP queues being removed from its > driver. If the callback responds in the affirmative for any driver > queried, the change to the apmask or aqmask will be rejected with a device > in use error. > > For this patch, only non-default drivers will be queried. Currently, > there is only one non-default driver, the vfio_ap device driver. The > vfio_ap device driver manages AP queues passed through to one or more > guests and we don't want to unexpectedly take AP resources away from > guests which are most likely independently administered. > > Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com> > --- > drivers/s390/crypto/ap_bus.c | 144 +++++++++++++++++++++++++++++++++-- > drivers/s390/crypto/ap_bus.h | 4 + > 2 files changed, 142 insertions(+), 6 deletions(-) (...) > @@ -1196,12 +1202,75 @@ static ssize_t apmask_show(struct bus_type *bus, char *buf) > return rc; > } > > +int __verify_card_reservations(struct device_driver *drv, void *data) > +{ > + int rc = 0; > + struct ap_driver *ap_drv = to_ap_drv(drv); > + unsigned long *newapm = (unsigned long *)data; > + > + /* > + * If the reserved bits do not identify cards reserved for use by the > + * non-default driver, there is no need to verify the driver is using > + * the queues. I had to read that one several times... what about "No need to verify whether the driver is using the queues if it is the default driver." ? > + */ > + if (ap_drv->flags & AP_DRIVER_FLAG_DEFAULT) > + return 0; > + > + /* The non-default driver's module must be loaded */ > + if (!try_module_get(drv->owner)) > + return 0; Is that really needed? I would have thought that the driver core's klist usage would make sure that the callback would not be invoked for drivers that are not registered anymore. Or am I missing a window? > + > + if (ap_drv->in_use) > + if (ap_drv->in_use(newapm, ap_perms.aqm)) Can we log the offending apm somewhere, preferably with additional info that allows the admin to figure out why an error was returned? > + rc = -EADDRINUSE; > + > + module_put(drv->owner); > + > + return rc; > +} (Same comments for the other changes further along in this patch.)
On 14.04.20 14:58, Cornelia Huck wrote: > On Tue, 7 Apr 2020 15:20:03 -0400 > Tony Krowiak <akrowiak@linux.ibm.com> wrote: > >> Introduces a new driver callback to prevent a root user from unbinding >> an AP queue from its device driver if the queue is in use. The intent of >> this callback is to provide a driver with the means to prevent a root user >> from inadvertently taking a queue away from a guest and giving it to the >> host while the guest is still using it. The callback will >> be invoked whenever a change to the AP bus's sysfs apmask or aqmask >> attributes would result in one or more AP queues being removed from its >> driver. If the callback responds in the affirmative for any driver >> queried, the change to the apmask or aqmask will be rejected with a device >> in use error. >> >> For this patch, only non-default drivers will be queried. Currently, >> there is only one non-default driver, the vfio_ap device driver. The >> vfio_ap device driver manages AP queues passed through to one or more >> guests and we don't want to unexpectedly take AP resources away from >> guests which are most likely independently administered. >> >> Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com> >> --- >> drivers/s390/crypto/ap_bus.c | 144 +++++++++++++++++++++++++++++++++-- >> drivers/s390/crypto/ap_bus.h | 4 + >> 2 files changed, 142 insertions(+), 6 deletions(-) > (...) > >> @@ -1196,12 +1202,75 @@ static ssize_t apmask_show(struct bus_type *bus, char *buf) >> return rc; >> } >> >> +int __verify_card_reservations(struct device_driver *drv, void *data) >> +{ >> + int rc = 0; >> + struct ap_driver *ap_drv = to_ap_drv(drv); >> + unsigned long *newapm = (unsigned long *)data; >> + >> + /* >> + * If the reserved bits do not identify cards reserved for use by the >> + * non-default driver, there is no need to verify the driver is using >> + * the queues. > I had to read that one several times... what about > > "No need to verify whether the driver is using the queues if it is the > default driver." > > ? > >> + */ >> + if (ap_drv->flags & AP_DRIVER_FLAG_DEFAULT) >> + return 0; >> + >> + /* The non-default driver's module must be loaded */ >> + if (!try_module_get(drv->owner)) >> + return 0; > Is that really needed? I would have thought that the driver core's > klist usage would make sure that the callback would not be invoked for > drivers that are not registered anymore. Or am I missing a window? The try_module_get() and module_put() is a result of review feedback from my side. The ap bus core is static in the kernel whereas the vfio dd is a kernel module. So there may be a race condition between calling the callback function and removal of the vfio dd module. There is similar code in zcrypt_api which does the same for the zcrypt device drivers before using some variables or functions from the modules. Help me, it this is outdated code and there is no need to adjust the module reference counter any more, then I would be happy to remove this code :-) > >> + >> + if (ap_drv->in_use) >> + if (ap_drv->in_use(newapm, ap_perms.aqm)) > Can we log the offending apm somewhere, preferably with additional info > that allows the admin to figure out why an error was returned? > >> + rc = -EADDRINUSE; >> + >> + module_put(drv->owner); >> + >> + return rc; >> +} > (Same comments for the other changes further along in this patch.) >
On 4/14/20 8:58 AM, Cornelia Huck wrote: > On Tue, 7 Apr 2020 15:20:03 -0400 > Tony Krowiak <akrowiak@linux.ibm.com> wrote: > >> Introduces a new driver callback to prevent a root user from unbinding >> an AP queue from its device driver if the queue is in use. The intent of >> this callback is to provide a driver with the means to prevent a root user >> from inadvertently taking a queue away from a guest and giving it to the >> host while the guest is still using it. The callback will >> be invoked whenever a change to the AP bus's sysfs apmask or aqmask >> attributes would result in one or more AP queues being removed from its >> driver. If the callback responds in the affirmative for any driver >> queried, the change to the apmask or aqmask will be rejected with a device >> in use error. >> >> For this patch, only non-default drivers will be queried. Currently, >> there is only one non-default driver, the vfio_ap device driver. The >> vfio_ap device driver manages AP queues passed through to one or more >> guests and we don't want to unexpectedly take AP resources away from >> guests which are most likely independently administered. >> >> Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com> >> --- >> drivers/s390/crypto/ap_bus.c | 144 +++++++++++++++++++++++++++++++++-- >> drivers/s390/crypto/ap_bus.h | 4 + >> 2 files changed, 142 insertions(+), 6 deletions(-) > (...) > >> @@ -1196,12 +1202,75 @@ static ssize_t apmask_show(struct bus_type *bus, char *buf) >> return rc; >> } >> >> +int __verify_card_reservations(struct device_driver *drv, void *data) >> +{ >> + int rc = 0; >> + struct ap_driver *ap_drv = to_ap_drv(drv); >> + unsigned long *newapm = (unsigned long *)data; >> + >> + /* >> + * If the reserved bits do not identify cards reserved for use by the >> + * non-default driver, there is no need to verify the driver is using >> + * the queues. > I had to read that one several times... what about > "No need to verify whether the driver is using the queues if it is the > default driver." > > ? Sure, that's better. > >> + */ >> + if (ap_drv->flags & AP_DRIVER_FLAG_DEFAULT) >> + return 0; >> + >> + /* The non-default driver's module must be loaded */ >> + if (!try_module_get(drv->owner)) >> + return 0; > Is that really needed? I would have thought that the driver core's > klist usage would make sure that the callback would not be invoked for > drivers that are not registered anymore. Or am I missing a window? > >> + >> + if (ap_drv->in_use) >> + if (ap_drv->in_use(newapm, ap_perms.aqm)) > Can we log the offending apm somewhere, preferably with additional info > that allows the admin to figure out why an error was returned? One of the things on my TODO list is to add logging to the vfio_ap module which will track all significant activity within the device driver. I plan to do that with a patch or set of patches specifically put together for that purpose. Having said that, the best place to log this would be in the in_use callback in the vfio_ap device driver (see next patch) where the APQNs that are in use can be identified. For now, I will log a message to the dmesg log indicating which APQNs are in use by the matrix mdev. > >> + rc = -EADDRINUSE; >> + >> + module_put(drv->owner); >> + >> + return rc; >> +} > (Same comments for the other changes further along in this patch.) >
On 4/14/20 8:08 AM, Cornelia Huck wrote: > On Tue, 7 Apr 2020 15:20:03 -0400 > Tony Krowiak <akrowiak@linux.ibm.com> wrote: > >> Introduces a new driver callback to prevent a root user from unbinding >> an AP queue from its device driver if the queue is in use. The intent of >> this callback is to provide a driver with the means to prevent a root user >> from inadvertently taking a queue away from a guest and giving it to the >> host while the guest is still using it. The callback will >> be invoked whenever a change to the AP bus's sysfs apmask or aqmask >> attributes would result in one or more AP queues being removed from its >> driver. If the callback responds in the affirmative for any driver >> queried, the change to the apmask or aqmask will be rejected with a device >> in use error. >> >> For this patch, only non-default drivers will be queried. Currently, >> there is only one non-default driver, the vfio_ap device driver. The >> vfio_ap device driver manages AP queues passed through to one or more >> guests and we don't want to unexpectedly take AP resources away from >> guests which are most likely independently administered. >> >> Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com> >> --- >> drivers/s390/crypto/ap_bus.c | 144 +++++++++++++++++++++++++++++++++-- >> drivers/s390/crypto/ap_bus.h | 4 + >> 2 files changed, 142 insertions(+), 6 deletions(-) >> >> diff --git a/drivers/s390/crypto/ap_bus.c b/drivers/s390/crypto/ap_bus.c >> index 5256e3ce84e5..af15c095e76a 100644 >> --- a/drivers/s390/crypto/ap_bus.c >> +++ b/drivers/s390/crypto/ap_bus.c >> @@ -35,6 +35,7 @@ >> #include <linux/mod_devicetable.h> >> #include <linux/debugfs.h> >> #include <linux/ctype.h> >> +#include <linux/module.h> >> >> #include "ap_bus.h" >> #include "ap_debug.h" >> @@ -995,9 +996,11 @@ int ap_parse_mask_str(const char *str, >> newmap = kmalloc(size, GFP_KERNEL); >> if (!newmap) >> return -ENOMEM; >> - if (mutex_lock_interruptible(lock)) { >> - kfree(newmap); >> - return -ERESTARTSYS; >> + if (lock) { >> + if (mutex_lock_interruptible(lock)) { >> + kfree(newmap); >> + return -ERESTARTSYS; >> + } > This whole function is a bit odd. It seems all masks we want to > manipulate are always guarded by the ap_perms_mutex, and the need for > allowing lock == NULL comes from wanting to call this function with the > ap_perms_mutex already held. > > That would argue for a locked/unlocked version of this function... but > looking at it, why do we lock the way we do? The one thing this > function (prior to this patch) does outside of the holding of the mutex > is the allocation and freeing of newmap. But with this patch, we do the > allocation and freeing of newmap while holding the mutex. Something > seems a bit weird here. Note that the ap_parse_mask function copies the newmap to the bitmap passed in as a parameter to the function. Prior to the introduction of this patch, the calling functions - i.e., apmask_store(), aqmask_store() and ap_perms_init() - passed in the actual bitmap (i.e., ap_perms.apm or ap_perms aqm), so the ap_perms were changed directly by this function. With this patch, the apmask_store() and aqmask_store() functions now pass in a copy of those bitmaps. This is so we can verify that any APQNs being removed are not in use by the vfio_ap device driver before committing the change to ap_perms. Consequently, it is now necessary to take the lock for the until the changes are committed. Having explained that, you make a valid argument that this calls for a locked/unlocked version of this function, so I will modify this patch to that effect. > >> } >> >> if (*str == '+' || *str == '-') {
On Wed, 15 Apr 2020 08:08:24 +0200 Harald Freudenberger <freude@linux.ibm.com> wrote: > On 14.04.20 14:58, Cornelia Huck wrote: > > On Tue, 7 Apr 2020 15:20:03 -0400 > > Tony Krowiak <akrowiak@linux.ibm.com> wrote: > >> + /* The non-default driver's module must be loaded */ > >> + if (!try_module_get(drv->owner)) > >> + return 0; > > Is that really needed? I would have thought that the driver core's > > klist usage would make sure that the callback would not be invoked for > > drivers that are not registered anymore. Or am I missing a window? > The try_module_get() and module_put() is a result of review feedback from > my side. The ap bus core is static in the kernel whereas the > vfio dd is a kernel module. So there may be a race condition between > calling the callback function and removal of the vfio dd module. > There is similar code in zcrypt_api which does the same for the zcrypt > device drivers before using some variables or functions from the modules. > Help me, it this is outdated code and there is no need to adjust the > module reference counter any more, then I would be happy to remove > this code :-) I think the driver core already should keep us safe. A built-in bus calling a driver in a module is a very common pattern, and I think ->owner was introduced exactly for that case. Unless I'm really missing something obvious?
On Wed, 15 Apr 2020 13:10:10 -0400 Tony Krowiak <akrowiak@linux.ibm.com> wrote: > On 4/14/20 8:58 AM, Cornelia Huck wrote: > > On Tue, 7 Apr 2020 15:20:03 -0400 > > Tony Krowiak <akrowiak@linux.ibm.com> wrote: > >> + > >> + if (ap_drv->in_use) > >> + if (ap_drv->in_use(newapm, ap_perms.aqm)) > > Can we log the offending apm somewhere, preferably with additional info > > that allows the admin to figure out why an error was returned? > > One of the things on my TODO list is to add logging to the vfio_ap > module which will track all significant activity within the device > driver. I plan to do that with a patch or set of patches specifically > put together for that purpose. Having said that, the best place to > log this would be in the in_use callback in the vfio_ap device driver > (see next patch) where the APQNs that are in use can be identified. > For now, I will log a message to the dmesg log indicating which > APQNs are in use by the matrix mdev. Sounds reasonable. My main issue was what an admin was supposed to do until logging was in place :) > > > > >> + rc = -EADDRINUSE; > >> + > >> + module_put(drv->owner); > >> + > >> + return rc; > >> +}
On Wed, 15 Apr 2020 13:10:18 -0400 Tony Krowiak <akrowiak@linux.ibm.com> wrote: > On 4/14/20 8:08 AM, Cornelia Huck wrote: > > On Tue, 7 Apr 2020 15:20:03 -0400 > > Tony Krowiak <akrowiak@linux.ibm.com> wrote: > >> @@ -995,9 +996,11 @@ int ap_parse_mask_str(const char *str, > >> newmap = kmalloc(size, GFP_KERNEL); > >> if (!newmap) > >> return -ENOMEM; > >> - if (mutex_lock_interruptible(lock)) { > >> - kfree(newmap); > >> - return -ERESTARTSYS; > >> + if (lock) { > >> + if (mutex_lock_interruptible(lock)) { > >> + kfree(newmap); > >> + return -ERESTARTSYS; > >> + } > > This whole function is a bit odd. It seems all masks we want to > > manipulate are always guarded by the ap_perms_mutex, and the need for > > allowing lock == NULL comes from wanting to call this function with the > > ap_perms_mutex already held. > > > > That would argue for a locked/unlocked version of this function... but > > looking at it, why do we lock the way we do? The one thing this > > function (prior to this patch) does outside of the holding of the mutex > > is the allocation and freeing of newmap. But with this patch, we do the > > allocation and freeing of newmap while holding the mutex. Something > > seems a bit weird here. > > Note that the ap_parse_mask function copies the newmap > to the bitmap passed in as a parameter to the function. > Prior to the introduction of this patch, the calling functions - i.e., > apmask_store(), aqmask_store() and ap_perms_init() - passed > in the actual bitmap (i.e., ap_perms.apm or ap_perms aqm), > so the ap_perms were changed directly by this function. > > With this patch, the apmask_store() and aqmask_store() > functions now pass in a copy of those bitmaps. This is so > we can verify that any APQNs being removed are not > in use by the vfio_ap device driver before committing the > change to ap_perms. Consequently, it is now necessary > to take the lock for the until the changes are committed. Yes, but every caller actually takes the mutex before calling this function already :) > Having explained that, you make a valid argument that > this calls for a locked/unlocked version of this function, so > I will modify this patch to that effect. Ok. The other thing I found weird is that the function does alloc newmap -> grab mutex -> do manipulation -> release mutex -> free newmap while the new callers do (mutex already held) -> alloc newmap so why grab/release the mutex the way the function does now? IOW, why not have an unlocked __ap_parse_mask_string() and do int ap_parse_mask_string(...) { int rc; if (mutex_lock_interruptible(&ap_perms_mutex)) return -ERESTARTSYS; rc = __ap_parse_mask_string(...); mutex_unlock(&ap_perms_mutex); return rc; }
On 4/16/20 6:05 AM, Cornelia Huck wrote: > On Wed, 15 Apr 2020 13:10:18 -0400 > Tony Krowiak <akrowiak@linux.ibm.com> wrote: > >> On 4/14/20 8:08 AM, Cornelia Huck wrote: >>> On Tue, 7 Apr 2020 15:20:03 -0400 >>> Tony Krowiak <akrowiak@linux.ibm.com> wrote: >>>> @@ -995,9 +996,11 @@ int ap_parse_mask_str(const char *str, >>>> newmap = kmalloc(size, GFP_KERNEL); >>>> if (!newmap) >>>> return -ENOMEM; >>>> - if (mutex_lock_interruptible(lock)) { >>>> - kfree(newmap); >>>> - return -ERESTARTSYS; >>>> + if (lock) { >>>> + if (mutex_lock_interruptible(lock)) { >>>> + kfree(newmap); >>>> + return -ERESTARTSYS; >>>> + } >>> This whole function is a bit odd. It seems all masks we want to >>> manipulate are always guarded by the ap_perms_mutex, and the need for >>> allowing lock == NULL comes from wanting to call this function with the >>> ap_perms_mutex already held. >>> >>> That would argue for a locked/unlocked version of this function... but >>> looking at it, why do we lock the way we do? The one thing this >>> function (prior to this patch) does outside of the holding of the mutex >>> is the allocation and freeing of newmap. But with this patch, we do the >>> allocation and freeing of newmap while holding the mutex. Something >>> seems a bit weird here. >> Note that the ap_parse_mask function copies the newmap >> to the bitmap passed in as a parameter to the function. >> Prior to the introduction of this patch, the calling functions - i.e., >> apmask_store(), aqmask_store() and ap_perms_init() - passed >> in the actual bitmap (i.e., ap_perms.apm or ap_perms aqm), >> so the ap_perms were changed directly by this function. >> >> With this patch, the apmask_store() and aqmask_store() >> functions now pass in a copy of those bitmaps. This is so >> we can verify that any APQNs being removed are not >> in use by the vfio_ap device driver before committing the >> change to ap_perms. Consequently, it is now necessary >> to take the lock for the until the changes are committed. > Yes, but every caller actually takes the mutex before calling this > function already :) That is not a true statement, the ap_perms_init() function does not take the mutex prior to calling this function. Keep in mind, the ap_parse_mask function is not static and is exported, I was precluded from removing the lock parameter from the function definition. > >> Having explained that, you make a valid argument that >> this calls for a locked/unlocked version of this function, so >> I will modify this patch to that effect. > Ok. > > The other thing I found weird is that the function does > alloc newmap -> grab mutex -> do manipulation -> release mutex -> free newmap > while the new callers do > (mutex already held) -> alloc newmap > > so why grab/release the mutex the way the function does now? IOW, why > not have an unlocked __ap_parse_mask_string() and do In my last comment above, I agreed to create an unlocked version of this function. Your example below is similar to what I implemented after responding to your comment yesterday. > > int ap_parse_mask_string(...) > { > int rc; > > if (mutex_lock_interruptible(&ap_perms_mutex)) > return -ERESTARTSYS; > rc = __ap_parse_mask_string(...); > mutex_unlock(&ap_perms_mutex); > return rc; > }
On 16.04.20 11:33, Cornelia Huck wrote: > On Wed, 15 Apr 2020 08:08:24 +0200 > Harald Freudenberger <freude@linux.ibm.com> wrote: > >> On 14.04.20 14:58, Cornelia Huck wrote: >>> On Tue, 7 Apr 2020 15:20:03 -0400 >>> Tony Krowiak <akrowiak@linux.ibm.com> wrote: >>>> + /* The non-default driver's module must be loaded */ >>>> + if (!try_module_get(drv->owner)) >>>> + return 0; >>> Is that really needed? I would have thought that the driver core's >>> klist usage would make sure that the callback would not be invoked for >>> drivers that are not registered anymore. Or am I missing a window? >> The try_module_get() and module_put() is a result of review feedback from >> my side. The ap bus core is static in the kernel whereas the >> vfio dd is a kernel module. So there may be a race condition between >> calling the callback function and removal of the vfio dd module. >> There is similar code in zcrypt_api which does the same for the zcrypt >> device drivers before using some variables or functions from the modules. >> Help me, it this is outdated code and there is no need to adjust the >> module reference counter any more, then I would be happy to remove >> this code :-) > I think the driver core already should keep us safe. A built-in bus > calling a driver in a module is a very common pattern, and I think > ->owner was introduced exactly for that case. > > Unless I'm really missing something obvious? Hm. I tested a similar code (see zcrypt_api.c where try_module_get() and module_put() is called surrounding use of functions related to the implementing driver. The driver module has a reference count of 0 when not used and can get removed - because refcount is 0 - at any time when there is nothing related to the driver pending. As soon as the driver is actually used the try_module_get(...driver.owner) increases the reference counter and makes it impossible to remove the module. After use the module_put() reduces the reference count. When I now remove the try_module_get() and module_put() calls and run this modified code I immediately face a crash when the module is removed during use. I see code in the kernel which does an initial try_module_get() on the driver to increase the reference count, for example when the driver registers. However, I see no clear way to remove such a driver module any more. I know I had a fight with a tester some years ago where he stated that it is a valid testcase to remove a device driver module 'during use of the driver'. So I'd like to have the try_module_get() and module_put() invokations in the ap bus code until you convince me there are other maybe better ways to make sure the driver and it's functions are available at the time of the call. Maybe we can discuss this offline if you wish :-)
On Thu, 16 Apr 2020 11:37:21 +0200 Cornelia Huck <cohuck@redhat.com> wrote: > On Wed, 15 Apr 2020 13:10:10 -0400 > Tony Krowiak <akrowiak@linux.ibm.com> wrote: > > > On 4/14/20 8:58 AM, Cornelia Huck wrote: > > > On Tue, 7 Apr 2020 15:20:03 -0400 > > > Tony Krowiak <akrowiak@linux.ibm.com> wrote: > > > >> + > > >> + if (ap_drv->in_use) > > >> + if (ap_drv->in_use(newapm, ap_perms.aqm)) > > > Can we log the offending apm somewhere, preferably with additional info > > > that allows the admin to figure out why an error was returned? > > > > One of the things on my TODO list is to add logging to the vfio_ap > > module which will track all significant activity within the device > > driver. I plan to do that with a patch or set of patches specifically > > put together for that purpose. Having said that, the best place to > > log this would be in the in_use callback in the vfio_ap device driver > > (see next patch) where the APQNs that are in use can be identified. > > For now, I will log a message to the dmesg log indicating which > > APQNs are in use by the matrix mdev. > > Sounds reasonable. My main issue was what an admin was supposed to do > until logging was in place :) Logging may not be the right answer here. Imagine somebody wants to build a nice web-tool for managing this stuff at scale -- e.g. something HMC. I don't think the solution is to let this tool parse the kernel messages and try to relate that to its own transactions. But I do agree, having a way to report why "won't do" to the end user is important for usability. Regards, Halil > > > > > > > > >> + rc = -EADDRINUSE; > > >> + > > >> + module_put(drv->owner); > > >> + > > >> + return rc; > > >> +} >
On 4/23/20 11:33 PM, Halil Pasic wrote: > On Thu, 16 Apr 2020 11:37:21 +0200 > Cornelia Huck <cohuck@redhat.com> wrote: > >> On Wed, 15 Apr 2020 13:10:10 -0400 >> Tony Krowiak <akrowiak@linux.ibm.com> wrote: >> >>> On 4/14/20 8:58 AM, Cornelia Huck wrote: >>>> On Tue, 7 Apr 2020 15:20:03 -0400 >>>> Tony Krowiak <akrowiak@linux.ibm.com> wrote: >>>>> + >>>>> + if (ap_drv->in_use) >>>>> + if (ap_drv->in_use(newapm, ap_perms.aqm)) >>>> Can we log the offending apm somewhere, preferably with additional info >>>> that allows the admin to figure out why an error was returned? >>> One of the things on my TODO list is to add logging to the vfio_ap >>> module which will track all significant activity within the device >>> driver. I plan to do that with a patch or set of patches specifically >>> put together for that purpose. Having said that, the best place to >>> log this would be in the in_use callback in the vfio_ap device driver >>> (see next patch) where the APQNs that are in use can be identified. >>> For now, I will log a message to the dmesg log indicating which >>> APQNs are in use by the matrix mdev. >> Sounds reasonable. My main issue was what an admin was supposed to do >> until logging was in place :) > Logging may not be the right answer here. Imagine somebody wants to build > a nice web-tool for managing this stuff at scale -- e.g. something HMC. I > don't think the solution is to let this tool parse the kernel messages > and try to relate that to its own transactions. I don't believe there is no right or wrong answer here; I simply don't see the relevance of discussing a tool in this context. We are talking about a sysfs attribute interface here, so - correct me if I'm mistaken - our options for notifying the user that a queue is in use are limited to the return code from the sysfs interface and logging. I would expect that a tool would have to do something similar to the callback implemented in the vfio_ap device driver and check the APQNs removed against the APQNs assigned to the mdevs to determine which is in use. > > But I do agree, having a way to report why "won't do" to the end user is > important for usability. > > Regards, > Halil > >>>> >>>>> + rc = -EADDRINUSE; >>>>> + >>>>> + module_put(drv->owner); >>>>> + >>>>> + return rc; >>>>> +}
On Fri, 24 Apr 2020 13:07:38 -0400 Tony Krowiak <akrowiak@linux.ibm.com> wrote: > > > On 4/23/20 11:33 PM, Halil Pasic wrote: > > On Thu, 16 Apr 2020 11:37:21 +0200 > > Cornelia Huck <cohuck@redhat.com> wrote: > > > >> On Wed, 15 Apr 2020 13:10:10 -0400 > >> Tony Krowiak <akrowiak@linux.ibm.com> wrote: > >> > >>> On 4/14/20 8:58 AM, Cornelia Huck wrote: > >>>> On Tue, 7 Apr 2020 15:20:03 -0400 > >>>> Tony Krowiak <akrowiak@linux.ibm.com> wrote: > >>>>> + > >>>>> + if (ap_drv->in_use) > >>>>> + if (ap_drv->in_use(newapm, ap_perms.aqm)) > >>>> Can we log the offending apm somewhere, preferably with additional info > >>>> that allows the admin to figure out why an error was returned? > >>> One of the things on my TODO list is to add logging to the vfio_ap > >>> module which will track all significant activity within the device > >>> driver. I plan to do that with a patch or set of patches specifically > >>> put together for that purpose. Having said that, the best place to > >>> log this would be in the in_use callback in the vfio_ap device driver > >>> (see next patch) where the APQNs that are in use can be identified. > >>> For now, I will log a message to the dmesg log indicating which > >>> APQNs are in use by the matrix mdev. > >> Sounds reasonable. My main issue was what an admin was supposed to do > >> until logging was in place :) > > Logging may not be the right answer here. Imagine somebody wants to build > > a nice web-tool for managing this stuff at scale -- e.g. something HMC. I > > don't think the solution is to let this tool parse the kernel messages > > and try to relate that to its own transactions. > > I don't believe there is no right or wrong answer here; I simply don't > see the relevance of discussing a tool in this context. We are talking > about a sysfs attribute interface here, so - correct me if I'm > mistaken - our options for notifying the user that a queue is in use are > limited to the return code from the sysfs interface and logging. I would > expect that a tool would have to do something similar to the callback > implemented in the vfio_ap device driver and check the APQNs > removed against the APQNs assigned to the mdevs to determine which > is in use. > We are talking interface design. The relevance of discussing a tool is that any userspace tool must come by with whatever interface we come up now. IMHO thinking about the usage (and the client code) is very helpful in avoiding broken interface designs. AFAIK this is one of the basic ideas behind test driven development. Regards, Halil
On 2020-04-07 21:20, Tony Krowiak wrote: > Introduces a new driver callback to prevent a root user from unbinding > an AP queue from its device driver if the queue is in use. The intent of > this callback is to provide a driver with the means to prevent a root user > from inadvertently taking a queue away from a guest and giving it to the > host while the guest is still using it. How can we know, at this point if the guest uses or not the queue? Do you want to say that this prevents to take away a queue when it is currently assigned to a VFIO device? and with a guest currently using this VFIO device? > The callback will > be invoked whenever a change to the AP bus's sysfs apmask or aqmask > attributes would result in one or more AP queues being removed from its > driver. If the callback responds in the affirmative for any driver > queried, the change to the apmask or aqmask will be rejected with a device > in use error. AFAIU you mean that Linux's driver's binding and unbinding mechanism is not sufficient to avoid this issue because unbind can not be refused by the driver. The reason why we do not want a single queue to be removed from the VFIO driver is because the VFIO drivers works on a matrix, not on queues, and for the matrix to be consistent it needs to acquire all queues defined by the cross product of all APID and AQID assigned to the matrix. This functionality is valid for the host as for the guests and is handled automatically by the firmware with the CRYCB. The AP bus uses QCI to retrieve the host CRYCB and build the hosts AP queues. If instead to mix VFIO CRYCB matrix handling and queues at the same level inside the AP bus we separate these different firmware entities in two different software entities. If we make the AP bus sit above a CRYCB/Matrix bus, and in the way virtualize the QCI and test AP queue instructions: - we can directly pass a matrix device to the guest though a VFIO matrix device - the consistence will be automatic - the VFIO device and parent device will be of the same kind which would make the design much more clearer. - there will be no need for these callback because the consistence of the matrix will be guaranteed by firmware > > For this patch, only non-default drivers will be queried. Currently, > there is only one non-default driver, the vfio_ap device driver. You mean that the admin may take queues away from the "default driver", while the queue is in use, to give it to an other driver? Why is it to avoid in one way and not in the other way? > The > vfio_ap device driver manages AP queues passed through to one or more > guests I read this as if a queue may be passed to several guest... please, rephrase or explain. > and we don't want to unexpectedly take AP resources away from > guests which are most likely independently administered. When you say "independently administered", you mean as a second admin inside the host, don't you? Regards, Pierre
On 4/24/20 2:23 PM, Halil Pasic wrote: > On Fri, 24 Apr 2020 13:07:38 -0400 > Tony Krowiak <akrowiak@linux.ibm.com> wrote: > >> >> On 4/23/20 11:33 PM, Halil Pasic wrote: >>> On Thu, 16 Apr 2020 11:37:21 +0200 >>> Cornelia Huck <cohuck@redhat.com> wrote: >>> >>>> On Wed, 15 Apr 2020 13:10:10 -0400 >>>> Tony Krowiak <akrowiak@linux.ibm.com> wrote: >>>> >>>>> On 4/14/20 8:58 AM, Cornelia Huck wrote: >>>>>> On Tue, 7 Apr 2020 15:20:03 -0400 >>>>>> Tony Krowiak <akrowiak@linux.ibm.com> wrote: >>>>>>> + >>>>>>> + if (ap_drv->in_use) >>>>>>> + if (ap_drv->in_use(newapm, ap_perms.aqm)) >>>>>> Can we log the offending apm somewhere, preferably with additional info >>>>>> that allows the admin to figure out why an error was returned? >>>>> One of the things on my TODO list is to add logging to the vfio_ap >>>>> module which will track all significant activity within the device >>>>> driver. I plan to do that with a patch or set of patches specifically >>>>> put together for that purpose. Having said that, the best place to >>>>> log this would be in the in_use callback in the vfio_ap device driver >>>>> (see next patch) where the APQNs that are in use can be identified. >>>>> For now, I will log a message to the dmesg log indicating which >>>>> APQNs are in use by the matrix mdev. >>>> Sounds reasonable. My main issue was what an admin was supposed to do >>>> until logging was in place :) >>> Logging may not be the right answer here. Imagine somebody wants to build >>> a nice web-tool for managing this stuff at scale -- e.g. something HMC. I >>> don't think the solution is to let this tool parse the kernel messages >>> and try to relate that to its own transactions. >> I don't believe there is no right or wrong answer here; I simply don't >> see the relevance of discussing a tool in this context. We are talking >> about a sysfs attribute interface here, so - correct me if I'm >> mistaken - our options for notifying the user that a queue is in use are >> limited to the return code from the sysfs interface and logging. I would >> expect that a tool would have to do something similar to the callback >> implemented in the vfio_ap device driver and check the APQNs >> removed against the APQNs assigned to the mdevs to determine which >> is in use. >> > We are talking interface design. The relevance of discussing a tool is > that any userspace tool must come by with whatever interface we come up > now. IMHO thinking about the usage (and the client code) is very helpful > in avoiding broken interface designs. AFAIK this is one of the basic > ideas behind test driven development. What can a sysfs interface, such as apmask/aqmask, do other than reply with a return code given sysfs attributes that store data can only return a count? I'm sorry, but I still don't see the relevance. > > Regards, > Halil
On 4/27/20 4:20 AM, Pierre Morel wrote: > > > On 2020-04-07 21:20, Tony Krowiak wrote: >> Introduces a new driver callback to prevent a root user from unbinding >> an AP queue from its device driver if the queue is in use. The intent of >> this callback is to provide a driver with the means to prevent a root >> user >> from inadvertently taking a queue away from a guest and giving it to the >> host while the guest is still using it. > > How can we know, at this point if the guest uses or not the queue? The struct ap_matrix_mdev has a field, struct kvm *kvm, which holds a pointer to KVM when the matrix mdev is in use by a guest. This patch series also introduces a shadow_crycb (soon to be shadow_apcb) which holds the AP configuration for the guest. Between those two things, the driver can detect when a queue is in use by a guest. > Do you want to say that this prevents to take away a queue when it is > currently assigned to a VFIO device? > and with a guest currently using this VFIO device? No, I do not. The intent here is to enforce the proper procedure for giving up a queue so it is done deliberately. Before taking a queue away from the matrix mdev, its APQN should be unassigned from the matrix mdev. That is not to say that if there are major objections to this that we can't base in_use upon the queue being in use by a guest at the time. Maybe that is preferable to the community. I'll leave it to them to state their case. > >> The callback will >> be invoked whenever a change to the AP bus's sysfs apmask or aqmask >> attributes would result in one or more AP queues being removed from its >> driver. If the callback responds in the affirmative for any driver >> queried, the change to the apmask or aqmask will be rejected with a >> device >> in use error. > > AFAIU you mean that Linux's driver's binding and unbinding mechanism > is not sufficient to avoid this issue because unbind can not be > refused by the driver. Correct! > > > The reason why we do not want a single queue to be removed from the > VFIO driver is because the VFIO drivers works on a matrix, not on > queues, and for the matrix to be consistent it needs to acquire all > queues defined by the cross product of all APID and AQID assigned to > the matrix. Not correct. The reason why is because we do not want a queue to be surreptitiously removed without the guest administrator being aware of its removal. > > This functionality is valid for the host as for the guests and is > handled automatically by the firmware with the CRYCB. > The AP bus uses QCI to retrieve the host CRYCB and build the hosts AP > queues. > > If instead to mix VFIO CRYCB matrix handling and queues at the same > level inside the AP bus we separate these different firmware entities > in two different software entities. > > If we make the AP bus sit above a CRYCB/Matrix bus, and in the way > virtualize the QCI and test AP queue instructions: > - we can directly pass a matrix device to the guest though a VFIO > matrix device > - the consistence will be automatic > - the VFIO device and parent device will be of the same kind which > would make the design much more clearer. > - there will be no need for these callback because the consistence of > the matrix will be guaranteed by firmware As stated in my response above, the issue here is not consistency. While the design you describe may be reasonable, it is a major departure from what is out in the field. In other words, that ship has sailed. > > >> >> For this patch, only non-default drivers will be queried. Currently, >> there is only one non-default driver, the vfio_ap device driver. > > You mean that the admin may take queues away from the "default > driver", while the queue is in use, to give it to an other driver? > Why is it to avoid in one way and not in the other way? Because the default drivers have direct control over the queues and can ensure they are empty and reset before giving up control. The vfio driver does not have direct control over the queues because they have been passed through to the guest. > >> The >> vfio_ap device driver manages AP queues passed through to one or more >> guests > > I read this as if a queue may be passed to several guest... > please, rephrase or explain. AP queues is plural, so it is true that AP queues can be passed through to more than one guest. I see your point, however, so I'll reword that to be more clear. > >> and we don't want to unexpectedly take AP resources away from >> guests which are most likely independently administered. > > When you say "independently administered", you mean as a second admin > inside the host, don't you? I mean that a guest can be administered by a different person than the host administrator. Again, I'll try to clarify this. > > > Regards, > Pierre >
On 2020-04-28 00:24, Tony Krowiak wrote: > > > On 4/27/20 4:20 AM, Pierre Morel wrote: >> >> >> On 2020-04-07 21:20, Tony Krowiak wrote: >>> Introduces a new driver callback to prevent a root user from unbinding >>> an AP queue from its device driver if the queue is in use. The intent of >>> this callback is to provide a driver with the means to prevent a root >>> user >>> from inadvertently taking a queue away from a guest and giving it to the >>> host while the guest is still using it. ...snip... >> >> This functionality is valid for the host as for the guests and is >> handled automatically by the firmware with the CRYCB. >> The AP bus uses QCI to retrieve the host CRYCB and build the hosts AP >> queues. >> >> If instead to mix VFIO CRYCB matrix handling and queues at the same >> level inside the AP bus we separate these different firmware entities >> in two different software entities. >> >> If we make the AP bus sit above a CRYCB/Matrix bus, and in the way >> virtualize the QCI and test AP queue instructions: >> - we can directly pass a matrix device to the guest though a VFIO >> matrix device >> - the consistence will be automatic >> - the VFIO device and parent device will be of the same kind which >> would make the design much more clearer. >> - there will be no need for these callback because the consistence of >> the matrix will be guaranteed by firmware > > As stated in my response above, the issue here is not consistency. While > the design you describe > may be reasonable, it is a major departure from what is out in the > field. In other words, that ship > has sailed. The current VFIO-AP driver works as before, without any change, above the Matrix device I suggest. Aside the old scheme which can continue, the Matrix device can be used directly to build a VFIO Matrix device, usable by QEMU without any modification. Once the dynamic extensions proposed in this series and the associated tools are out on the field, then yes the ship is really far. For now, the existing user's API do not change, the existing tools do not need modifications and we can repair the ship for its long journey. The inconsistency between device and VFIO device and the resulting complexity is not going to ease future enhancement. Regards, Pierre
On 28.04.20 00:24, Tony Krowiak wrote: > > > On 4/27/20 4:20 AM, Pierre Morel wrote: >> >> >> On 2020-04-07 21:20, Tony Krowiak wrote: >>> Introduces a new driver callback to prevent a root user from unbinding >>> an AP queue from its device driver if the queue is in use. The intent of >>> this callback is to provide a driver with the means to prevent a root user >>> from inadvertently taking a queue away from a guest and giving it to the >>> host while the guest is still using it. >> >> How can we know, at this point if the guest uses or not the queue? > > The struct ap_matrix_mdev has a field, struct kvm *kvm, which holds a pointer to KVM when > the matrix mdev is in use by a guest. This patch series also introduces a shadow_crycb (soon to > be shadow_apcb) which holds the AP configuration for the guest. Between those two things, > the driver can detect when a queue is in use by a guest. > >> Do you want to say that this prevents to take away a queue when it is currently assigned to a VFIO device? >> and with a guest currently using this VFIO device? > > No, I do not. The intent here is to enforce the proper procedure for giving up a queue so it is done > deliberately. Before taking a queue away from the matrix mdev, its APQN should be unassigned > from the matrix mdev. That is not to say that if there are major objections to this that we can't > base in_use upon the queue being in use by a guest at the time. Maybe that is preferable to > the community. I'll leave it to them to state their case. > >> >>> The callback will >>> be invoked whenever a change to the AP bus's sysfs apmask or aqmask >>> attributes would result in one or more AP queues being removed from its >>> driver. If the callback responds in the affirmative for any driver >>> queried, the change to the apmask or aqmask will be rejected with a device >>> in use error. >> >> AFAIU you mean that Linux's driver's binding and unbinding mechanism is not sufficient to avoid this issue because unbind can not be refused by the driver. > > Correct! > >> >> >> The reason why we do not want a single queue to be removed from the VFIO driver is because the VFIO drivers works on a matrix, not on queues, and for the matrix to be consistent it needs to acquire all queues defined by the cross product of all APID and AQID assigned to the matrix. > > Not correct. The reason why is because we do not want a queue to be surreptitiously removed > without the guest administrator being aware of its removal. > >> >> This functionality is valid for the host as for the guests and is handled automatically by the firmware with the CRYCB. >> The AP bus uses QCI to retrieve the host CRYCB and build the hosts AP queues. >> >> If instead to mix VFIO CRYCB matrix handling and queues at the same level inside the AP bus we separate these different firmware entities in two different software entities. >> >> If we make the AP bus sit above a CRYCB/Matrix bus, and in the way virtualize the QCI and test AP queue instructions: >> - we can directly pass a matrix device to the guest though a VFIO matrix device >> - the consistence will be automatic >> - the VFIO device and parent device will be of the same kind which would make the design much more clearer. >> - there will be no need for these callback because the consistence of the matrix will be guaranteed by firmware > > As stated in my response above, the issue here is not consistency. While the design you describe > may be reasonable, it is a major departure from what is out in the field. In other words, that ship > has sailed. > >> >> >>> >>> For this patch, only non-default drivers will be queried. Currently, >>> there is only one non-default driver, the vfio_ap device driver. >> >> You mean that the admin may take queues away from the "default driver", while the queue is in use, to give it to an other driver? >> Why is it to avoid in one way and not in the other way? > > Because the default drivers have direct control over the queues and can ensure they are empty > and reset before giving up control. The vfio driver does not have direct control over the queues > because they have been passed through to the guest. No, that's not true. The 'default' drivers have no change to do anything with an APQN when it is removed from the driver. They get the very same notification which is the remove() callback as the vfio dd gets and have the very same change to do something here. The more interesting thing here is, that the remove() callback invocation is usually because a hardware HAS BEEN GONE AWAY. Neither the 'default' drivers nor the vfio dd can do a reset on a not-any-more existing APQN. And it is also not true that the vfio dd has no direct control over the queue because they have been passed through to the guest. It's the job of the vfio dd to modify the guest's APM, AQM, ADM masks to disable the guest's access to the APQN and then the vfio can (try to) do a reset. > >> >>> The >>> vfio_ap device driver manages AP queues passed through to one or more >>> guests >> >> I read this as if a queue may be passed to several guest... >> please, rephrase or explain. > > AP queues is plural, so it is true that AP queues can be passed through > to more than one guest. I see your point, however, so I'll reword that > to be more clear. > >> >>> and we don't want to unexpectedly take AP resources away from >>> guests which are most likely independently administered. >> >> When you say "independently administered", you mean as a second admin inside the host, don't you? > > I mean that a guest can be administered by a different person than the host administrator. > Again, I'll try to clarify this. > >> >> >> Regards, >> Pierre >> >
On 4/28/20 7:07 AM, Harald Freudenberger wrote: > On 28.04.20 00:24, Tony Krowiak wrote: >> >> On 4/27/20 4:20 AM, Pierre Morel wrote: >>> >>> On 2020-04-07 21:20, Tony Krowiak wrote: >>>> Introduces a new driver callback to prevent a root user from unbinding >>>> an AP queue from its device driver if the queue is in use. The intent of >>>> this callback is to provide a driver with the means to prevent a root user >>>> from inadvertently taking a queue away from a guest and giving it to the >>>> host while the guest is still using it. >>> How can we know, at this point if the guest uses or not the queue? >> The struct ap_matrix_mdev has a field, struct kvm *kvm, which holds a pointer to KVM when >> the matrix mdev is in use by a guest. This patch series also introduces a shadow_crycb (soon to >> be shadow_apcb) which holds the AP configuration for the guest. Between those two things, >> the driver can detect when a queue is in use by a guest. >> >>> Do you want to say that this prevents to take away a queue when it is currently assigned to a VFIO device? >>> and with a guest currently using this VFIO device? >> No, I do not. The intent here is to enforce the proper procedure for giving up a queue so it is done >> deliberately. Before taking a queue away from the matrix mdev, its APQN should be unassigned >> from the matrix mdev. That is not to say that if there are major objections to this that we can't >> base in_use upon the queue being in use by a guest at the time. Maybe that is preferable to >> the community. I'll leave it to them to state their case. >> >>>> The callback will >>>> be invoked whenever a change to the AP bus's sysfs apmask or aqmask >>>> attributes would result in one or more AP queues being removed from its >>>> driver. If the callback responds in the affirmative for any driver >>>> queried, the change to the apmask or aqmask will be rejected with a device >>>> in use error. >>> AFAIU you mean that Linux's driver's binding and unbinding mechanism is not sufficient to avoid this issue because unbind can not be refused by the driver. >> Correct! >> >>> >>> The reason why we do not want a single queue to be removed from the VFIO driver is because the VFIO drivers works on a matrix, not on queues, and for the matrix to be consistent it needs to acquire all queues defined by the cross product of all APID and AQID assigned to the matrix. >> Not correct. The reason why is because we do not want a queue to be surreptitiously removed >> without the guest administrator being aware of its removal. >> >>> This functionality is valid for the host as for the guests and is handled automatically by the firmware with the CRYCB. >>> The AP bus uses QCI to retrieve the host CRYCB and build the hosts AP queues. >>> >>> If instead to mix VFIO CRYCB matrix handling and queues at the same level inside the AP bus we separate these different firmware entities in two different software entities. >>> >>> If we make the AP bus sit above a CRYCB/Matrix bus, and in the way virtualize the QCI and test AP queue instructions: >>> - we can directly pass a matrix device to the guest though a VFIO matrix device >>> - the consistence will be automatic >>> - the VFIO device and parent device will be of the same kind which would make the design much more clearer. >>> - there will be no need for these callback because the consistence of the matrix will be guaranteed by firmware >> As stated in my response above, the issue here is not consistency. While the design you describe >> may be reasonable, it is a major departure from what is out in the field. In other words, that ship >> has sailed. >> >>> >>>> For this patch, only non-default drivers will be queried. Currently, >>>> there is only one non-default driver, the vfio_ap device driver. >>> You mean that the admin may take queues away from the "default driver", while the queue is in use, to give it to an other driver? >>> Why is it to avoid in one way and not in the other way? >> Because the default drivers have direct control over the queues and can ensure they are empty >> and reset before giving up control. The vfio driver does not have direct control over the queues >> because they have been passed through to the guest. > No, that's not true. The 'default' drivers have no change to do anything with an APQN when it is removed > from the driver. They get the very same notification which is the remove() callback as the vfio dd gets > and have the very same change to do something here. The more interesting thing here is, that the remove() > callback invocation is usually because a hardware HAS BEEN GONE AWAY. Neither the 'default' drivers > nor the vfio dd can do a reset on a not-any-more existing APQN. > And it is also not true that the vfio dd has no direct control over the queue because they have been passed > through to the guest. It's the job of the vfio dd to modify the guest's APM, AQM, ADM masks to disable > the guest's access to the APQN and then the vfio can (try to) do a reset. The context here is when a sysadmin deliberately takes one or more queues away from a guest by changing the apmask or aqmask; we are not talking about the the case where an adapter is deconfigured or disappears. The idea here is to prevent a sysadmin for the host from taking a queue away from a KVM guest that is using it. IMHO, control over that queue should belong to the guest until such time as the guest gives it up or the guest is terminated. Since the zcrypt drivers are directly responsible for their AP queues, it is not necessary to implement this callback, although there is nothing precluding that. >>>> The >>>> vfio_ap device driver manages AP queues passed through to one or more >>>> guests >>> I read this as if a queue may be passed to several guest... >>> please, rephrase or explain. >> AP queues is plural, so it is true that AP queues can be passed through >> to more than one guest. I see your point, however, so I'll reword that >> to be more clear. >> >>>> and we don't want to unexpectedly take AP resources away from >>>> guests which are most likely independently administered. >>> When you say "independently administered", you mean as a second admin inside the host, don't you? >> I mean that a guest can be administered by a different person than the host administrator. >> Again, I'll try to clarify this. >> >>> >>> Regards, >>> Pierre >>>
diff --git a/drivers/s390/crypto/ap_bus.c b/drivers/s390/crypto/ap_bus.c index 5256e3ce84e5..af15c095e76a 100644 --- a/drivers/s390/crypto/ap_bus.c +++ b/drivers/s390/crypto/ap_bus.c @@ -35,6 +35,7 @@ #include <linux/mod_devicetable.h> #include <linux/debugfs.h> #include <linux/ctype.h> +#include <linux/module.h> #include "ap_bus.h" #include "ap_debug.h" @@ -995,9 +996,11 @@ int ap_parse_mask_str(const char *str, newmap = kmalloc(size, GFP_KERNEL); if (!newmap) return -ENOMEM; - if (mutex_lock_interruptible(lock)) { - kfree(newmap); - return -ERESTARTSYS; + if (lock) { + if (mutex_lock_interruptible(lock)) { + kfree(newmap); + return -ERESTARTSYS; + } } if (*str == '+' || *str == '-') { @@ -1009,7 +1012,10 @@ int ap_parse_mask_str(const char *str, } if (rc == 0) memcpy(bitmap, newmap, size); - mutex_unlock(lock); + + if (lock) + mutex_unlock(lock); + kfree(newmap); return rc; } @@ -1196,12 +1202,75 @@ static ssize_t apmask_show(struct bus_type *bus, char *buf) return rc; } +int __verify_card_reservations(struct device_driver *drv, void *data) +{ + int rc = 0; + struct ap_driver *ap_drv = to_ap_drv(drv); + unsigned long *newapm = (unsigned long *)data; + + /* + * If the reserved bits do not identify cards reserved for use by the + * non-default driver, there is no need to verify the driver is using + * the queues. + */ + if (ap_drv->flags & AP_DRIVER_FLAG_DEFAULT) + return 0; + + /* The non-default driver's module must be loaded */ + if (!try_module_get(drv->owner)) + return 0; + + if (ap_drv->in_use) + if (ap_drv->in_use(newapm, ap_perms.aqm)) + rc = -EADDRINUSE; + + module_put(drv->owner); + + return rc; +} + +static int apmask_commit(unsigned long *newapm) +{ + int rc; + unsigned long reserved[BITS_TO_LONGS(AP_DEVICES)]; + + /* + * Check if any bits in the apmask have been set which will + * result in queues being removed from non-default drivers + */ + if (bitmap_andnot(reserved, newapm, ap_perms.apm, AP_DEVICES)) { + rc = bus_for_each_drv(&ap_bus_type, NULL, reserved, + __verify_card_reservations); + if (rc) + return rc; + } + + memcpy(ap_perms.apm, newapm, APMASKSIZE); + + return 0; +} + static ssize_t apmask_store(struct bus_type *bus, const char *buf, size_t count) { int rc; + unsigned long newapm[BITS_TO_LONGS(AP_DEVICES)]; + + if (mutex_lock_interruptible(&ap_perms_mutex)) + return -ERESTARTSYS; + + memcpy(newapm, ap_perms.apm, APMASKSIZE); + + rc = ap_parse_mask_str(buf, newapm, AP_DEVICES, NULL); + if (rc) + goto done; + + rc = apmask_commit(newapm); + if (rc) + goto done; - rc = ap_parse_mask_str(buf, ap_perms.apm, AP_DEVICES, &ap_perms_mutex); +done: + mutex_unlock(&ap_perms_mutex); if (rc) return rc; @@ -1227,12 +1296,75 @@ static ssize_t aqmask_show(struct bus_type *bus, char *buf) return rc; } +int __verify_queue_reservations(struct device_driver *drv, void *data) +{ + int rc = 0; + struct ap_driver *ap_drv = to_ap_drv(drv); + unsigned long *newaqm = (unsigned long *)data; + + /* + * If the reserved bits do not identify queues reserved for use by the + * non-default driver, there is no need to verify the driver is using + * the queues. + */ + if (ap_drv->flags & AP_DRIVER_FLAG_DEFAULT) + return 0; + + /* The non-default driver's module must be loaded */ + if (!try_module_get(drv->owner)) + return 0; + + if (ap_drv->in_use) + if (ap_drv->in_use(ap_perms.apm, newaqm)) + rc = -EADDRINUSE; + + module_put(drv->owner); + + return rc; +} + +static int aqmask_commit(unsigned long *newaqm) +{ + int rc; + unsigned long reserved[BITS_TO_LONGS(AP_DOMAINS)]; + + /* + * Check if any bits in the aqmask have been set which will + * result in queues being removed from non-default drivers + */ + if (bitmap_andnot(reserved, newaqm, ap_perms.aqm, AP_DOMAINS)) { + rc = bus_for_each_drv(&ap_bus_type, NULL, reserved, + __verify_queue_reservations); + if (rc) + return rc; + } + + memcpy(ap_perms.aqm, newaqm, AQMASKSIZE); + + return 0; +} + static ssize_t aqmask_store(struct bus_type *bus, const char *buf, size_t count) { int rc; + unsigned long newaqm[BITS_TO_LONGS(AP_DEVICES)]; - rc = ap_parse_mask_str(buf, ap_perms.aqm, AP_DOMAINS, &ap_perms_mutex); + if (mutex_lock_interruptible(&ap_perms_mutex)) + return -ERESTARTSYS; + + memcpy(newaqm, ap_perms.aqm, AQMASKSIZE); + + rc = ap_parse_mask_str(buf, newaqm, AP_DOMAINS, NULL); + if (rc) + goto done; + + rc = aqmask_commit(newaqm); + if (rc) + goto done; + +done: + mutex_unlock(&ap_perms_mutex); if (rc) return rc; diff --git a/drivers/s390/crypto/ap_bus.h b/drivers/s390/crypto/ap_bus.h index 4348fdff1c61..86bf5e224ba4 100644 --- a/drivers/s390/crypto/ap_bus.h +++ b/drivers/s390/crypto/ap_bus.h @@ -138,6 +138,7 @@ struct ap_driver { void (*remove)(struct ap_device *); void (*suspend)(struct ap_device *); void (*resume)(struct ap_device *); + bool (*in_use)(unsigned long *apm, unsigned long *aqm); }; #define to_ap_drv(x) container_of((x), struct ap_driver, driver) @@ -266,6 +267,9 @@ void ap_queue_init_state(struct ap_queue *aq); struct ap_card *ap_card_create(int id, int queue_depth, int raw_device_type, int comp_device_type, unsigned int functions); +#define APMASKSIZE (BITS_TO_LONGS(AP_DEVICES) * sizeof(unsigned long)) +#define AQMASKSIZE (BITS_TO_LONGS(AP_DOMAINS) * sizeof(unsigned long)) + struct ap_perms { unsigned long ioctlm[BITS_TO_LONGS(AP_IOCTLS)]; unsigned long apm[BITS_TO_LONGS(AP_DEVICES)];
Introduces a new driver callback to prevent a root user from unbinding an AP queue from its device driver if the queue is in use. The intent of this callback is to provide a driver with the means to prevent a root user from inadvertently taking a queue away from a guest and giving it to the host while the guest is still using it. The callback will be invoked whenever a change to the AP bus's sysfs apmask or aqmask attributes would result in one or more AP queues being removed from its driver. If the callback responds in the affirmative for any driver queried, the change to the apmask or aqmask will be rejected with a device in use error. For this patch, only non-default drivers will be queried. Currently, there is only one non-default driver, the vfio_ap device driver. The vfio_ap device driver manages AP queues passed through to one or more guests and we don't want to unexpectedly take AP resources away from guests which are most likely independently administered. Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com> --- drivers/s390/crypto/ap_bus.c | 144 +++++++++++++++++++++++++++++++++-- drivers/s390/crypto/ap_bus.h | 4 + 2 files changed, 142 insertions(+), 6 deletions(-)