diff mbox series

[04/10] vfio: Move storage of allow_unsafe_interrupts to vfio_main.c

Message ID 4-v1-4991695894d8+211-vfio_iommufd_jgg@nvidia.com (mailing list archive)
State New, archived
Headers show
Series Connect VFIO to IOMMUFD | expand

Commit Message

Jason Gunthorpe Oct. 25, 2022, 6:17 p.m. UTC
This legacy module knob has become uAPI, when set on the vfio_iommu_type1
it disables some security protections in the iommu drivers. Move the
storage for this knob to vfio_main.c so that iommufd can access it too.

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
 drivers/vfio/vfio.h             | 2 ++
 drivers/vfio/vfio_iommu_type1.c | 5 ++---
 drivers/vfio/vfio_main.c        | 3 +++
 3 files changed, 7 insertions(+), 3 deletions(-)

Comments

Alex Williamson Oct. 26, 2022, 9:24 p.m. UTC | #1
On Tue, 25 Oct 2022 15:17:10 -0300
Jason Gunthorpe <jgg@nvidia.com> wrote:

> This legacy module knob has become uAPI, when set on the vfio_iommu_type1
> it disables some security protections in the iommu drivers. Move the
> storage for this knob to vfio_main.c so that iommufd can access it too.

I don't really understand this, we're changing the behavior of the
iommufd_device_attach() operation based on the modules options of
vfio_iommu_type1, which may not be loaded or even compiled into the
kernel.  Our compatibility story falls apart when VFIO_CONTAINER is not
set, iommufd sneaks in to usurp /dev/vfio/vfio, and the user's module
options for type1 go unprocessed.

I hate to suggest that type1 becomes a module that does nothing more
than maintain consistency of this variable when the full type1 isn't
available, but is that what we need to do?  Thanks,

Alex

> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> ---
>  drivers/vfio/vfio.h             | 2 ++
>  drivers/vfio/vfio_iommu_type1.c | 5 ++---
>  drivers/vfio/vfio_main.c        | 3 +++
>  3 files changed, 7 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h
> index f95f4925b83bbd..54e5a8e0834ccb 100644
> --- a/drivers/vfio/vfio.h
> +++ b/drivers/vfio/vfio.h
> @@ -130,4 +130,6 @@ extern bool vfio_noiommu __read_mostly;
>  enum { vfio_noiommu = false };
>  #endif
>  
> +extern bool vfio_allow_unsafe_interrupts;
> +
>  #endif
> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> index 23c24fe98c00d4..186e33a006d314 100644
> --- a/drivers/vfio/vfio_iommu_type1.c
> +++ b/drivers/vfio/vfio_iommu_type1.c
> @@ -44,9 +44,8 @@
>  #define DRIVER_AUTHOR   "Alex Williamson <alex.williamson@redhat.com>"
>  #define DRIVER_DESC     "Type1 IOMMU driver for VFIO"
>  
> -static bool allow_unsafe_interrupts;
>  module_param_named(allow_unsafe_interrupts,
> -		   allow_unsafe_interrupts, bool, S_IRUGO | S_IWUSR);
> +		   vfio_allow_unsafe_interrupts, bool, S_IRUGO | S_IWUSR);
>  MODULE_PARM_DESC(allow_unsafe_interrupts,
>  		 "Enable VFIO IOMMU support for on platforms without interrupt remapping support.");
>  
> @@ -2282,7 +2281,7 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
>  		    iommu_group_for_each_dev(iommu_group, (void *)IOMMU_CAP_INTR_REMAP,
>  					     vfio_iommu_device_capable);
>  
> -	if (!allow_unsafe_interrupts && !msi_remap) {
> +	if (!vfio_allow_unsafe_interrupts && !msi_remap) {
>  		pr_warn("%s: No interrupt remapping support.  Use the module param \"allow_unsafe_interrupts\" to enable VFIO IOMMU support on this platform\n",
>  		       __func__);
>  		ret = -EPERM;
> diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
> index 8d809ecd982b39..1e414b2c48a511 100644
> --- a/drivers/vfio/vfio_main.c
> +++ b/drivers/vfio/vfio_main.c
> @@ -51,6 +51,9 @@ static struct vfio {
>  	struct ida			device_ida;
>  } vfio;
>  
> +bool vfio_allow_unsafe_interrupts;
> +EXPORT_SYMBOL_GPL(vfio_allow_unsafe_interrupts);
> +
>  static DEFINE_XARRAY(vfio_device_set_xa);
>  static const struct file_operations vfio_group_fops;
>
Jason Gunthorpe Oct. 28, 2022, 6:40 p.m. UTC | #2
On Wed, Oct 26, 2022 at 03:24:42PM -0600, Alex Williamson wrote:
> On Tue, 25 Oct 2022 15:17:10 -0300
> Jason Gunthorpe <jgg@nvidia.com> wrote:
> 
> > This legacy module knob has become uAPI, when set on the vfio_iommu_type1
> > it disables some security protections in the iommu drivers. Move the
> > storage for this knob to vfio_main.c so that iommufd can access it too.
> 
> I don't really understand this, we're changing the behavior of the
> iommufd_device_attach() operation based on the modules options of
> vfio_iommu_type1, 

The specific reason it was done is that we had a misconfigured test VM
in the farm that needed it, and that VM has since been fixed. But it
did highlight we should try to preserve this in some way.

> which may not be loaded or even compiled into the
> kernel.  Our compatibility story falls apart when VFIO_CONTAINER is not
> set, iommufd sneaks in to usurp /dev/vfio/vfio, and the user's module
> options for type1 go unprocessed.

There are two aspects here, trying to preseve the
allow_unsafe_interrupts knob as it is already as some ABI in the best
way we can.

And the second is how do we make this work in the new world where
there may be no type 1 module at all. This patch is not trying to
address that topic. I am expecting a range of small adjustments before
VFIO_CONTAINER=n becomes really fully viable.

> I hate to suggest that type1 becomes a module that does nothing more
> than maintain consistency of this variable when the full type1 isn't
> available, but is that what we need to do?

It is one idea, it depends how literal you want to be on "module
parameters are ABI". IMHO it is a weak form of ABI and the need of
this paramter in particular is not that common in modern times, AFAIK.

So perhaps we just also expose it through vfio.ko and expect people to
migrate. That would give a window were both options are available.

Jason
Alex Williamson Oct. 31, 2022, 10:45 p.m. UTC | #3
On Fri, 28 Oct 2022 15:40:09 -0300
Jason Gunthorpe <jgg@nvidia.com> wrote:

> On Wed, Oct 26, 2022 at 03:24:42PM -0600, Alex Williamson wrote:
> > On Tue, 25 Oct 2022 15:17:10 -0300
> > Jason Gunthorpe <jgg@nvidia.com> wrote:
> >   
> > > This legacy module knob has become uAPI, when set on the vfio_iommu_type1
> > > it disables some security protections in the iommu drivers. Move the
> > > storage for this knob to vfio_main.c so that iommufd can access it too.  
> > 
> > I don't really understand this, we're changing the behavior of the
> > iommufd_device_attach() operation based on the modules options of
> > vfio_iommu_type1,   
> 
> The specific reason it was done is that we had a misconfigured test VM
> in the farm that needed it, and that VM has since been fixed. But it
> did highlight we should try to preserve this in some way.
> 
> > which may not be loaded or even compiled into the
> > kernel.  Our compatibility story falls apart when VFIO_CONTAINER is not
> > set, iommufd sneaks in to usurp /dev/vfio/vfio, and the user's module
> > options for type1 go unprocessed.  
> 
> There are two aspects here, trying to preseve the
> allow_unsafe_interrupts knob as it is already as some ABI in the best
> way we can.
> 
> And the second is how do we make this work in the new world where
> there may be no type 1 module at all. This patch is not trying to
> address that topic. I am expecting a range of small adjustments before
> VFIO_CONTAINER=n becomes really fully viable.
> 
> > I hate to suggest that type1 becomes a module that does nothing more
> > than maintain consistency of this variable when the full type1 isn't
> > available, but is that what we need to do?  
> 
> It is one idea, it depends how literal you want to be on "module
> parameters are ABI". IMHO it is a weak form of ABI and the need of
> this paramter in particular is not that common in modern times, AFAIK.
> 
> So perhaps we just also expose it through vfio.ko and expect people to
> migrate. That would give a window were both options are available.

That might be best.  Ultimately this is an opt-out of a feature that
has security implications, so I'd rather error on the side of requiring
the user to re-assert that opt-out.  It seems the potential good in
eliminating stale or unnecessary options outweighs any weak claims of
preserving an ABI for a module that's no longer in service.

However, I'd question whether vfio is the right place for that new
module option.  As proposed, vfio is only passing it through to
iommufd, where an error related to lack of the hardware feature is
masked behind an -EPERM by the time it gets back to vfio, making any
sort of advisory to the user about the module option convoluted.  It
seems like iommufd should own the option to opt-out universally, not
just through the vfio use case.  Thanks,

Alex
Jason Gunthorpe Nov. 7, 2022, 1:19 p.m. UTC | #4
On Mon, Oct 31, 2022 at 04:45:26PM -0600, Alex Williamson wrote:

> > It is one idea, it depends how literal you want to be on "module
> > parameters are ABI". IMHO it is a weak form of ABI and the need of
> > this paramter in particular is not that common in modern times, AFAIK.
> > 
> > So perhaps we just also expose it through vfio.ko and expect people to
> > migrate. That would give a window were both options are available.
> 
> That might be best.  Ultimately this is an opt-out of a feature that
> has security implications, so I'd rather error on the side of requiring
> the user to re-assert that opt-out.  It seems the potential good in
> eliminating stale or unnecessary options outweighs any weak claims of
> preserving an ABI for a module that's no longer in service.

Ok, lets do this

--- a/drivers/vfio/vfio_main.c
+++ b/drivers/vfio/vfio_main.c
@@ -55,6 +55,11 @@ static struct vfio {
 bool vfio_allow_unsafe_interrupts;
 EXPORT_SYMBOL_GPL(vfio_allow_unsafe_interrupts);
 
+module_param_named(allow_unsafe_interrupts,
+                  vfio_allow_unsafe_interrupts, bool, S_IRUGO | S_IWUSR);
+MODULE_PARM_DESC(allow_unsafe_interrupts,
+                "Enable VFIO IOMMU support for on platforms without interrupt remapping support.");
+
 static DEFINE_XARRAY(vfio_device_set_xa);
 static const struct file_operations vfio_group_fops;

> However, I'd question whether vfio is the right place for that new
> module option.  As proposed, vfio is only passing it through to
> iommufd, where an error related to lack of the hardware feature is
> masked behind an -EPERM by the time it gets back to vfio, making any
> sort of advisory to the user about the module option convoluted.  It
> seems like iommufd should own the option to opt-out universally, not
> just through the vfio use case.  Thanks,

My thinking is this option shouldn't exist at all in other iommufd
users. eg I don't see value in VDPA supporting it.

So, let's wait and see if a need arises first. I'm reluctant to add
options to disable kernel security without really good reasons.

Jason
Alex Williamson Nov. 7, 2022, 3:18 p.m. UTC | #5
On Mon, 7 Nov 2022 09:19:43 -0400
Jason Gunthorpe <jgg@nvidia.com> wrote:

> On Mon, Oct 31, 2022 at 04:45:26PM -0600, Alex Williamson wrote:
> 
> > > It is one idea, it depends how literal you want to be on "module
> > > parameters are ABI". IMHO it is a weak form of ABI and the need of
> > > this paramter in particular is not that common in modern times, AFAIK.
> > > 
> > > So perhaps we just also expose it through vfio.ko and expect people to
> > > migrate. That would give a window were both options are available.  
> > 
> > That might be best.  Ultimately this is an opt-out of a feature that
> > has security implications, so I'd rather error on the side of requiring
> > the user to re-assert that opt-out.  It seems the potential good in
> > eliminating stale or unnecessary options outweighs any weak claims of
> > preserving an ABI for a module that's no longer in service.  
> 
> Ok, lets do this
> 
> --- a/drivers/vfio/vfio_main.c
> +++ b/drivers/vfio/vfio_main.c
> @@ -55,6 +55,11 @@ static struct vfio {
>  bool vfio_allow_unsafe_interrupts;
>  EXPORT_SYMBOL_GPL(vfio_allow_unsafe_interrupts);
>  
> +module_param_named(allow_unsafe_interrupts,
> +                  vfio_allow_unsafe_interrupts, bool, S_IRUGO | S_IWUSR);
> +MODULE_PARM_DESC(allow_unsafe_interrupts,
> +                "Enable VFIO IOMMU support for on platforms without interrupt remapping support.");
> +
>  static DEFINE_XARRAY(vfio_device_set_xa);
>  static const struct file_operations vfio_group_fops;
> 
> > However, I'd question whether vfio is the right place for that new
> > module option.  As proposed, vfio is only passing it through to
> > iommufd, where an error related to lack of the hardware feature is
> > masked behind an -EPERM by the time it gets back to vfio, making any
> > sort of advisory to the user about the module option convoluted.  It
> > seems like iommufd should own the option to opt-out universally, not
> > just through the vfio use case.  Thanks,  
> 
> My thinking is this option shouldn't exist at all in other iommufd
> users. eg I don't see value in VDPA supporting it.

I disagree, the IOMMU interface is responsible for isolating the
device, this option doesn't make any sense to live in vfio-main, which
is the reason it was always a type1 option.  If vdpa doesn't allow full
device access such that it can guarantee that a device cannot generate
a DMA that can spoof MSI, then it sounds like the flag we pass when
attaching a device to iommfd should to reflect this difference in usage.
The driver either requires full isolation, default, or can indicate a
form of restricted DMA programming that prevents interrupt spoofing.
The policy whether to permit unsafe configurations should exist in one
place, iommufd.  Thanks,

Alex
Jason Gunthorpe Nov. 7, 2022, 3:32 p.m. UTC | #6
On Mon, Nov 07, 2022 at 08:18:53AM -0700, Alex Williamson wrote:
> On Mon, 7 Nov 2022 09:19:43 -0400
> Jason Gunthorpe <jgg@nvidia.com> wrote:
> 
> > On Mon, Oct 31, 2022 at 04:45:26PM -0600, Alex Williamson wrote:
> > 
> > > > It is one idea, it depends how literal you want to be on "module
> > > > parameters are ABI". IMHO it is a weak form of ABI and the need of
> > > > this paramter in particular is not that common in modern times, AFAIK.
> > > > 
> > > > So perhaps we just also expose it through vfio.ko and expect people to
> > > > migrate. That would give a window were both options are available.  
> > > 
> > > That might be best.  Ultimately this is an opt-out of a feature that
> > > has security implications, so I'd rather error on the side of requiring
> > > the user to re-assert that opt-out.  It seems the potential good in
> > > eliminating stale or unnecessary options outweighs any weak claims of
> > > preserving an ABI for a module that's no longer in service.  
> > 
> > Ok, lets do this
> > 
> > --- a/drivers/vfio/vfio_main.c
> > +++ b/drivers/vfio/vfio_main.c
> > @@ -55,6 +55,11 @@ static struct vfio {
> >  bool vfio_allow_unsafe_interrupts;
> >  EXPORT_SYMBOL_GPL(vfio_allow_unsafe_interrupts);
> >  
> > +module_param_named(allow_unsafe_interrupts,
> > +                  vfio_allow_unsafe_interrupts, bool, S_IRUGO | S_IWUSR);
> > +MODULE_PARM_DESC(allow_unsafe_interrupts,
> > +                "Enable VFIO IOMMU support for on platforms without interrupt remapping support.");
> > +
> >  static DEFINE_XARRAY(vfio_device_set_xa);
> >  static const struct file_operations vfio_group_fops;
> > 
> > > However, I'd question whether vfio is the right place for that new
> > > module option.  As proposed, vfio is only passing it through to
> > > iommufd, where an error related to lack of the hardware feature is
> > > masked behind an -EPERM by the time it gets back to vfio, making any
> > > sort of advisory to the user about the module option convoluted.  It
> > > seems like iommufd should own the option to opt-out universally, not
> > > just through the vfio use case.  Thanks,  
> > 
> > My thinking is this option shouldn't exist at all in other iommufd
> > users. eg I don't see value in VDPA supporting it.
> 
> I disagree, the IOMMU interface is responsible for isolating the
> device, this option doesn't make any sense to live in vfio-main, which
> is the reason it was always a type1 option.  

You just agreed to this above?

> If vdpa doesn't allow full device access such that it can guarantee
> that a device cannot generate a DMA that can spoof MSI, then it
> sounds like the flag we pass when attaching a device to iommfd
> should to reflect this difference in usage.

VDPA allows arbitary DMA just like VFIO. At most VDPA limits the MMIO
touches.

> The driver either requires full isolation, default, or can indicate a
> form of restricted DMA programming that prevents interrupt spoofing.
> The policy whether to permit unsafe configurations should exist in one
> place, iommufd.

iommufd doesn't know the level of unsafely the external driver is
creating, and IMHO we don't actually want to enable this more
widely. So I don't want to see a global kernel wide flag at this point
until we get reason to make more than just VFIO insecure.

Jason
Alex Williamson Nov. 7, 2022, 6:05 p.m. UTC | #7
On Mon, 7 Nov 2022 11:32:40 -0400
Jason Gunthorpe <jgg@nvidia.com> wrote:

> On Mon, Nov 07, 2022 at 08:18:53AM -0700, Alex Williamson wrote:
> > On Mon, 7 Nov 2022 09:19:43 -0400
> > Jason Gunthorpe <jgg@nvidia.com> wrote:
> >   
> > > On Mon, Oct 31, 2022 at 04:45:26PM -0600, Alex Williamson wrote:
> > >   
> > > > > It is one idea, it depends how literal you want to be on "module
> > > > > parameters are ABI". IMHO it is a weak form of ABI and the need of
> > > > > this paramter in particular is not that common in modern times, AFAIK.
> > > > > 
> > > > > So perhaps we just also expose it through vfio.ko and expect people to
> > > > > migrate. That would give a window were both options are available.    
> > > > 
> > > > That might be best.  Ultimately this is an opt-out of a feature that
> > > > has security implications, so I'd rather error on the side of requiring
> > > > the user to re-assert that opt-out.  It seems the potential good in
> > > > eliminating stale or unnecessary options outweighs any weak claims of
> > > > preserving an ABI for a module that's no longer in service.    
> > > 
> > > Ok, lets do this
> > > 
> > > --- a/drivers/vfio/vfio_main.c
> > > +++ b/drivers/vfio/vfio_main.c
> > > @@ -55,6 +55,11 @@ static struct vfio {
> > >  bool vfio_allow_unsafe_interrupts;
> > >  EXPORT_SYMBOL_GPL(vfio_allow_unsafe_interrupts);
> > >  
> > > +module_param_named(allow_unsafe_interrupts,
> > > +                  vfio_allow_unsafe_interrupts, bool, S_IRUGO | S_IWUSR);
> > > +MODULE_PARM_DESC(allow_unsafe_interrupts,
> > > +                "Enable VFIO IOMMU support for on platforms without interrupt remapping support.");
> > > +
> > >  static DEFINE_XARRAY(vfio_device_set_xa);
> > >  static const struct file_operations vfio_group_fops;
> > >   
> > > > However, I'd question whether vfio is the right place for that new
> > > > module option.  As proposed, vfio is only passing it through to
> > > > iommufd, where an error related to lack of the hardware feature is
> > > > masked behind an -EPERM by the time it gets back to vfio, making any
> > > > sort of advisory to the user about the module option convoluted.  It
> > > > seems like iommufd should own the option to opt-out universally, not
> > > > just through the vfio use case.  Thanks,    
> > > 
> > > My thinking is this option shouldn't exist at all in other iommufd
> > > users. eg I don't see value in VDPA supporting it.  
> > 
> > I disagree, the IOMMU interface is responsible for isolating the
> > device, this option doesn't make any sense to live in vfio-main, which
> > is the reason it was always a type1 option.    
> 
> You just agreed to this above?

After further consideration... I don't think the option on vfio-main
makes sense, basically for the same reason that the original option
existed on the IOMMU backend rather than vfio-core.  The option
describes a means to relax a specific aspect of IOMMU isolation, which
makes more sense to expose via the IOMMU provider, imo.  For example,
vfio-main cannot generate an equivalent error message as provided in
type1 today, it's too far removed from the IOMMU feature support.

> > If vdpa doesn't allow full device access such that it can guarantee
> > that a device cannot generate a DMA that can spoof MSI, then it
> > sounds like the flag we pass when attaching a device to iommfd
> > should to reflect this difference in usage.  
> 
> VDPA allows arbitary DMA just like VFIO. At most VDPA limits the MMIO
> touches.

So why exactly isn't this an issue for VDPA?  Are we just burying our
head in the sand that such platforms exists and can still be useful
given the appropriate risk vs reward trade-off?

> > The driver either requires full isolation, default, or can indicate a
> > form of restricted DMA programming that prevents interrupt spoofing.
> > The policy whether to permit unsafe configurations should exist in one
> > place, iommufd.  
> 
> iommufd doesn't know the level of unsafely the external driver is
> creating,

Thus the proposed flag.  But maybe we don't need it if VDPA has no
inherent protection against MSI spoofing itself.

> and IMHO we don't actually want to enable this more
> widely. So I don't want to see a global kernel wide flag at this point
> until we get reason to make more than just VFIO insecure.

But this brings into question the entire existence of the opt-in.  Do
we agree that there are valid use cases for such an option?

Unlike things like ACS overrides, lack of interrupt isolation really
requires a malicious actor.  We're not going to inadvertently overlap
DMA to interrupt addresses like we might to a non-isolated MMIO ranges.
Therefore an admin can make a reasonable determination relative to the
extent to which the userspace is trusted.  This is not unlike opt-outs
to CPU vulnerability mitigation imo, there are use cases where the
performance or functionality is more important than the isolation.
Hand waving this away as a vfio-unique insecurity is a bad precedent
for iommufd.  Thanks,

Alex
Jason Gunthorpe Nov. 7, 2022, 6:45 p.m. UTC | #8
On Mon, Nov 07, 2022 at 11:05:08AM -0700, Alex Williamson wrote:

> After further consideration... I don't think the option on vfio-main
> makes sense, basically for the same reason that the original option
> existed on the IOMMU backend rather than vfio-core.  The option
> describes a means to relax a specific aspect of IOMMU isolation, which
> makes more sense to expose via the IOMMU provider, imo.  For example,
> vfio-main cannot generate an equivalent error message as provided in
> type1 today, it's too far removed from the IOMMU feature support.

vfio-main can do it, we just have to be strict that the EPERM code is
always going to be this case.
 
> > > If vdpa doesn't allow full device access such that it can guarantee
> > > that a device cannot generate a DMA that can spoof MSI, then it
> > > sounds like the flag we pass when attaching a device to iommfd
> > > should to reflect this difference in usage.  
> > 
> > VDPA allows arbitary DMA just like VFIO. At most VDPA limits the MMIO
> > touches.
>
> So why exactly isn't this an issue for VDPA?  Are we just burying our
> head in the sand that such platforms exists and can still be useful
> given the appropriate risk vs reward trade-off?

Simply that nobody has asked for it, and might never ask for it. This
is all support for old platforms, and there just doesn't seem to be a
"real" use case for very new (and actually rare) NIC hardware stuck
into ancient platforms with this security problem.

So I'd rather leave this in the past than carry forward a security
exception as some ongoing 1st class thing.

> > and IMHO we don't actually want to enable this more
> > widely. So I don't want to see a global kernel wide flag at this point
> > until we get reason to make more than just VFIO insecure.
> 
> But this brings into question the entire existence of the opt-in.  Do
> we agree that there are valid use cases for such an option?

I think it is something VFIO has historically allowed and I think we
can continue to allow it, but I don't think we should encourage its
use or encourage it to propogate to wider areas given that the
legitimate use cases are focused on fairly old hardware at this point.

So, I'd rather wait for someone to ask for it, and explain why they
need to use a combination of stuff where we need to have a true global
option.

> Unlike things like ACS overrides, lack of interrupt isolation really
> requires a malicious actor.  We're not going to inadvertently overlap
> DMA to interrupt addresses like we might to a non-isolated MMIO ranges.
> Therefore an admin can make a reasonable determination relative to the
> extent to which the userspace is trusted.  This is not unlike opt-outs
> to CPU vulnerability mitigation imo, there are use cases where the
> performance or functionality is more important than the isolation.
> Hand waving this away as a vfio-unique insecurity is a bad precedent
> for iommufd.

I agree with this, which is why I think it should come from the actual
user facing subsystem not be a system wide flag. The "is userspace
trusted" for VFIO may be quite different than from VDPA or whatever
else comes next.

I'd be much more comfortable with this as a system wide iommufd flag
if we also tied it to do some demonstration of privilege - eg a
requirement to open iommufd with CAP_SYS_RAWIO for instance.

That is the usual protocol for these kinds of insecurities..

I think right now we can leave this as-is and we can wait for some
more information to decide how best to proceed.

Thanks,
Jason
Alex Williamson Nov. 8, 2022, 10:55 p.m. UTC | #9
On Mon, 7 Nov 2022 14:45:59 -0400
Jason Gunthorpe <jgg@nvidia.com> wrote:

> On Mon, Nov 07, 2022 at 11:05:08AM -0700, Alex Williamson wrote:
> 
> > After further consideration... I don't think the option on vfio-main
> > makes sense, basically for the same reason that the original option
> > existed on the IOMMU backend rather than vfio-core.  The option
> > describes a means to relax a specific aspect of IOMMU isolation, which
> > makes more sense to expose via the IOMMU provider, imo.  For example,
> > vfio-main cannot generate an equivalent error message as provided in
> > type1 today, it's too far removed from the IOMMU feature support.  
> 
> vfio-main can do it, we just have to be strict that the EPERM code is
> always going to be this case.

This doesn't seem very practical.

> > > > If vdpa doesn't allow full device access such that it can guarantee
> > > > that a device cannot generate a DMA that can spoof MSI, then it
> > > > sounds like the flag we pass when attaching a device to iommfd
> > > > should to reflect this difference in usage.    
> > > 
> > > VDPA allows arbitary DMA just like VFIO. At most VDPA limits the MMIO
> > > touches.  
> >
> > So why exactly isn't this an issue for VDPA?  Are we just burying our
> > head in the sand that such platforms exists and can still be useful
> > given the appropriate risk vs reward trade-off?  
> 
> Simply that nobody has asked for it, and might never ask for it. This
> is all support for old platforms, and there just doesn't seem to be a
> "real" use case for very new (and actually rare) NIC hardware stuck
> into ancient platforms with this security problem.

vIOMMU support for interrupt remapping is relatively new, the nesting
case is important as well.

> So I'd rather leave this in the past than carry forward a security
> exception as some ongoing 1st class thing.
> 
> > > and IMHO we don't actually want to enable this more
> > > widely. So I don't want to see a global kernel wide flag at this point
> > > until we get reason to make more than just VFIO insecure.  
> > 
> > But this brings into question the entire existence of the opt-in.  Do
> > we agree that there are valid use cases for such an option?  
> 
> I think it is something VFIO has historically allowed and I think we
> can continue to allow it, but I don't think we should encourage its
> use or encourage it to propogate to wider areas given that the
> legitimate use cases are focused on fairly old hardware at this point.
> 
> So, I'd rather wait for someone to ask for it, and explain why they
> need to use a combination of stuff where we need to have a true global
> option.
> 
> > Unlike things like ACS overrides, lack of interrupt isolation really
> > requires a malicious actor.  We're not going to inadvertently overlap
> > DMA to interrupt addresses like we might to a non-isolated MMIO ranges.
> > Therefore an admin can make a reasonable determination relative to the
> > extent to which the userspace is trusted.  This is not unlike opt-outs
> > to CPU vulnerability mitigation imo, there are use cases where the
> > performance or functionality is more important than the isolation.
> > Hand waving this away as a vfio-unique insecurity is a bad precedent
> > for iommufd.  
> 
> I agree with this, which is why I think it should come from the actual
> user facing subsystem not be a system wide flag. The "is userspace
> trusted" for VFIO may be quite different than from VDPA or whatever
> else comes next.
> 
> I'd be much more comfortable with this as a system wide iommufd flag
> if we also tied it to do some demonstration of privilege - eg a
> requirement to open iommufd with CAP_SYS_RAWIO for instance.

Which is not compatible to existing use cases, which is also why we
can't invent some way to allow some applications to run without CPU
mitigations, while requiring it for others as a baseline.

> That is the usual protocol for these kinds of insecurities..

Hmm, is it?

> I think right now we can leave this as-is and we can wait for some
> more information to decide how best to proceed.

It's certainly not acceptable in the latest proposal, iommufd consumes
an option set by another module and when that module goes away, so does
any claim of compatibility.  The code becomes dead and the feature not
present.  The option doesn't belong on the vfio module.  Do we need a
vfio-iommufd module to host it?  Thanks,

Alex
Jason Gunthorpe Nov. 9, 2022, 1:05 a.m. UTC | #10
On Tue, Nov 08, 2022 at 03:55:20PM -0700, Alex Williamson wrote:

> > > So why exactly isn't this an issue for VDPA?  Are we just burying our
> > > head in the sand that such platforms exists and can still be useful
> > > given the appropriate risk vs reward trade-off?  
> > 
> > Simply that nobody has asked for it, and might never ask for it. This
> > is all support for old platforms, and there just doesn't seem to be a
> > "real" use case for very new (and actually rare) NIC hardware stuck
> > into ancient platforms with this security problem.
> 
> vIOMMU support for interrupt remapping is relatively new, the nesting
> case is important as well.

This is where we got hit. In the end we fixed the qemu..

> > I'd be much more comfortable with this as a system wide iommufd flag
> > if we also tied it to do some demonstration of privilege - eg a
> > requirement to open iommufd with CAP_SYS_RAWIO for instance.
> 
> Which is not compatible to existing use cases, which is also why we
> can't invent some way to allow some applications to run without CPU
> mitigations, while requiring it for others as a baseline.

Isn't it? Didn't we learn that libvirt runs as root and will open and
pass the iommufd as root?

> > That is the usual protocol for these kinds of insecurities..
> 
> Hmm, is it?

I think so. At least you should have something to shut down an
insecure feature in kernel lockdown modes. CAP_SYS_RAWIO is a simple
way to do it.

> > I think right now we can leave this as-is and we can wait for some
> > more information to decide how best to proceed.
> 
> It's certainly not acceptable in the latest proposal, iommufd consumes
> an option set by another module and when that module goes away, so does
> any claim of compatibility.  The code becomes dead and the feature not
> present.  The option doesn't belong on the vfio module.  Do we need a
> vfio-iommufd module to host it?  Thanks,

I don't know, as I said in the other email, these little things need
work and discussion to resolve. We need to recheck the security stuff
against the 2022 kernel where things have changed. We don't need to do
it all right now.

People who want allow_unsafe_interrupts to work will simply not set
VFIO_CONTAINER=n at this time. Same with P2P, vfio-no-iommu and any
other gaps we haven't discovered.

vfio-iommufd seems like overkill, I think your first suggestion to put
in vfio.ko was more practical.

My only doubt is if we should make it system wide for everything - and
I'm just a bit uncomfortable with that from a security POV. But maybe
I don't quite know exactly what the risks are.

Jason
Tian, Kevin Nov. 9, 2022, 3:21 a.m. UTC | #11
> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Wednesday, November 9, 2022 9:05 AM
> 
> On Tue, Nov 08, 2022 at 03:55:20PM -0700, Alex Williamson wrote:
> 
> > > > So why exactly isn't this an issue for VDPA?  Are we just burying our
> > > > head in the sand that such platforms exists and can still be useful
> > > > given the appropriate risk vs reward trade-off?
> > >
> > > Simply that nobody has asked for it, and might never ask for it. This
> > > is all support for old platforms, and there just doesn't seem to be a
> > > "real" use case for very new (and actually rare) NIC hardware stuck
> > > into ancient platforms with this security problem.
> >
> > vIOMMU support for interrupt remapping is relatively new, the nesting
> > case is important as well.
> 
> This is where we got hit. In the end we fixed the qemu..

but the point is that old qemu could have a much longer lifespan than
old platforms then when running newer kernel which supports vdpa
on old qemu the same tradeoff consideration is then not vfio specific...

> > It's certainly not acceptable in the latest proposal, iommufd consumes
> > an option set by another module and when that module goes away, so
> does
> > any claim of compatibility.  The code becomes dead and the feature not
> > present.  The option doesn't belong on the vfio module.  Do we need a
> > vfio-iommufd module to host it?  Thanks,
> 
> I don't know, as I said in the other email, these little things need
> work and discussion to resolve. We need to recheck the security stuff
> against the 2022 kernel where things have changed. We don't need to do
> it all right now.
> 
> People who want allow_unsafe_interrupts to work will simply not set
> VFIO_CONTAINER=n at this time. Same with P2P, vfio-no-iommu and any
> other gaps we haven't discovered.
> 

If all agree that VFIO_CONTAINER=n is a process to evolve, does it make
more sense to remove this patch from this series i.e. let it buried in
VFIO_CONTAINER=y for now? Then resolve it in a follow up patch if
no consensus can be made quickly at this point.
Jason Gunthorpe Nov. 9, 2022, 1:11 p.m. UTC | #12
On Wed, Nov 09, 2022 at 03:21:29AM +0000, Tian, Kevin wrote:
> > From: Jason Gunthorpe <jgg@nvidia.com>
> > Sent: Wednesday, November 9, 2022 9:05 AM
> > 
> > On Tue, Nov 08, 2022 at 03:55:20PM -0700, Alex Williamson wrote:
> > 
> > > > > So why exactly isn't this an issue for VDPA?  Are we just burying our
> > > > > head in the sand that such platforms exists and can still be useful
> > > > > given the appropriate risk vs reward trade-off?
> > > >
> > > > Simply that nobody has asked for it, and might never ask for it. This
> > > > is all support for old platforms, and there just doesn't seem to be a
> > > > "real" use case for very new (and actually rare) NIC hardware stuck
> > > > into ancient platforms with this security problem.
> > >
> > > vIOMMU support for interrupt remapping is relatively new, the nesting
> > > case is important as well.
> > 
> > This is where we got hit. In the end we fixed the qemu..
> 
> but the point is that old qemu could have a much longer lifespan than
> old platforms then when running newer kernel which supports vdpa
> on old qemu the same tradeoff consideration is then not vfio
> specific...

I think we are reaching into incredible hypotheticals here. We don't
know of any real uses cases where a very new VDPA capable device would
be assinged into a VM using an old qemu and the entire system is
expected to work. What we are seeing is people using this stuff are
making highly engineered systems and will meet the constraints.

Today VDPA doesn't support allow_unsafe_interrupts, and I don't see a
compelling reason to change that.

The threshold for introducing a kernel security hole should be much
higher than "someone could possibly do this".

> If all agree that VFIO_CONTAINER=n is a process to evolve, does it make
> more sense to remove this patch from this series i.e. let it buried in
> VFIO_CONTAINER=y for now? Then resolve it in a follow up patch if
> no consensus can be made quickly at this point.

This is worse, it would make iommufd completely unusable in situations
where we need allow_unsafe_interrupts. If we belive that is important
we should keep this patch so existing systems on kernels with
VFIO_CONTAINER=y continue to work after libvirt/qemu are upgraded to
iommufd.

Jason
Alex Williamson Nov. 9, 2022, 6:28 p.m. UTC | #13
On Tue, 8 Nov 2022 21:05:21 -0400
Jason Gunthorpe <jgg@nvidia.com> wrote:

> On Tue, Nov 08, 2022 at 03:55:20PM -0700, Alex Williamson wrote:
> 
> > > > So why exactly isn't this an issue for VDPA?  Are we just burying our
> > > > head in the sand that such platforms exists and can still be useful
> > > > given the appropriate risk vs reward trade-off?    
> > > 
> > > Simply that nobody has asked for it, and might never ask for it. This
> > > is all support for old platforms, and there just doesn't seem to be a
> > > "real" use case for very new (and actually rare) NIC hardware stuck
> > > into ancient platforms with this security problem.  
> > 
> > vIOMMU support for interrupt remapping is relatively new, the nesting
> > case is important as well.  
> 
> This is where we got hit. In the end we fixed the qemu..
> 
> > > I'd be much more comfortable with this as a system wide iommufd flag
> > > if we also tied it to do some demonstration of privilege - eg a
> > > requirement to open iommufd with CAP_SYS_RAWIO for instance.  
> > 
> > Which is not compatible to existing use cases, which is also why we
> > can't invent some way to allow some applications to run without CPU
> > mitigations, while requiring it for others as a baseline.  
> 
> Isn't it? Didn't we learn that libvirt runs as root and will open and
> pass the iommufd as root?

We're jumping ahead to native iommufd support here, what happens when
VFIO_CONTAINER=n and it's QEMU opening the fds, with only file access
privileges?

> > > That is the usual protocol for these kinds of insecurities..  
> > 
> > Hmm, is it?  
> 
> I think so. At least you should have something to shut down an
> insecure feature in kernel lockdown modes. CAP_SYS_RAWIO is a simple
> way to do it.

How are CPU vulnerabilities handled in lockdown mode, do apps require
certain capabilities to run fast vs safe, or do we simply disallow
unsafe globally in lockdown?  I think we have a lot more leniency to
ignore/disallow flags that enable global insecurities when any sort of
lockdown is imposed.

> > > I think right now we can leave this as-is and we can wait for some
> > > more information to decide how best to proceed.  
> > 
> > It's certainly not acceptable in the latest proposal, iommufd consumes
> > an option set by another module and when that module goes away, so does
> > any claim of compatibility.  The code becomes dead and the feature not
> > present.  The option doesn't belong on the vfio module.  Do we need a
> > vfio-iommufd module to host it?  Thanks,  
> 
> I don't know, as I said in the other email, these little things need
> work and discussion to resolve. We need to recheck the security stuff
> against the 2022 kernel where things have changed. We don't need to do
> it all right now.
> 
> People who want allow_unsafe_interrupts to work will simply not set
> VFIO_CONTAINER=n at this time. Same with P2P, vfio-no-iommu and any
> other gaps we haven't discovered.
> 
> vfio-iommufd seems like overkill, I think your first suggestion to put
> in vfio.ko was more practical.

Convenient perhaps, but architecturally the wrong place for it.

> My only doubt is if we should make it system wide for everything - and
> I'm just a bit uncomfortable with that from a security POV. But maybe
> I don't quite know exactly what the risks are.

There's a paper about these sorts of attacks here[1].  As I noted
earlier, a non-malicious DMA targeting an address that would trigger an
interrupt is extremely unlikely, and the resulting vulnerability is
largely more of a denial of service, IIRC.  It would certainly be
strongly dis-recommended in any scenario where the userspace drivers
are untrusted, such as a cloud hosting provider, but there are
certainly other scenarios where either the guest or userspace drivers
are also under the control of the hosting provider and this is not such
a concern.  Thanks,

Alex

[1]https://invisiblethingslab.com/resources/2011/Software%20Attacks%20on%20Intel%20VT-d.pdf
Tian, Kevin Nov. 10, 2022, 2:44 a.m. UTC | #14
> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Wednesday, November 9, 2022 9:11 PM
> 
> > If all agree that VFIO_CONTAINER=n is a process to evolve, does it make
> > more sense to remove this patch from this series i.e. let it buried in
> > VFIO_CONTAINER=y for now? Then resolve it in a follow up patch if
> > no consensus can be made quickly at this point.
> 
> This is worse, it would make iommufd completely unusable in situations
> where we need allow_unsafe_interrupts. If we belive that is important
> we should keep this patch so existing systems on kernels with
> VFIO_CONTAINER=y continue to work after libvirt/qemu are upgraded to
> iommufd.
> 

You are right. I kept a wrong thought that v2 has moved the option into
vfio_main which is what I commented to hold before consensus was made.

btw is it a good tradeoff by making vfio-compat as a module to carry this
option? anyway it's not necessarily to be in iommufd core when VFIO=n.
Jason Gunthorpe Nov. 10, 2022, 7:19 p.m. UTC | #15
On Wed, Nov 09, 2022 at 11:28:22AM -0700, Alex Williamson wrote:
> > > > I'd be much more comfortable with this as a system wide iommufd flag
> > > > if we also tied it to do some demonstration of privilege - eg a
> > > > requirement to open iommufd with CAP_SYS_RAWIO for instance.  
> > > 
> > > Which is not compatible to existing use cases, which is also why we
> > > can't invent some way to allow some applications to run without CPU
> > > mitigations, while requiring it for others as a baseline.  
> > 
> > Isn't it? Didn't we learn that libvirt runs as root and will open and
> > pass the iommufd as root?
> 
> We're jumping ahead to native iommufd support here, what happens when
> VFIO_CONTAINER=n and it's QEMU opening the fds, with only file access
> privileges?

Yes, but I am thinking aloud about how to best to do this in native
iommufd modes.

> > I think so. At least you should have something to shut down an
> > insecure feature in kernel lockdown modes. CAP_SYS_RAWIO is a simple
> > way to do it.
> 
> How are CPU vulnerabilities handled in lockdown mode, do apps require
> certain capabilities to run fast vs safe, or do we simply disallow
> unsafe globally in lockdown?  I think we have a lot more leniency to
> ignore/disallow flags that enable global insecurities when any sort of
> lockdown is imposed.

The CPU things are all information leaks from the kernel to
userspace. lockdown is about preserving kernel operating integrity -
eg preventing modification of hijacking of the running kernel.

So, like you say below, this is kind of in between, it is not
information leakage, and it is is hopefully not an integrity issue.

Being more of a DOS maybe it is fine under the lockdown scenarios. At
least I am happier to hear that.

> > vfio-iommufd seems like overkill, I think your first suggestion to put
> > in vfio.ko was more practical.
> 
> Convenient perhaps, but architecturally the wrong place for it.

Ah, that is pretty subjective. If the architecture is that the iommufd
user subsystem opts-in to this insecurity then it is an OK place

If it is that iommufd sets it globaly for the whole system it is the
wrong place.

We could also talk about a per-vfio_device sysfs to control this? Then
we can make the sysfs only appear for vfio_devices using the
iommu_domain part of iommufd/vfio. That has a nice sort of compat
shape as we can make the existing module option default the sysfs to
insecure

Jason
diff mbox series

Patch

diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h
index f95f4925b83bbd..54e5a8e0834ccb 100644
--- a/drivers/vfio/vfio.h
+++ b/drivers/vfio/vfio.h
@@ -130,4 +130,6 @@  extern bool vfio_noiommu __read_mostly;
 enum { vfio_noiommu = false };
 #endif
 
+extern bool vfio_allow_unsafe_interrupts;
+
 #endif
diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 23c24fe98c00d4..186e33a006d314 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -44,9 +44,8 @@ 
 #define DRIVER_AUTHOR   "Alex Williamson <alex.williamson@redhat.com>"
 #define DRIVER_DESC     "Type1 IOMMU driver for VFIO"
 
-static bool allow_unsafe_interrupts;
 module_param_named(allow_unsafe_interrupts,
-		   allow_unsafe_interrupts, bool, S_IRUGO | S_IWUSR);
+		   vfio_allow_unsafe_interrupts, bool, S_IRUGO | S_IWUSR);
 MODULE_PARM_DESC(allow_unsafe_interrupts,
 		 "Enable VFIO IOMMU support for on platforms without interrupt remapping support.");
 
@@ -2282,7 +2281,7 @@  static int vfio_iommu_type1_attach_group(void *iommu_data,
 		    iommu_group_for_each_dev(iommu_group, (void *)IOMMU_CAP_INTR_REMAP,
 					     vfio_iommu_device_capable);
 
-	if (!allow_unsafe_interrupts && !msi_remap) {
+	if (!vfio_allow_unsafe_interrupts && !msi_remap) {
 		pr_warn("%s: No interrupt remapping support.  Use the module param \"allow_unsafe_interrupts\" to enable VFIO IOMMU support on this platform\n",
 		       __func__);
 		ret = -EPERM;
diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
index 8d809ecd982b39..1e414b2c48a511 100644
--- a/drivers/vfio/vfio_main.c
+++ b/drivers/vfio/vfio_main.c
@@ -51,6 +51,9 @@  static struct vfio {
 	struct ida			device_ida;
 } vfio;
 
+bool vfio_allow_unsafe_interrupts;
+EXPORT_SYMBOL_GPL(vfio_allow_unsafe_interrupts);
+
 static DEFINE_XARRAY(vfio_device_set_xa);
 static const struct file_operations vfio_group_fops;