
[for,2.8,10/11] Revert "intel_iommu: Throw hw_error on notify_started"

Message ID 20160901022929.GA3558@pxdev.xzpeter.org (mailing list archive)
State New, archived

Commit Message

Peter Xu Sept. 1, 2016, 2:29 a.m. UTC
On Wed, Aug 31, 2016 at 10:45:37AM +0800, Jason Wang wrote:
> 
> 
> >On 2016-08-30 11:37, Alex Williamson wrote:
> >On Tue, 30 Aug 2016 11:06:58 +0800
> >Jason Wang <jasowang@redhat.com> wrote:
> >
> >>From: Peter Xu <peterx@redhat.com>
> >>
> >>This reverts commit 3cb3b1549f5401dc3a5e1d073e34063dc274136f. The vhost
> >>device IOTLB API will get notified and send invalidation requests to
> >>vhost through this notifier.
> >AFAICT this series does not address the original problem for which
> >commit 3cb3b1549f54 was added.  We've only addressed the very narrow
> >use case of a device iotlb firing the iommu notifier; therefore this
> >change is a regression versus 2.7, since it allows invalid
> >configurations with a physical iommu which will never receive the
> >necessary notifies from intel-iommu emulation to work properly.  Thanks,
> >
> >Alex
> 
> Looking at vfio, it cares about maps, but vhost only cares about IOTLB
> invalidation. So I think we probably need another kind of notifier in
> this case to avoid the conflict.

Shall we leverage IOMMUTLBEntry.perm == IOMMU_NONE as a sign of
invalidation? If so, we can use the same IOTLB interface as before.
IMHO the two interfaces need not conflict.
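
For example, the vhost side could do something like this (a rough,
untested sketch; vhost_invalidate_iotlb() is a made-up helper name):

    static void vhost_iommu_notify(Notifier *n, void *data)
    {
        IOMMUTLBEntry *entry = data;

        if (entry->perm == IOMMU_NONE) {
            /* perm == IOMMU_NONE marks an invalidation: drop any
             * cached translations in [iova, iova + addr_mask]. */
            vhost_invalidate_iotlb(entry->iova, entry->addr_mask + 1);
        }
        /* Otherwise it is a new mapping, which vhost can ignore. */
    }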

Alex,

Do you mean we should still disallow users from passing devices through
while the Intel IOMMU is enabled? If so, I am not sure whether the
patch below can solve the issue.

It seems that we need a "name" for either IOMMU notifier
provider/consumer, and we should not allow (provider==Intel &&
consumer==VFIO) to happen. In the patch below, I added a name for the
provider, and VFIO checks it.

--------8<----------


------>8--------

Thanks,

-- peterx

Comments

Alex Williamson Sept. 1, 2016, 2:43 a.m. UTC | #1
[cc +dgibson]

On Thu, 1 Sep 2016 10:29:29 +0800
Peter Xu <peterx@redhat.com> wrote:

> On Wed, Aug 31, 2016 at 10:45:37AM +0800, Jason Wang wrote:
> > 
> > 
> > >On 2016-08-30 11:37, Alex Williamson wrote:
> > >On Tue, 30 Aug 2016 11:06:58 +0800
> > >Jason Wang <jasowang@redhat.com> wrote:
> > >  
> > >>From: Peter Xu <peterx@redhat.com>
> > >>
> > >>This reverts commit 3cb3b1549f5401dc3a5e1d073e34063dc274136f. The vhost
> > >>device IOTLB API will get notified and send invalidation requests to
> > >>vhost through this notifier.
> > >AFAICT this series does not address the original problem for which
> > >commit 3cb3b1549f54 was added.  We've only addressed the very narrow
> > >use case of a device iotlb firing the iommu notifier; therefore this
> > >change is a regression versus 2.7, since it allows invalid
> > >configurations with a physical iommu which will never receive the
> > >necessary notifies from intel-iommu emulation to work properly.  Thanks,
> > >
> > >Alex  
> > 
> > Looking at vfio, it cares about maps, but vhost only cares about IOTLB
> > invalidation. So I think we probably need another kind of notifier in
> > this case to avoid the conflict.
> 
> Shall we leverage IOMMUTLBEntry.perm == IOMMU_NONE as a sign of
> invalidation? If so, we can use the same IOTLB interface as before.
> IMHO the two interfaces need not conflict.
> 
> Alex,
> 
> Do you mean we should still disallow users from passing devices through
> while the Intel IOMMU is enabled? If so, I am not sure whether the
> patch below can solve the issue.
> 
> It seems that we need a "name" for either IOMMU notifier
> provider/consumer, and we should not allow (provider==Intel &&
> consumer==VFIO) to happen. In the patch below, I added a name for the
> provider, and VFIO checks it.

Absolutely not, intel-iommu emulation is simply incomplete, the IOMMU
notifier is never called for mappings.  There's a whole aspect of
iommu notifiers that intel-iommu simply hasn't bothered to implement.
Don't punish vfio for actually making use of the interface as it was
intended to be used.  AFAICT you're implementing the unmap/invalidation
half, without the actual mapping half of the interface.  It's broken
and incompatible with any iommu notifiers that expect to see both
sides.  Thanks,

Alex


> --------8<----------
> 
> diff --git a/hw/alpha/typhoon.c b/hw/alpha/typhoon.c
> index 883db13..936c2e6 100644
> --- a/hw/alpha/typhoon.c
> +++ b/hw/alpha/typhoon.c
> @@ -725,6 +725,7 @@ static IOMMUTLBEntry typhoon_translate_iommu(MemoryRegion *iommu, hwaddr addr,
>  }
> 
>  static const MemoryRegionIOMMUOps typhoon_iommu_ops = {
> +    .iommu_type = "typhoon",
>      .translate = typhoon_translate_iommu,
>  };
> 
> diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
> index 28c31a2..f5e3875 100644
> --- a/hw/i386/intel_iommu.c
> +++ b/hw/i386/intel_iommu.c
> @@ -2347,6 +2347,7 @@ static void vtd_init(IntelIOMMUState *s)
>      memset(s->w1cmask, 0, DMAR_REG_SIZE);
>      memset(s->womask, 0, DMAR_REG_SIZE);
> 
> +    s->iommu_ops.iommu_type = "intel";
>      s->iommu_ops.translate = vtd_iommu_translate;
>      s->iommu_ops.notify_started = vtd_iommu_notify_started;
>      s->root = 0;
> diff --git a/hw/pci-host/apb.c b/hw/pci-host/apb.c
> index 653e711..9cfbb73 100644
> --- a/hw/pci-host/apb.c
> +++ b/hw/pci-host/apb.c
> @@ -323,6 +323,7 @@ static IOMMUTLBEntry pbm_translate_iommu(MemoryRegion *iommu, hwaddr addr,
>  }
> 
>  static MemoryRegionIOMMUOps pbm_iommu_ops = {
> +    .iommu_type = "pbm",
>      .translate = pbm_translate_iommu,
>  };
> 
> diff --git a/hw/ppc/spapr_iommu.c b/hw/ppc/spapr_iommu.c
> index 6bc4d4d..e3e8739 100644
> --- a/hw/ppc/spapr_iommu.c
> +++ b/hw/ppc/spapr_iommu.c
> @@ -244,6 +244,7 @@ static const VMStateDescription vmstate_spapr_tce_table = {
>  };
> 
>  static MemoryRegionIOMMUOps spapr_iommu_ops = {
> +    .iommu_type = "spapr",
>      .translate = spapr_tce_translate_iommu,
>      .get_min_page_size = spapr_tce_get_min_page_size,
>      .notify_started = spapr_tce_notify_started,
> diff --git a/hw/s390x/s390-pci-bus.c b/hw/s390x/s390-pci-bus.c
> index 9c1c04e..4414462 100644
> --- a/hw/s390x/s390-pci-bus.c
> +++ b/hw/s390x/s390-pci-bus.c
> @@ -443,6 +443,7 @@ static IOMMUTLBEntry s390_translate_iommu(MemoryRegion *iommu, hwaddr addr,
>  }
> 
>  static const MemoryRegionIOMMUOps s390_iommu_ops = {
> +    .iommu_type = "s390",
>      .translate = s390_translate_iommu,
>  };
> 
> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> index b313e7c..317e08b 100644
> --- a/hw/vfio/common.c
> +++ b/hw/vfio/common.c
> @@ -441,6 +441,11 @@ static void vfio_listener_region_add(MemoryListener *listener,
>      if (memory_region_is_iommu(section->mr)) {
>          VFIOGuestIOMMU *giommu;
> 
> +        if (!strcmp(memory_region_iommu_type(section->mr), "intel")) {
> +            error_report("Device passthrough cannot work with Intel IOMMU");
> +            exit(1);
> +        }
> +
>          trace_vfio_listener_region_add_iommu(iova, end);
>          /*
>           * FIXME: For VFIO iommu types which have KVM acceleration to
> diff --git a/include/exec/memory.h b/include/exec/memory.h
> index 3e4d416..f012f77 100644
> --- a/include/exec/memory.h
> +++ b/include/exec/memory.h
> @@ -149,6 +149,8 @@ struct MemoryRegionOps {
>  typedef struct MemoryRegionIOMMUOps MemoryRegionIOMMUOps;
> 
>  struct MemoryRegionIOMMUOps {
> +    /* Type of IOMMU */
> +    const char *iommu_type;
>      /* Return a TLB entry that contains a given address. */
>      IOMMUTLBEntry (*translate)(MemoryRegion *iommu, hwaddr addr, bool is_write);
>      /* Returns minimum supported page size */
> @@ -593,6 +595,21 @@ static inline bool memory_region_is_iommu(MemoryRegion *mr)
>      return mr->iommu_ops;
>  }
> 
> +/**
> + * memory_region_iommu_type: return type of IOMMU
> + *
> + * Returns type of IOMMU, empty string ("") if not an IOMMU region.
> + *
> + * @mr: the memory region being queried
> + */
> +static inline const char *memory_region_iommu_type(MemoryRegion *mr)
> +{
> +    if (mr->iommu_ops && mr->iommu_ops->iommu_type) {
> +        return mr->iommu_ops->iommu_type;
> +    }
> +
> +    return "";
> +}
> 
>  /**
>   * memory_region_iommu_get_min_page_size: get minimum supported page size
> 
> ------>8--------  
> 
> Thanks,
> 
> -- peterx
Peter Xu Sept. 1, 2016, 3:58 a.m. UTC | #2
On Wed, Aug 31, 2016 at 08:43:42PM -0600, Alex Williamson wrote:
> > > >>This reverts commit 3cb3b1549f5401dc3a5e1d073e34063dc274136f. The vhost
> > > >>device IOTLB API will get notified and send invalidation requests to
> > > >>vhost through this notifier.
> > > >AFAICT this series does not address the original problem for which
> > > >commit 3cb3b1549f54 was added.  We've only addressed the very narrow
> > > >use case of a device iotlb firing the iommu notifier; therefore this
> > > >change is a regression versus 2.7, since it allows invalid
> > > >configurations with a physical iommu which will never receive the
> > > >necessary notifies from intel-iommu emulation to work properly.  Thanks,
> > > >
> > > >Alex  
> > > 
> > > Looking at vfio, it cares about maps, but vhost only cares about IOTLB
> > > invalidation. So I think we probably need another kind of notifier in
> > > this case to avoid the conflict.
> > 
> > Shall we leverage IOMMUTLBEntry.perm == IOMMU_NONE as a sign of
> > invalidation? If so, we can use the same IOTLB interface as before.
> > IMHO the two interfaces need not conflict.
> > 
> > Alex,
> > 
> > Do you mean we should still disallow users from passing devices through
> > while the Intel IOMMU is enabled? If so, I am not sure whether the
> > patch below can solve the issue.
> > 
> > It seems that we need a "name" for either IOMMU notifier
> > provider/consumer, and we should not allow (provider==Intel &&
> > consumer==VFIO) to happen. In the patch below, I added a name for the
> > provider, and VFIO checks it.
> 
> Absolutely not, intel-iommu emulation is simply incomplete, the IOMMU
> notifier is never called for mappings.  There's a whole aspect of
> iommu notifiers that intel-iommu simply hasn't bothered to implement.
> Don't punish vfio for actually making use of the interface as it was
> intended to be used.  AFAICT you're implementing the unmap/invalidation
> half, without the actual mapping half of the interface.  It's broken
> and incompatible with any iommu notifiers that expect to see both
> sides.  Thanks,

Yeah I think I got your point. Thanks for the explanation.

Now I agree with Jason that we may need another notifier mechanism.
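
To check my understanding: the missing half is that intel-iommu never
fires the notifier for a real mapping, i.e. nothing like the rough
sketch below ever happens in the current code (untested, illustration
only):

    IOMMUTLBEntry entry;

    /* On a guest-visible mapping change, re-translate the address and
     * notify listeners with the new permissions, not only IOMMU_NONE. */
    entry = s->iommu_ops.translate(&vtd_as->iommu, addr, is_write);
    memory_region_notify_iommu(&vtd_as->iommu, entry);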

-- peterx
David Gibson Sept. 2, 2016, 4:15 a.m. UTC | #3
On Thu, 1 Sep 2016 11:58:48 +0800
Peter Xu <peterx@redhat.com> wrote:

> On Wed, Aug 31, 2016 at 08:43:42PM -0600, Alex Williamson wrote:
> > > > >>This reverts commit 3cb3b1549f5401dc3a5e1d073e34063dc274136f. The vhost
> > > > >>device IOTLB API will get notified and send invalidation requests to
> > > > >>vhost through this notifier.
> > > > >AFAICT this series does not address the original problem for which
> > > > >commit 3cb3b1549f54 was added.  We've only addressed the very narrow
> > > > >use case of a device iotlb firing the iommu notifier; therefore this
> > > > >change is a regression versus 2.7, since it allows invalid
> > > > >configurations with a physical iommu which will never receive the
> > > > >necessary notifies from intel-iommu emulation to work properly.  Thanks,
> > > > >
> > > > >Alex    
> > > > 
> > > > Looking at vfio, it cares about maps, but vhost only cares about IOTLB
> > > > invalidation. So I think we probably need another kind of notifier in
> > > > this case to avoid the conflict.
> > > 
> > > Shall we leverage IOMMUTLBEntry.perm == IOMMU_NONE as a sign of
> > > invalidation? If so, we can use the same IOTLB interface as before.
> > > IMHO the two interfaces need not conflict.
> > > 
> > > Alex,
> > > 
> > > Do you mean we should still disallow users from passing devices through
> > > while the Intel IOMMU is enabled? If so, I am not sure whether the
> > > patch below can solve the issue.
> > > 
> > > It seems that we need a "name" for either IOMMU notifier
> > > provider/consumer, and we should not allow (provider==Intel &&
> > > consumer==VFIO) to happen. In the patch below, I added a name for the
> > > provider, and VFIO checks it.
> > 
> > Absolutely not, intel-iommu emulation is simply incomplete, the IOMMU
> > notifier is never called for mappings.  There's a whole aspect of
> > iommu notifiers that intel-iommu simply hasn't bothered to implement.
> > Don't punish vfio for actually making use of the interface as it was
> > intended to be used.  AFAICT you're implementing the unmap/invalidation
> > half, without the actual mapping half of the interface.  It's broken
> > and incompatible with any iommu notifiers that expect to see both
> > sides.  Thanks,  
> 
> Yeah I think I got your point. Thanks for the explanation.
> 
> Now I agree with Jason that we may need another notifier mechanism.

What!?  I see no reason you need a different notifier, just fix the
implementation of the current one.  As a bonus this will also give you
working VFIO passthrough with vIOMMU on x86, something which should
work already, but doesn't.
Peter Xu Sept. 2, 2016, 5:37 a.m. UTC | #4
On Fri, Sep 02, 2016 at 02:15:04PM +1000, David Gibson wrote:
> What!?  I see no reason you need a different notifier, just fix the
> implementation of the current one.  As a bonus this will also give you
> working VFIO passthrough with vIOMMU on x86, something which should
> work already, but doesn't.

Hi, David,

Do you mean that we can enhance the interface to suit the two needs?
E.g., adding an "IOTLB notification type" definition:

- "full": for those listening to all mapping changes, including
  additions (the VFIO use case)

- "cache_only": for those that only care about cache invalidations
  (device IOTLB, i.e., the vhost use case)

We can:

- add a notify type when we register the notifiers (e.g., when VFIO
  registers its IOMMU notifier, it should specify the type as "full",
  so it won't receive a notification if it is a device IOTLB
  invalidation).

- pass this type when triggering the notification, so each IOMMU
  notify handler can selectively discard the notification.

Not sure whether the above makes sense.
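
Roughly, the registration could look like this (an untested sketch;
all of these names are invented for illustration):

    typedef enum IOMMUNotifierType {
        IOMMU_NOTIFIER_FULL,       /* map + unmap events (VFIO) */
        IOMMU_NOTIFIER_CACHE_ONLY, /* invalidations only (vhost) */
    } IOMMUNotifierType;

    void memory_region_register_iommu_notifier_type(MemoryRegion *mr,
                                                    Notifier *n,
                                                    IOMMUNotifierType type);

memory_region_notify_iommu() would then compare the event type against
each registered notifier's type before calling it.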

-- peterx
David Gibson Sept. 2, 2016, 6:10 a.m. UTC | #5
On Fri, 2 Sep 2016 13:37:33 +0800
Peter Xu <peterx@redhat.com> wrote:

> On Fri, Sep 02, 2016 at 02:15:04PM +1000, David Gibson wrote:
> > What!?  I see no reason you need a different notifier, just fix the
> > implementation of the current one.  As a bonus this will also give you
> > working VFIO passthrough with vIOMMU on x86, something which should
> > work already, but doesn't.  
> 
> Hi, David,
> 
> Do you mean that we can enhance the interface to suit the two needs?
> E.g., adding an "IOTLB notification type" definition:
> 
> - "full": for those listening to all mapping changes, including
>   additions (the VFIO use case)
> 
> - "cache_only": for those that only care about cache invalidations
>   (device IOTLB, i.e., the vhost use case)
> 
> We can:
> 
> - add a notify type when we register the notifiers (e.g., when VFIO
>   registers its IOMMU notifier, it should specify the type as "full",
>   so it won't receive a notification if it is a device IOTLB
>   invalidation).
> 
> - pass this type when triggering the notification, so each IOMMU
>   notify handler can selectively discard the notification.
> 
> Not sure whether the above makes sense.

No, implement the full notifier, and a listener which only wants the
invalidates can just ignore callbacks which add new mappings.

As I said, you'll need this to get VFIO working with vIOMMU which
someone is bound to want soon enough anyway.
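
For reference, vfio's notifier already handles both halves; roughly
(paraphrasing hw/vfio/common.c:vfio_iommu_map_notify(), with the
container/vaddr lookup and error handling omitted):

    static void vfio_iommu_map_notify(Notifier *n, void *data)
    {
        IOMMUTLBEntry *iotlb = data;

        if ((iotlb->perm & IOMMU_RW) != IOMMU_NONE) {
            /* A new or changed mapping: map it into the host IOMMU. */
            vfio_dma_map(container, iotlb->iova, iotlb->addr_mask + 1,
                         vaddr, !(iotlb->perm & IOMMU_WO));
        } else {
            /* An invalidation: tear the range down. */
            vfio_dma_unmap(container, iotlb->iova, iotlb->addr_mask + 1);
        }
    }

An invalidation-only listener would simply return early from the first
branch.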
Peter Xu Sept. 2, 2016, 6:15 a.m. UTC | #6
On Fri, Sep 02, 2016 at 04:10:14PM +1000, David Gibson wrote:
> On Fri, 2 Sep 2016 13:37:33 +0800
> Peter Xu <peterx@redhat.com> wrote:
> 
> > On Fri, Sep 02, 2016 at 02:15:04PM +1000, David Gibson wrote:
> > > What!?  I see no reason you need a different notifier, just fix the
> > > implementation of the current one.  As a bonus this will also give you
> > > working VFIO passthrough with vIOMMU on x86, something which should
> > > work already, but doesn't.  
> > 
> > Hi, David,
> > 
> > Do you mean that we can enhance the interface to suit the two needs?
> > E.g., adding an "IOTLB notification type" definition:
> > 
> > - "full": for those listening to all mapping changes, including
> >   additions (the VFIO use case)
> > 
> > - "cache_only": for those that only care about cache invalidations
> >   (device IOTLB, i.e., the vhost use case)
> > 
> > We can:
> > 
> > - add a notify type when we register the notifiers (e.g., when VFIO
> >   registers its IOMMU notifier, it should specify the type as "full",
> >   so it won't receive a notification if it is a device IOTLB
> >   invalidation).
> > 
> > - pass this type when triggering the notification, so each IOMMU
> >   notify handler can selectively discard the notification.
> > 
> > Not sure whether the above makes sense.
> 
> No, implement the full notifier, and a listener which only wants the
> invalidates can just ignore callbacks which add new mappings.
> 
> As I said, you'll need this to get VFIO working with vIOMMU which
> someone is bound to want soon enough anyway.

But for vhost use cases, we do not need the CM bit enabled. That might
be the difference?

I think we need to have vhost working even without the CM bit. The
device IOTLB should be able to achieve that.

-- peterx
Peter Xu Sept. 2, 2016, 6:18 a.m. UTC | #7
On Fri, Sep 02, 2016 at 02:15:57PM +0800, Peter Xu wrote:
> > No, implement the full notifier, and a listener which only wants the
> > invalidates can just ignore callbacks which add new mappings.
> > 
> > As I said, you'll need this to get VFIO working with vIOMMU which
> > someone is bound to want soon enough anyway.
> 
> But for vhost use cases, we do not need the CM bit enabled. That might
> be the difference?
> 
> I think we need to have vhost working even without the CM bit. The
> device IOTLB should be able to achieve that.

The problem is that, IMHO, we should be very careful about enabling
the CM bit. After enabling it, the system might get slower (though I
haven't tried it yet), or even very slow. So maybe we should only
enable it when really needed (e.g., to do device passthrough and build
the shadow table).

-- peterx
David Gibson Sept. 2, 2016, 7 a.m. UTC | #8
On Fri, 2 Sep 2016 14:18:47 +0800
Peter Xu <peterx@redhat.com> wrote:

> On Fri, Sep 02, 2016 at 02:15:57PM +0800, Peter Xu wrote:
> > > No, implement the full notifier, and a listener which only wants the
> > > invalidates can just ignore callbacks which add new mappings.
> > > 
> > > As I said, you'll need this to get VFIO working with vIOMMU which
> > > someone is bound to want soon enough anyway.  
> > 
> > But for vhost use cases, we do not need the CM bit enabled. That might
> > be the difference?
> > 
> > I think we need to have vhost working even without the CM bit. The
> > device IOTLB should be able to achieve that.
> 
> The problem is that, IMHO, we should be very careful about enabling
> the CM bit. After enabling it, the system might get slower (though I
> haven't tried it yet), or even very slow. So maybe we should only
> enable it when really needed (e.g., to do device passthrough and build
> the shadow table).

Um.. what's the CM bit and what does it have to do with anything?
Peter Xu Sept. 2, 2016, 9:31 a.m. UTC | #9
On Fri, Sep 02, 2016 at 05:00:28PM +1000, David Gibson wrote:
> On Fri, 2 Sep 2016 14:18:47 +0800
> Peter Xu <peterx@redhat.com> wrote:
> 
> > On Fri, Sep 02, 2016 at 02:15:57PM +0800, Peter Xu wrote:
> > > > No, implement the full notifier, and a listener which only wants the
> > > > invalidates can just ignore callbacks which add new mappings.
> > > > 
> > > > As I said, you'll need this to get VFIO working with vIOMMU which
> > > > someone is bound to want soon enough anyway.  
> > > 
> > > But for vhost use cases, we do not need the CM bit enabled. That might
> > > be the difference?
> > > 
> > > I think we need to have vhost working even without the CM bit. The
> > > device IOTLB should be able to achieve that.
> > 
> > The problem is that, IMHO, we should be very careful about enabling
> > the CM bit. After enabling it, the system might get slower (though I
> > haven't tried it yet), or even very slow. So maybe we should only
> > enable it when really needed (e.g., to do device passthrough and build
> > the shadow table).
> 
> Um.. what's the CM bit and what does it have to do with anything?

It's used to trace guest IO address space mapping changes.

Pasted from VT-d spec chap 6.1:

    The Caching Mode (CM) field in Capability Register indicates if
    the hardware implementation caches not-present or erroneous
    translation-structure entries. When the CM field is reported as
    Set, any software updates to any remapping structures (including
    updates to not-present entries or present entries whose
    programming resulted in translation faults) requires explicit
    invalidation of the caches.

    Hardware implementations of this architecture must support
    operation corresponding to CM=0. Operation corresponding to CM=1
    may be supported by software implementations (emulation) of this
    architecture for efficient virtualization of remapping hardware.
    Software managing remapping hardware should be written to handle
    both caching modes.

    Software implementations virtualizing the remapping architecture
    (such as a VMM emulating remapping hardware to an operating system
    running within a guest partition) may report CM=1 to efficiently
    virtualize the hardware. Software virtualization typically
    requires the guest remapping structures to be shadowed in the
    host. Reporting the Caching Mode as Set for the virtual hardware
    requires the guest software to explicitly issue invalidation
    operations on the virtual hardware for any/all updates to the
    guest remapping structures. The virtualizing software may trap
    these guest invalidation operations to keep the shadow translation
    structures consistent to guest translation structure
    modifications, without resorting to other less efficient
    techniques (such as write-protecting the guest translation
    structures through the processor’s paging facility).

Currently it is not supported for Intel vIOMMUs.
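
In QEMU terms, advertising it would be a one-liner in vtd_init()
(sketch only; there is no VTD_CAP_CM definition in the tree today, but
per the spec CM is bit 7 of the Capability Register):

    #define VTD_CAP_CM (1ULL << 7)

    /* Report Caching Mode: the guest must then explicitly invalidate
     * after every update to its remapping structures, including
     * updates to not-present entries. */
    s->cap |= VTD_CAP_CM;

The cost is the extra invalidation traps this forces on the guest,
which is the overhead I was worried about earlier.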

-- peterx
Alex Williamson Sept. 2, 2016, 3:13 p.m. UTC | #10
On Fri, 2 Sep 2016 17:31:00 +0800
Peter Xu <peterx@redhat.com> wrote:

> On Fri, Sep 02, 2016 at 05:00:28PM +1000, David Gibson wrote:
> > On Fri, 2 Sep 2016 14:18:47 +0800
> > Peter Xu <peterx@redhat.com> wrote:
> >   
> > > On Fri, Sep 02, 2016 at 02:15:57PM +0800, Peter Xu wrote:  
> > > > > No, implement the full notifier, and a listener which only wants the
> > > > > invalidates can just ignore callbacks which add new mappings.
> > > > > 
> > > > > As I said, you'll need this to get VFIO working with vIOMMU which
> > > > > someone is bound to want soon enough anyway.    
> > > > 
> > > > But for vhost use cases, we do not need the CM bit enabled. That might
> > > > be the difference?
> > > > 
> > > > I think we need to have vhost working even without the CM bit. The
> > > > device IOTLB should be able to achieve that.
> > > 
> > > The problem is that, IMHO, we should be very careful about enabling
> > > the CM bit. After enabling it, the system might get slower (though I
> > > haven't tried it yet), or even very slow. So maybe we should only
> > > enable it when really needed (e.g., to do device passthrough and build
> > > the shadow table).
> > 
> > Um.. what's the CM bit and what does it have to do with anything?  
> 
> It's used to trace guest IO address space mapping changes.
> 
> Pasted from VT-d spec chap 6.1:
> 
>     The Caching Mode (CM) field in Capability Register indicates if
>     the hardware implementation caches not-present or erroneous
>     translation-structure entries. When the CM field is reported as
>     Set, any software updates to any remapping structures (including
>     updates to not-present entries or present entries whose
>     programming resulted in translation faults) requires explicit
>     invalidation of the caches.
> 
>     Hardware implementations of this architecture must support
>     operation corresponding to CM=0. Operation corresponding to CM=1
>     may be supported by software implementations (emulation) of this
>     architecture for efficient virtualization of remapping hardware.
>     Software managing remapping hardware should be written to handle
>     both caching modes.
> 
>     Software implementations virtualizing the remapping architecture
>     (such as a VMM emulating remapping hardware to an operating system
>     running within a guest partition) may report CM=1 to efficiently
>     virtualize the hardware. Software virtualization typically
>     requires the guest remapping structures to be shadowed in the
>     host. Reporting the Caching Mode as Set for the virtual hardware
>     requires the guest software to explicitly issue invalidation
>     operations on the virtual hardware for any/all updates to the
>     guest remapping structures. The virtualizing software may trap
>     these guest invalidation operations to keep the shadow translation
>     structures consistent to guest translation structure
>     modifications, without resorting to other less efficient
>     techniques (such as write-protecting the guest translation
>     structures through the processor’s paging facility).
> 
> Currently it is not supported for Intel vIOMMUs.

Maybe memory_region_register_iommu_notifier() could take an
IOMMUAccessFlags argument (filter) that is passed to the notify_started
callback.  If a notifier client only cares about IOMMU_NONE
(invalidations), intel-iommu could allow it, regardless of the CM
setting (though I'm dubious whether this is complete in the generic
case or really only for device iotlbs).  If a client requires IOMMU_RW
then intel-iommu would currently bomb-out like it does now, or once
that gets fixed it would bomb if CM=0.  Ideally intel-iommu would
be fully functional, but somehow it was allowed into the tree
with this massive gap in support for QEMU iommu interfaces.  Thanks,

Alex
Peter Xu Sept. 5, 2016, 6:28 a.m. UTC | #11
On Fri, Sep 02, 2016 at 09:13:01AM -0600, Alex Williamson wrote:
> Maybe memory_region_register_iommu_notifier() could take an
> IOMMUAccessFlags argument (filter) that is passed to the notify_started
> callback.  If a notifier client only cares about IOMMU_NONE
> (invalidations), intel-iommu could allow it, regardless of the CM
> setting (though I'm dubious whether this is complete in the generic
> case or really only for device iotlbs).  If a client requires IOMMU_RW
> then intel-iommu would currently bomb-out like it does now, or once
> that gets fixed it would bomb if CM=0.  Ideally intel-iommu would
> be fully functional, but somehow it was allowed into the tree
> with this massive gap in support for QEMU iommu interfaces.  Thanks,

Yes, this idea should solve the issue, and looks simple.

This is based on the assumption that we will have only one notifier
registered for each IOMMU memory region. However, I think that suits
our use cases (there is no mixed use of the two types).

Meanwhile, I think we can cache this "notifier type" (or
IOMMUAccessFlags) inside the memory region, and we can then verify the
type any time we want (e.g., we can skip the notification if the type
does not match).
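
A rough sketch of what I mean (untested; it reuses the existing
NotifierList plumbing, the iommu_notify_flags field is hypothetical,
and it leans on the single-notifier assumption above):

    /* At registration time, remember what the client cares about. */
    void memory_region_register_iommu_notifier(MemoryRegion *mr,
                                               Notifier *n,
                                               IOMMUAccessFlags filter)
    {
        mr->iommu_notify_flags = filter;   /* cached in the region */
        notifier_list_add(&mr->iommu_notify, n);
    }

    /* At notify time, skip events the client did not ask for. */
    void memory_region_notify_iommu(MemoryRegion *mr, IOMMUTLBEntry entry)
    {
        if (entry.perm != IOMMU_NONE &&
            !(mr->iommu_notify_flags & IOMMU_RW)) {
            /* Cache-only client: ignore new mappings. */
            return;
        }
        notifier_list_notify(&mr->iommu_notify, &entry);
    }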

I'll try to post a patch based on your suggestion, and see whether we
like it.

Thanks!

-- peterx

Patch

diff --git a/hw/alpha/typhoon.c b/hw/alpha/typhoon.c
index 883db13..936c2e6 100644
--- a/hw/alpha/typhoon.c
+++ b/hw/alpha/typhoon.c
@@ -725,6 +725,7 @@  static IOMMUTLBEntry typhoon_translate_iommu(MemoryRegion *iommu, hwaddr addr,
 }

 static const MemoryRegionIOMMUOps typhoon_iommu_ops = {
+    .iommu_type = "typhoon",
     .translate = typhoon_translate_iommu,
 };

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index 28c31a2..f5e3875 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -2347,6 +2347,7 @@  static void vtd_init(IntelIOMMUState *s)
     memset(s->w1cmask, 0, DMAR_REG_SIZE);
     memset(s->womask, 0, DMAR_REG_SIZE);

+    s->iommu_ops.iommu_type = "intel";
     s->iommu_ops.translate = vtd_iommu_translate;
     s->iommu_ops.notify_started = vtd_iommu_notify_started;
     s->root = 0;
diff --git a/hw/pci-host/apb.c b/hw/pci-host/apb.c
index 653e711..9cfbb73 100644
--- a/hw/pci-host/apb.c
+++ b/hw/pci-host/apb.c
@@ -323,6 +323,7 @@  static IOMMUTLBEntry pbm_translate_iommu(MemoryRegion *iommu, hwaddr addr,
 }

 static MemoryRegionIOMMUOps pbm_iommu_ops = {
+    .iommu_type = "pbm",
     .translate = pbm_translate_iommu,
 };

diff --git a/hw/ppc/spapr_iommu.c b/hw/ppc/spapr_iommu.c
index 6bc4d4d..e3e8739 100644
--- a/hw/ppc/spapr_iommu.c
+++ b/hw/ppc/spapr_iommu.c
@@ -244,6 +244,7 @@  static const VMStateDescription vmstate_spapr_tce_table = {
 };

 static MemoryRegionIOMMUOps spapr_iommu_ops = {
+    .iommu_type = "spapr",
     .translate = spapr_tce_translate_iommu,
     .get_min_page_size = spapr_tce_get_min_page_size,
     .notify_started = spapr_tce_notify_started,
diff --git a/hw/s390x/s390-pci-bus.c b/hw/s390x/s390-pci-bus.c
index 9c1c04e..4414462 100644
--- a/hw/s390x/s390-pci-bus.c
+++ b/hw/s390x/s390-pci-bus.c
@@ -443,6 +443,7 @@  static IOMMUTLBEntry s390_translate_iommu(MemoryRegion *iommu, hwaddr addr,
 }

 static const MemoryRegionIOMMUOps s390_iommu_ops = {
+    .iommu_type = "s390",
     .translate = s390_translate_iommu,
 };

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index b313e7c..317e08b 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -441,6 +441,11 @@  static void vfio_listener_region_add(MemoryListener *listener,
     if (memory_region_is_iommu(section->mr)) {
         VFIOGuestIOMMU *giommu;

+        if (!strcmp(memory_region_iommu_type(section->mr), "intel")) {
+            error_report("Device passthrough cannot work with Intel IOMMU");
+            exit(1);
+        }
+
         trace_vfio_listener_region_add_iommu(iova, end);
         /*
          * FIXME: For VFIO iommu types which have KVM acceleration to
diff --git a/include/exec/memory.h b/include/exec/memory.h
index 3e4d416..f012f77 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -149,6 +149,8 @@  struct MemoryRegionOps {
 typedef struct MemoryRegionIOMMUOps MemoryRegionIOMMUOps;

 struct MemoryRegionIOMMUOps {
+    /* Type of IOMMU */
+    const char *iommu_type;
     /* Return a TLB entry that contains a given address. */
     IOMMUTLBEntry (*translate)(MemoryRegion *iommu, hwaddr addr, bool is_write);
     /* Returns minimum supported page size */
@@ -593,6 +595,21 @@  static inline bool memory_region_is_iommu(MemoryRegion *mr)
     return mr->iommu_ops;
 }

+/**
+ * memory_region_iommu_type: return type of IOMMU
+ *
+ * Returns type of IOMMU, empty string ("") if not an IOMMU region.
+ *
+ * @mr: the memory region being queried
+ */
+static inline const char *memory_region_iommu_type(MemoryRegion *mr)
+{
+    if (mr->iommu_ops && mr->iommu_ops->iommu_type) {
+        return mr->iommu_ops->iommu_type;
+    }
+
+    return "";
+}

 /**
  * memory_region_iommu_get_min_page_size: get minimum supported page size