
[2/3] memory: add iommu_notify_flag

Message ID dcb70f94-76da-da93-8c6a-38b1e40fa479@redhat.com (mailing list archive)
State New, archived

Commit Message

Paolo Bonzini Sept. 5, 2016, 9:56 a.m. UTC
On 05/09/2016 10:38, Peter Xu wrote:
> However, in this patch I did not mean to do that. I made it an
> exclusive flag to identify two different use cases. I don't know
> whether this is good, but at least for Intel IOMMU's current use case,
> these two types should be totally isolated from each other:
> 
> - IOMMU_NONE notifications are used by the future DMAR-enabled vhost.
>   They should only be sent for device-IOTLB invalidations. This will
>   only require the "Device IOTLB" capability for Intel IOMMUs, and the
>   notifications are sent from the Device IOTLB invalidation handlers.
> 
> - IOMMU_RW notifications should only be used for vfio-pci, and are sent
>   for IOTLB invalidations. This will only require the Cache Mode (CM=1)
>   capability, and the notifications are sent from the common IOTLB
>   invalidation path (no matter whether it is a cache invalidation or
>   not, we will always use the IOMMU_RW flag for this kind of notification).
> 
> Maybe naming the flags IOMMU_{RW,NONE} here is a little bit
> confusing (just to leverage existing access flags),

Yeah, if you really want to have these semantics, you need to define an 
enum like this:

	IOMMU_NOTIFIER_NONE = -1,
	IOMMU_NOTIFIER_FLUSH = 0,
	IOMMU_NOTIFIER_CHANGED_ENTRY = 1,

But I'm still not convinced of the exclusivity between "flush" and 
"entry changed" notifiers.  If I saw the above, my first reaction would 
be that you need a bit mask:

	IOMMU_NOTIFIER_NONE = -1,
	IOMMU_NOTIFIER_FLUSH = 1,
	IOMMU_NOTIFIER_CHANGED_ENTRY = 2,

But perhaps what you're looking for is to change the "notifier" to a 
"listener" like

	struct IOMMUListener {
	    void (*flush)(IOMMUListener *);
	    void (*entry_changed)(IOMMUListener *, IOMMUTLBEntry *);
	    QLIST_ENTRY(IOMMUListener) node;
	};

The patches can start with an IOMMUListener that only has the 
entry_changed callback and that replaces the current use of Notifier.  
Then notify_started and notify_stopped can be called on every notifier 
that is added/removed (see attached prototype), and the Intel IOMMU can 
simply reject registration of a listener that has a non-NULL 
entry_changed member.
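
To make the listener idea concrete, here is a minimal, self-contained
sketch. It is not taken from the attached prototype: IOMMUModel,
supports_entry_changed, iommu_listener_register(),
iommu_notify_entry_changed() and count_entry_changed() are hypothetical
names, the IOMMUTLBEntry fields are placeholders, and a plain next
pointer stands in for QLIST_ENTRY. It shows an IOMMU model refusing a
listener whose changed-entry callback it cannot serve:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder for QEMU's IOMMUTLBEntry; the fields are illustrative only. */
typedef struct IOMMUTLBEntry {
    uint64_t iova;
    uint64_t addr_mask;
} IOMMUTLBEntry;

typedef struct IOMMUListener IOMMUListener;
struct IOMMUListener {
    void (*flush)(IOMMUListener *);
    void (*entry_changed)(IOMMUListener *, IOMMUTLBEntry *);
    IOMMUListener *next;            /* stands in for QLIST_ENTRY */
};

/* Hypothetical per-IOMMU state: which events the model can deliver. */
typedef struct IOMMUModel {
    bool supports_entry_changed;
    IOMMUListener *listeners;
} IOMMUModel;

/* Returns false, registering nothing, if the listener asks for an
 * event that this IOMMU model cannot deliver. */
bool iommu_listener_register(IOMMUModel *m, IOMMUListener *l)
{
    assert(m != NULL && l != NULL);
    if (l->entry_changed != NULL && !m->supports_entry_changed) {
        return false;
    }
    l->next = m->listeners;
    m->listeners = l;
    return true;
}

/* Deliver a changed entry to every listener that has the callback. */
void iommu_notify_entry_changed(IOMMUModel *m, IOMMUTLBEntry *e)
{
    for (IOMMUListener *l = m->listeners; l != NULL; l = l->next) {
        if (l->entry_changed != NULL) {
            l->entry_changed(l, e);
        }
    }
}

/* Example callback that only counts how often it was invoked. */
int entry_changed_calls;

void count_entry_changed(IOMMUListener *l, IOMMUTLBEntry *e)
{
    (void)l;
    (void)e;
    entry_changed_calls++;
}
```

A model that sets supports_entry_changed to false rejects any listener
with a non-NULL entry_changed, while a flush-only listener would still
be accepted.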

Thanks,

Paolo

Comments

Peter Xu Sept. 6, 2016, 5:27 a.m. UTC | #1
On Mon, Sep 05, 2016 at 11:56:12AM +0200, Paolo Bonzini wrote:
> Yeah, if you really want to have these semantics, you need to define an 
> enum like this:
> 
> 	IOMMU_NOTIFIER_NONE = -1,
> 	IOMMU_NOTIFIER_FLUSH = 0,
> 	IOMMU_NOTIFIER_CHANGED_ENTRY = 1,
> 
> But I'm still not convinced of the exclusivity between "flush" and 
> "entry changed" notifiers.  If I saw the above, my first reaction would 
> be that you need a bit mask:
> 
> 	IOMMU_NOTIFIER_NONE = -1,
> 	IOMMU_NOTIFIER_FLUSH = 1,
> 	IOMMU_NOTIFIER_CHANGED_ENTRY = 2,
> 
> But perhaps what you're looking for is to change the "notifier" to a 
> "listener" like
> 
> 	struct IOMMUListener {
> 	    void (*flush)(IOMMUListener *);
> 	    void (*entry_changed)(IOMMUListener *, IOMMUTLBEntry *);
> 	    QLIST_ENTRY(IOMMUListener) node;
> 	};
> 
> The patches can start with an IOMMUListener that only has the 
> entry_changed callback and that replaces the current use of Notifier.  
> Then notify_started and notify_stopped can be called on every notifier 
> that is added/removed (see attached prototype), and the Intel IOMMU can 
> simply reject registration of a listener that has a non-NULL 
> iotlb_changed member.

Thanks for the quick prototyping. :-)

Maybe I haven't explained the idea very clearly, but a device-IOTLB
invalidation is not a "flush" of the whole device cache. It still needs
an IOMMUTLBEntry, and works just like general IOMMU invalidations. E.g.,
we can do a device-IOTLB invalidation for a single 4K page.

However, I agree with you that the naming is confusing. Maybe we should
at least introduce IOMMU_NOTIFIER_* macros, though instead of a
_FLUSH one, we can have:

    IOMMU_NOTIFIER_NONE = -1,
    IOMMU_NOTIFIER_DEVICE_INVALIDATE = 0,
    IOMMU_NOTIFIER_IOTLB_CHANGED = 1,

This makes it clear that these are two non-overlapping cases.

Thanks,

-- peterx
Paolo Bonzini Sept. 6, 2016, 7:51 a.m. UTC | #2
On 06/09/2016 07:27, Peter Xu wrote:
> Maybe I haven't explained the idea very clearly, but device-IOTLB is
> not a "flush" of whole device cache. It still needs a IOMMUTLBEntry,
> and works just like how general IOMMU invalidations. E.g., we can do
> device-IOTLB invalidation for a single 4K page.

Yes, it can be FLUSHED_ENTRY and CHANGED_ENTRY or
INVALIDATE_ENTRY/CHANGE_ENTRY.

> However, I agree with you that the namings are confusing, maybe at
> least we should introduce IOMMU_NOTIFIER_* macros, though instead of a
> _FLUSH one, we can have:
> 
>     IOMMU_NOTIFIER_NONE = -1,
>     IOMMU_NOTIFIER_DEVICE_INVALIDATE = 0,
>     IOMMU_NOTIFIER_IOTLB_CHANGED = 1,

I suggest making the names more similar:

- two participles (invalidated/changed) or two imperatives
(invalidate!/change!);

- choose whether to keep the verb first ("invalidate device") or keep
the noun first ("IOTLB changed"), and stick with one convention.

> To clarify that these are two non-overlapped cases.

If they are not overlapping, they really should be using a bitmask or
multiple callbacks in a struct...

Paolo
Peter Xu Sept. 6, 2016, 8:17 a.m. UTC | #3
On Tue, Sep 06, 2016 at 09:51:28AM +0200, Paolo Bonzini wrote:
> 
> 
> On 06/09/2016 07:27, Peter Xu wrote:
> > Maybe I haven't explained the idea very clearly, but device-IOTLB is
> > not a "flush" of whole device cache. It still needs a IOMMUTLBEntry,
> > and works just like how general IOMMU invalidations. E.g., we can do
> > device-IOTLB invalidation for a single 4K page.
> 
> Yes, it can be FLUSHED_ENTRY and CHANGED_ENTRY or
> INVALIDATE_ENTRY/CHANGE_ENTRY.
> 
> > However, I agree with you that the namings are confusing, maybe at
> > least we should introduce IOMMU_NOTIFIER_* macros, though instead of a
> > _FLUSH one, we can have:
> > 
> >     IOMMU_NOTIFIER_NONE = -1,
> >     IOMMU_NOTIFIER_DEVICE_INVALIDATE = 0,
> >     IOMMU_NOTIFIER_IOTLB_CHANGED = 1,
> 
> I suggest making the names more similar:
> 
> - two participles (invalidated/changed) or two imperatives
> (invalidate!/change!);
> 
> - choose whether to keep the verb first ("invalidate device") or keep
> the noun first ("IOTLB changed"), and stick with one convention.

Sensible. Will follow.

> 
> > To clarify that these are two non-overlapped cases.
> 
> If they are not overlapping, they really should be using a bitmask or
> multiple callbacks in a struct...

Now that I know the two consumers might be used together in the
future (as David has mentioned), I'd vote for a bitmask for the
notification type:

    IOMMU_NOTIFIER_NONE = 0,
    IOMMU_NOTIFIER_INVALIDATION = 1,
    IOMMU_NOTIFIER_ADDITION = 2,

When registering the IOMMU notifier, the user should specify the type.
For VFIO, it should be (INVALIDATION|ADDITION). For vhost, (INVALIDATION)
would suffice.
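
A rough sketch of how such a bitmask could be used, assuming a
simplified notifier structure. The IOMMUNotifier struct shown here, its
hits counter, iommu_notifier_register() and iommu_notify() are all
hypothetical and only illustrate the registration and filtering logic,
not the eventual v2 code:

```c
#include <assert.h>
#include <stddef.h>

typedef enum {
    IOMMU_NOTIFIER_NONE = 0,
    IOMMU_NOTIFIER_INVALIDATION = 1,
    IOMMU_NOTIFIER_ADDITION = 2,
} IOMMUNotifierFlag;

/* Hypothetical notifier: remembers which event types it asked for. */
typedef struct IOMMUNotifier {
    int flags;                      /* OR of IOMMUNotifierFlag values */
    int hits;                       /* deliveries, for illustration only */
    struct IOMMUNotifier *next;
} IOMMUNotifier;

/* Register a notifier together with the event types it wants. */
void iommu_notifier_register(IOMMUNotifier **head, IOMMUNotifier *n,
                             int flags)
{
    assert(flags != IOMMU_NOTIFIER_NONE);
    n->flags = flags;
    n->hits = 0;
    n->next = *head;
    *head = n;
}

/* Deliver one event; returns how many notifiers received it. */
int iommu_notify(IOMMUNotifier *head, IOMMUNotifierFlag event)
{
    int delivered = 0;

    for (IOMMUNotifier *n = head; n != NULL; n = n->next) {
        if (n->flags & event) {
            n->hits++;
            delivered++;
        }
    }
    return delivered;
}
```

With this layout a VFIO-like consumer registers with
(IOMMU_NOTIFIER_INVALIDATION | IOMMU_NOTIFIER_ADDITION) and sees both
event types, while a vhost-like consumer registers with
IOMMU_NOTIFIER_INVALIDATION only and is skipped for additions.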

Will cook a v2 soon. Thanks!

-- peterx
Paolo Bonzini Sept. 6, 2016, 8:19 a.m. UTC | #4
On 06/09/2016 10:17, Peter Xu wrote:
> After knowing the possibility that the two consumers might be
> mixturely used in the future (as David has mentioned), I'd vote for a
> bitmask for notification type:
> 
>     IOMMU_NOTIFIER_NONE = 0,
>     IOMMU_NOTIFIER_INVALIDATION = 1,
>     IOMMU_NOTIFIER_ADDITION = 2,

ADDITION really should be "CHANGE" I think, so what about
IOMMU_NOTIFIER_INVALIDATE and IOMMU_NOTIFIER_CHANGE?

For VFIO, would the "invalidate" and "add" callbacks use the same code
or different?

Thanks,

Paolo
Peter Xu Sept. 6, 2016, 10:31 a.m. UTC | #5
On Tue, Sep 06, 2016 at 10:19:14AM +0200, Paolo Bonzini wrote:
> 
> 
> On 06/09/2016 10:17, Peter Xu wrote:
> > After knowing the possibility that the two consumers might be
> > mixturely used in the future (as David has mentioned), I'd vote for a
> > bitmask for notification type:
> > 
> >     IOMMU_NOTIFIER_NONE = 0,
> >     IOMMU_NOTIFIER_INVALIDATION = 1,
> >     IOMMU_NOTIFIER_ADDITION = 2,
> 
> ADDITION really should be "CHANGE" I think, so what about
> IOMMU_NOTIFIER_INVALIDATE and IOMMU_NOTIFIER_CHANGE?

For "CHANGE", it sounds like an unmap() + a map(). However, I'd say
"ADDITION" is no better...

Will use "CHANGE".

> 
> For VFIO, would the "invalidate" and "add" callbacks use the same code
> or different?

Currently vfio_iommu_map_notify() should be handling both.

Thanks,

-- peterx
David Gibson Sept. 7, 2016, 5:44 a.m. UTC | #6
On Tue, Sep 06, 2016 at 06:31:42PM +0800, Peter Xu wrote:
> On Tue, Sep 06, 2016 at 10:19:14AM +0200, Paolo Bonzini wrote:
> > 
> > 
> > On 06/09/2016 10:17, Peter Xu wrote:
> > > After knowing the possibility that the two consumers might be
> > > mixturely used in the future (as David has mentioned), I'd vote for a
> > > bitmask for notification type:
> > > 
> > >     IOMMU_NOTIFIER_NONE = 0,
> > >     IOMMU_NOTIFIER_INVALIDATION = 1,
> > >     IOMMU_NOTIFIER_ADDITION = 2,
> > 
> > ADDITION really should be "CHANGE" I think, so what about
> > IOMMU_NOTIFIER_INVALIDATE and IOMMU_NOTIFIER_CHANGE?
> 
> For "CHANGE", it sounds like a unmap() + a map(). However I'd say
> "ADDITION" is nowhere better...

Right.. this brings up a good point.

Changing a mapping (i.e. overwriting an existing mapping with a
different one) would also need notification, even on x86, no?  Since
it implicitly invalidates the previous mapping.

I'm guessing the guest will avoid this by always unmapping before it
maps.  We still need to consider this possibility when designing the
notifier interface though.

It seems the real notification triggers here are:
    * map - something is mapped which previously wasn't
    * unmap - something is no longer mapped which was before

Note that whether the second needs to be triggered depends on the
*previous* state of that IOBA range, *not* on the permissions of the
new mapping (if any).
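
The two triggers can be sketched as a function of the previous and the
new state of a single IOBA entry. This is only an illustration: the
Mapping struct, the EVENT_* values and iommu_events_for_update() are
hypothetical, and treating a rewrite of an identical mapping as "no
event" is an extra assumption that the discussion does not settle:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum {
    EVENT_NONE  = 0,
    EVENT_UNMAP = 1,
    EVENT_MAP   = 2,
};

/* Hypothetical view of one IOMMU entry at a given IOBA. */
typedef struct Mapping {
    bool valid;         /* is anything mapped at this IOBA? */
    uint64_t target;    /* translated address */
    int perm;           /* access permissions */
} Mapping;

/* Decide which events an update must trigger. The UNMAP decision looks
 * only at the previous state, never at the new permissions. */
int iommu_events_for_update(const Mapping *old, const Mapping *new_)
{
    int events = EVENT_NONE;

    assert(old != NULL && new_ != NULL);

    /* Assumption: rewriting an identical mapping needs no notification. */
    if (old->valid && new_->valid &&
        old->target == new_->target && old->perm == new_->perm) {
        return EVENT_NONE;
    }
    if (old->valid) {
        events |= EVENT_UNMAP;      /* something was mapped before */
    }
    if (new_->valid) {
        events |= EVENT_MAP;        /* something is mapped now */
    }
    return events;
}
```

Overwriting one valid mapping with a different one yields
(EVENT_UNMAP | EVENT_MAP), so an in-place replacement is reported as
both events.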

A "change" - replacing one mapping with another should count as both a
"map" and "unmap" event.

> 
> Will use "CHANGE".
> 
> > 
> > For VFIO, would the "invalidate" and "add" callbacks use the same code
> > or different?
> 
> Currently vfio_iommu_map_notify() should be handling both.
> 
> Thanks,
> 
> -- peterx
>
Peter Xu Sept. 7, 2016, 6:34 a.m. UTC | #7
On Wed, Sep 07, 2016 at 03:44:19PM +1000, David Gibson wrote:
> > For "CHANGE", it sounds like a unmap() + a map(). However I'd say
> > "ADDITION" is nowhere better...
> 
> Right.. this brings up a good point.
> 
> Changing a mapping (i.e. overwriting an existing mapping with a
> different one) would also need notification, even on x86, no?  Since
> it implicitly invalidates the previous mapping.
> 
> I'm guessing the guest will avoid this by always unmapping before it
> maps.  We still need to consider this possibility when designing the
> notifier interface though.
> 
> It seems the real notification triggers here are:
>     * map - something is mapped which previously wasn't
>     * unmap - something is no longer mapped which was before
> 
> Note that whether the second needs to be triggered depends on the
> *previous* state of that IOBA range, *not* on the permissions of the
> new mapping (if any).
> 
> A "change" - replacing one mapping with another should count as both a
> "map" and "unmap" event.

Yeah... MAP/UNMAP is strange in another way: e.g. vhost doesn't care
about map/unmap, it cares about invalidated cache entries. So IIUC this
is a question of "naming" rather than implementation... I suppose it is
really a matter of taste, and both work for me (either
INVALIDATION/CHANGE or UNMAP/MAP).

Thanks,

-- peterx
David Gibson Sept. 7, 2016, 6:41 a.m. UTC | #8
On Wed, Sep 07, 2016 at 02:34:19PM +0800, Peter Xu wrote:
> On Wed, Sep 07, 2016 at 03:44:19PM +1000, David Gibson wrote:
> > > For "CHANGE", it sounds like a unmap() + a map(). However I'd say
> > > "ADDITION" is nowhere better...
> > 
> > Right.. this brings up a good point.
> > 
> > Changing a mapping (i.e. overwriting an existing mapping with a
> > different one) would also need notification, even on x86, no?  Since
> > it implicitly invalidates the previous mapping.
> > 
> > I'm guessing the guest will avoid this by always unmapping before it
> > maps.  We still need to consider this possibility when designing the
> > notifier interface though.
> > 
> > It seems the real notification triggers here are:
> >     * map - something is mapped which previously wasn't
> >     * unmap - something is no longer mapped which was before
> > 
> > Note that whether the second needs to be triggered depends on the
> > *previous* state of that IOBA range, *not* on the permissions of the
> > new mapping (if any).
> > 
> > A "change" - replacing one mapping with another should count as both a
> > "map" and "unmap" event.
> 
> Yeah... For MAP/UNMAP, it is strange in another way: e.g. for vhost,
> it doesn't care about map/unmap, it cares about invalidated cache.

I think caring about invalidated cache *is* caring about unmap.  It
doesn't matter whether the new mapping is something or nothing - if
the old mapping is no longer valid, you need to invalidate the cache,
yes?

> So
> IIUC this is a question about "naming" but not the implementations...
> I suppose it is really a matter of taste, and both work for me (either
> INVALIDATION/CHANGE or UNMAP/MAP).

No.. it is a question of implementation.  My point is that I don't
think the new permission is sufficient information to let you know if
a notification is necessary.  You need to know if there was an
existing mapping at that IOBA.
Peter Xu Sept. 8, 2016, 9:07 a.m. UTC | #9
On Wed, Sep 07, 2016 at 04:41:54PM +1000, David Gibson wrote:
> On Wed, Sep 07, 2016 at 02:34:19PM +0800, Peter Xu wrote:
> > On Wed, Sep 07, 2016 at 03:44:19PM +1000, David Gibson wrote:
> > > > For "CHANGE", it sounds like a unmap() + a map(). However I'd say
> > > > "ADDITION" is nowhere better...
> > > 
> > > Right.. this brings up a good point.
> > > 
> > > Changing a mapping (i.e. overwriting an existing mapping with a
> > > different one) would also need notification, even on x86, no?  Since
> > > it implicitly invalidates the previous mapping.
> > > 
> > > I'm guessing the guest will avoid this by always unmapping before it
> > > maps.  We still need to consider this possibility when designing the
> > > notifier interface though.
> > > 
> > > It seems the real notification triggers here are:
> > >     * map - something is mapped which previously wasn't
> > >     * unmap - something is no longer mapped which was before
> > > 
> > > Note that whether the second needs to be triggered depends on the
> > > *previous* state of that IOBA range, *not* on the permissions of the
> > > new mapping (if any).
> > > 
> > > A "change" - replacing one mapping with another should count as both a
> > > "map" and "unmap" event.
> > 
> > Yeah... For MAP/UNMAP, it is strange in another way: e.g. for vhost,
> > it doesn't care about map/unmap, it cares about invalidated cache.
> 
> I think caring about invalidated cache *is* caring about unmap.  It
> doesn't matter whether the new mapping is something or nothing - if
> the old mapping is no longer valid, you need to invalidate the cache,
> yes?

Yes, I think these two are exactly the same in implementation (vhost
needs UNMAP events of course). So that's why I called it "a naming
issue". :)

> 
> > So
> > IIUC this is a question about "naming" but not the implementations...
> > I suppose it is really a matter of taste, and both work for me (either
> > INVALIDATION/CHANGE or UNMAP/MAP).
> 
> No.. it is a question of implementation.  My point is that I don't
> think the new permission is sufficient information to let you know if
> a notification is necessary.  You need to know if there was an
> existing mapping at that IOBA.

My understanding is that we don't need to know that, because IIUC
there are only map_page() and unmap_page() in the guest IOMMU driver
(please check dma_map_ops in the kernel). There is no way for anyone to
"change" the content of a mapping other than calling unmap_page()
followed by map_page(). In that case, we'll have two IOTLB invalidation
requests.

Please kindly correct me if I am wrong.

Thanks,

-- peterx
David Gibson Sept. 12, 2016, 1:26 a.m. UTC | #10
On Thu, Sep 08, 2016 at 05:07:32PM +0800, Peter Xu wrote:
> On Wed, Sep 07, 2016 at 04:41:54PM +1000, David Gibson wrote:
> > On Wed, Sep 07, 2016 at 02:34:19PM +0800, Peter Xu wrote:
> > > On Wed, Sep 07, 2016 at 03:44:19PM +1000, David Gibson wrote:
> > > > > For "CHANGE", it sounds like a unmap() + a map(). However I'd say
> > > > > "ADDITION" is nowhere better...
> > > > 
> > > > Right.. this brings up a good point.
> > > > 
> > > > Changing a mapping (i.e. overwriting an existing mapping with a
> > > > different one) would also need notification, even on x86, no?  Since
> > > > it implicitly invalidates the previous mapping.
> > > > 
> > > > I'm guessing the guest will avoid this by always unmapping before it
> > > > maps.  We still need to consider this possibility when designing the
> > > > notifier interface though.
> > > > 
> > > > It seems the real notification triggers here are:
> > > >     * map - something is mapped which previously wasn't
> > > >     * unmap - something is no longer mapped which was before
> > > > 
> > > > Note that whether the second needs to be triggered depends on the
> > > > *previous* state of that IOBA range, *not* on the permissions of the
> > > > new mapping (if any).
> > > > 
> > > > A "change" - replacing one mapping with another should count as both a
> > > > "map" and "unmap" event.
> > > 
> > > Yeah... For MAP/UNMAP, it is strange in another way: e.g. for vhost,
> > > it doesn't care about map/unmap, it cares about invalidated cache.
> > 
> > I think caring about invalidated cache *is* caring about unmap.  It
> > doesn't matter whether the new mapping is something or nothing - if
> > the old mapping is no longer valid, you need to invalidate the cache,
> > yes?
> 
> Yes, I think these two are exactly the same in implementation (vhost
> needs UNMAP events of course). So that's why I called it "a naming
> issue". :)
> 
> > 
> > > So
> > > IIUC this is a question about "naming" but not the implementations...
> > > I suppose it is really a matter of taste, and both work for me (either
> > > INVALIDATION/CHANGE or UNMAP/MAP).
> > 
> > No.. it is a question of implementation.  My point is that I don't
> > think the new permission is sufficient information to let you know if
> > a notification is necessary.  You need to know if there was an
> > existing mapping at that IOBA.
> 
> My understanding is that we don't need to know that. Because IIUC
> there are only map_page() and unmap_page() in guest IOMMU driver
> (please check dma_map_ops in kernel). There is no chance for anyone to
> "change" the content of the mapping, unless it calls unmap_page() then
> with a map_page(). In that case, we'll have two IOTLB invalidation
> requests.

That's assuming a Linux guest using the current guest IOMMU model.

I don't think we do so in practice, but the PAPR hypercall interface
allows in-place changing of a mapping.  The interface is just "set
this IOPTE to this value".

> 
> Please kindly correct me if I am wrong.
> 
> Thanks,
> 
> -- peterx
>
Peter Xu Sept. 12, 2016, 5:13 a.m. UTC | #11
On Mon, Sep 12, 2016 at 11:26:04AM +1000, David Gibson wrote:
> On Thu, Sep 08, 2016 at 05:07:32PM +0800, Peter Xu wrote:
> > On Wed, Sep 07, 2016 at 04:41:54PM +1000, David Gibson wrote:
> > > On Wed, Sep 07, 2016 at 02:34:19PM +0800, Peter Xu wrote:
> > > > On Wed, Sep 07, 2016 at 03:44:19PM +1000, David Gibson wrote:
> > > > > > For "CHANGE", it sounds like a unmap() + a map(). However I'd say
> > > > > > "ADDITION" is nowhere better...
> > > > > 
> > > > > Right.. this brings up a good point.
> > > > > 
> > > > > Changing a mapping (i.e. overwriting an existing mapping with a
> > > > > different one) would also need notification, even on x86, no?  Since
> > > > > it implicitly invalidates the previous mapping.
> > > > > 
> > > > > I'm guessing the guest will avoid this by always unmapping before it
> > > > > maps.  We still need to consider this possibility when designing the
> > > > > notifier interface though.
> > > > > 
> > > > > It seems the real notification triggers here are:
> > > > >     * map - something is mapped which previously wasn't
> > > > >     * unmap - something is no longer mapped which was before
> > > > > 
> > > > > Note that whether the second needs to be triggered depends on the
> > > > > *previous* state of that IOBA range, *not* on the permissions of the
> > > > > new mapping (if any).
> > > > > 
> > > > > A "change" - replacing one mapping with another should count as both a
> > > > > "map" and "unmap" event.
> > > > 
> > > > Yeah... For MAP/UNMAP, it is strange in another way: e.g. for vhost,
> > > > it doesn't care about map/unmap, it cares about invalidated cache.
> > > 
> > > I think caring about invalidated cache *is* caring about unmap.  It
> > > doesn't matter whether the new mapping is something or nothing - if
> > > the old mapping is no longer valid, you need to invalidate the cache,
> > > yes?
> > 
> > Yes, I think these two are exactly the same in implementation (vhost
> > needs UNMAP events of course). So that's why I called it "a naming
> > issue". :)
> > 
> > > 
> > > > So
> > > > IIUC this is a question about "naming" but not the implementations...
> > > > I suppose it is really a matter of taste, and both work for me (either
> > > > INVALIDATION/CHANGE or UNMAP/MAP).
> > > 
> > > No.. it is a question of implementation.  My point is that I don't
> > > think the new permission is sufficient information to let you know if
> > > a notification is necessary.  You need to know if there was an
> > > existing mapping at that IOBA.
> > 
> > My understanding is that we don't need to know that. Because IIUC
> > there are only map_page() and unmap_page() in guest IOMMU driver
> > (please check dma_map_ops in kernel). There is no chance for anyone to
> > "change" the content of the mapping, unless it calls unmap_page() then
> > with a map_page(). In that case, we'll have two IOTLB invalidation
> > requests.
> 
> That's assuming a Linux guest using the current guest IOMMU model.
> 
> I don't think we do so in practice, but the PAPR hypercall interface
> allows in-place changing of a mapping.  The interface is just "set
> this IOPTE to this value".

I see. Even so, the QEMU IOMMU emulation code can convert one CHANGE
request into an UNMAP followed by a MAP, right?

Thanks,

-- peterx
David Gibson Sept. 14, 2016, 4 a.m. UTC | #12
On Mon, Sep 12, 2016 at 01:13:41PM +0800, Peter Xu wrote:
> On Mon, Sep 12, 2016 at 11:26:04AM +1000, David Gibson wrote:
> > On Thu, Sep 08, 2016 at 05:07:32PM +0800, Peter Xu wrote:
> > > On Wed, Sep 07, 2016 at 04:41:54PM +1000, David Gibson wrote:
> > > > On Wed, Sep 07, 2016 at 02:34:19PM +0800, Peter Xu wrote:
> > > > > On Wed, Sep 07, 2016 at 03:44:19PM +1000, David Gibson wrote:
> > > > > > > For "CHANGE", it sounds like a unmap() + a map(). However I'd say
> > > > > > > "ADDITION" is nowhere better...
> > > > > > 
> > > > > > Right.. this brings up a good point.
> > > > > > 
> > > > > > Changing a mapping (i.e. overwriting an existing mapping with a
> > > > > > different one) would also need notification, even on x86, no?  Since
> > > > > > it implicitly invalidates the previous mapping.
> > > > > > 
> > > > > > I'm guessing the guest will avoid this by always unmapping before it
> > > > > > maps.  We still need to consider this possibility when designing the
> > > > > > notifier interface though.
> > > > > > 
> > > > > > It seems the real notification triggers here are:
> > > > > >     * map - something is mapped which previously wasn't
> > > > > >     * unmap - something is no longer mapped which was before
> > > > > > 
> > > > > > Note that whether the second needs to be triggered depends on the
> > > > > > *previous* state of that IOBA range, *not* on the permissions of the
> > > > > > new mapping (if any).
> > > > > > 
> > > > > > A "change" - replacing one mapping with another should count as both a
> > > > > > "map" and "unmap" event.
> > > > > 
> > > > > Yeah... For MAP/UNMAP, it is strange in another way: e.g. for vhost,
> > > > > it doesn't care about map/unmap, it cares about invalidated cache.
> > > > 
> > > > I think caring about invalidated cache *is* caring about unmap.  It
> > > > doesn't matter whether the new mapping is something or nothing - if
> > > > the old mapping is no longer valid, you need to invalidate the cache,
> > > > yes?
> > > 
> > > Yes, I think these two are exactly the same in implementation (vhost
> > > needs UNMAP events of course). So that's why I called it "a naming
> > > issue". :)
> > > 
> > > > 
> > > > > So
> > > > > IIUC this is a question about "naming" but not the implementations...
> > > > > I suppose it is really a matter of taste, and both work for me (either
> > > > > INVALIDATION/CHANGE or UNMAP/MAP).
> > > > 
> > > > No.. it is a question of implementation.  My point is that I don't
> > > > think the new permission is sufficient information to let you know if
> > > > a notification is necessary.  You need to know if there was an
> > > > existing mapping at that IOBA.
> > > 
> > > My understanding is that we don't need to know that. Because IIUC
> > > there are only map_page() and unmap_page() in guest IOMMU driver
> > > (please check dma_map_ops in kernel). There is no chance for anyone to
> > > "change" the content of the mapping, unless it calls unmap_page() then
> > > with a map_page(). In that case, we'll have two IOTLB invalidation
> > > requests.
> > 
> > That's assuming a Linux guest using the current guest IOMMU model.
> > 
> > I don't think we do so in practice, but the PAPR hypercall interface
> > allows in-place changing of a mapping.  The interface is just "set
> > this IOPTE to this value".
> 
> I see. Even if so, QEMU IOMMU emulation codes can convert one CHANGE
> request into UNMAP and a continuous MAP, right?

Yes, I guess so.  Why is that preferable to issuing a single
notification to both "map" and "unmap" listeners though?
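
The two options being compared can be sketched as follows. This is a
hypothetical model, not code from the series: Listener, deliver(),
notify_change_split() and notify_change_combined() are invented names,
and the calls counter exists only to make the difference visible:

```c
#include <assert.h>
#include <stddef.h>

enum {
    NOTIFY_UNMAP = 1,
    NOTIFY_MAP   = 2,
};

/* Hypothetical listener: a mask of wanted events and a call counter. */
typedef struct Listener {
    int mask;
    int calls;
} Listener;

/* Call every listener whose mask intersects the event set, once each. */
static int deliver(Listener *ls, size_t n, int events)
{
    int calls = 0;

    assert(ls != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (ls[i].mask & events) {
            ls[i].calls++;
            calls++;
        }
    }
    return calls;
}

/* Option A: the emulation splits a change into an UNMAP, then a MAP. */
int notify_change_split(Listener *ls, size_t n)
{
    return deliver(ls, n, NOTIFY_UNMAP) + deliver(ls, n, NOTIFY_MAP);
}

/* Option B: one notification goes to both "map" and "unmap" listeners. */
int notify_change_combined(Listener *ls, size_t n)
{
    return deliver(ls, n, NOTIFY_UNMAP | NOTIFY_MAP);
}
```

Both variants reach the same set of listeners. In this model the
difference is that a listener registered for both events is called
twice per change with the split variant and once with the combined one.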
Peter Xu Sept. 14, 2016, 5:43 a.m. UTC | #13
On Wed, Sep 14, 2016 at 02:00:29PM +1000, David Gibson wrote:

[...]

> > > > > 
> > > > > > So
> > > > > > IIUC this is a question about "naming" but not the implementations...
> > > > > > I suppose it is really a matter of taste, and both work for me (either
> > > > > > INVALIDATION/CHANGE or UNMAP/MAP).
> > > > > 
> > > > > No.. it is a question of implementation.  My point is that I don't
> > > > > think the new permission is sufficient information to let you know if
> > > > > a notification is necessary.  You need to know if there was an
> > > > > existing mapping at that IOBA.
> > > > 
> > > > My understanding is that we don't need to know that. Because IIUC
> > > > there are only map_page() and unmap_page() in guest IOMMU driver
> > > > (please check dma_map_ops in kernel). There is no chance for anyone to
> > > > "change" the content of the mapping, unless it calls unmap_page() then
> > > > with a map_page(). In that case, we'll have two IOTLB invalidation
> > > > requests.
> > > 
> > > That's assuming a Linux guest using the current guest IOMMU model.
> > > 
> > > I don't think we do so in practice, but the PAPR hypercall interface
> > > allows in-place changing of a mapping.  The interface is just "set
> > > this IOPTE to this value".
> > 
> > I see. Even if so, QEMU IOMMU emulation codes can convert one CHANGE
> > request into UNMAP and a continuous MAP, right?
> 
> Yes, I guess so.  Why is that preferable to issuing a single
> notification to both "map" and "unmap" listeners though?

So I think we should be talking about the same thing here (please
correct me if I am wrong...). Please review v4 of this series and see
whether that works (I renamed CHANGE into MAP, so there will be
MAP/UNMAP, and I think things are clearer with it).

Thanks!

-- peterx

Patch

diff --git a/hw/ppc/spapr_iommu.c b/hw/ppc/spapr_iommu.c
index 6bc4d4d..d8cbd90 100644
--- a/hw/ppc/spapr_iommu.c
+++ b/hw/ppc/spapr_iommu.c
@@ -156,14 +156,20 @@  static uint64_t spapr_tce_get_min_page_size(MemoryRegion *iommu)
     return 1ULL << tcet->page_shift;
 }
 
-static void spapr_tce_notify_started(MemoryRegion *iommu)
+static void spapr_tce_listener_add(MemoryRegion *iommu, IOMMUListener *l)
 {
-    spapr_tce_set_need_vfio(container_of(iommu, sPAPRTCETable, iommu), true);
+    sPAPRTCETable *tcet = container_of(iommu, sPAPRTCETable, iommu);
+    if (tcet->users++ == 0) {
+        spapr_tce_set_need_vfio(tcet, true);
+    }
 }
 
-static void spapr_tce_notify_stopped(MemoryRegion *iommu)
+static void spapr_tce_listener_del(MemoryRegion *iommu, IOMMUListener *l)
 {
-    spapr_tce_set_need_vfio(container_of(iommu, sPAPRTCETable, iommu), false);
+    sPAPRTCETable *tcet = container_of(iommu, sPAPRTCETable, iommu);
+    if (--tcet->users == 0) {
+        spapr_tce_set_need_vfio(tcet, false);
+    }
 }
 
 static int spapr_tce_table_post_load(void *opaque, int version_id)
@@ -246,8 +252,8 @@  static const VMStateDescription vmstate_spapr_tce_table = {
 static MemoryRegionIOMMUOps spapr_iommu_ops = {
     .translate = spapr_tce_translate_iommu,
     .get_min_page_size = spapr_tce_get_min_page_size,
-    .notify_started = spapr_tce_notify_started,
-    .notify_stopped = spapr_tce_notify_stopped,
+    .listener_add = spapr_tce_listener_add,
+    .listener_del = spapr_tce_listener_del,
 };
 
 static int spapr_tce_table_realize(DeviceState *dev)
diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
index caf7be9..4761afd 100644
--- a/include/hw/ppc/spapr.h
+++ b/include/hw/ppc/spapr.h
@@ -558,6 +558,7 @@  struct sPAPRTCETable {
     uint64_t *table;
     uint32_t mig_nb_table;
     uint64_t *mig_table;
+    int users;
     bool bypass;
     bool need_vfio;
     int fd;
diff --git a/memory.c b/memory.c
index 0eb6895..e9f50af 100644
--- a/memory.c
+++ b/memory.c
@@ -1515,9 +1515,8 @@  bool memory_region_is_logging(MemoryRegion *mr, uint8_t client)
 
 void memory_region_register_iommu_notifier(MemoryRegion *mr, Notifier *n)
 {
-    if (mr->iommu_ops->notify_started &&
-        QLIST_EMPTY(&mr->iommu_notify.notifiers)) {
-        mr->iommu_ops->notify_started(mr);
+    if (mr->iommu_ops->listener_add) {
+        mr->iommu_ops->listener_add(mr, ...);
     }
     notifier_list_add(&mr->iommu_notify, n);
 }
@@ -1555,9 +1554,8 @@  void memory_region_iommu_replay(MemoryRegion *mr, Notifier *n, bool is_write)
 void memory_region_unregister_iommu_notifier(MemoryRegion *mr, Notifier *n)
 {
     notifier_remove(n);
-    if (mr->iommu_ops->notify_stopped &&
-        QLIST_EMPTY(&mr->iommu_notify.notifiers)) {
-        mr->iommu_ops->notify_stopped(mr);
+    if (mr->iommu_ops->listener_del) {
+        mr->iommu_ops->listener_del(mr, ...);
     }
 }