diff mbox series

[v3] IOMMU/x86: make page type checks consistent when mapping pages

Message ID ac7a20e8-9e5e-f664-1d7f-d59105f90223@suse.com (mailing list archive)
State New, archived
Headers show
Series [v3] IOMMU/x86: make page type checks consistent when mapping pages | expand

Commit Message

Jan Beulich July 3, 2019, 12:18 p.m. UTC
There are currently three more or less different checks:
- _get_page_type() adjusts the IOMMU mappings according to the new type
   alone,
- arch_iommu_populate_page_table() wants just the type to be
   PGT_writable_page,
- iommu_hwdom_init() additionally permits all other types with a type
   refcount of zero.
The canonical one is in _get_page_type(), and as of XSA-288
arch_iommu_populate_page_table() also has no need anymore to deal with
PGT_none pages. In the PV Dom0 case, however, PGT_none pages are still
necessary to consider, since in that case pages don't get handed to
guest_physmap_add_entry(). Furthermore, the function so far also
established r/o mappings, which is not in line with the rules set forth
by the XSA-288 change.

For arch_iommu_populate_page_table() to not encounter PGT_none pages
anymore even in cases where the IOMMU gets enabled for a domain only
when it is already running, replace the IOMMU dependency in
guest_physmap_add_entry()'s handling of PV guests to check just the
system wide state instead of the domain property.

Unfortunately (partially) replacing the iommu_map() call in
iommu_hwdom_init() implies resurrecting the flush suppression that got
previously eliminated. Note that the call to iommu_map() can't be
removed at this point in time - Dom0's initial allocation gets its page
types set before iommu_hwdom_init() runs.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
v3: Re-base.
v2: Fix IOTLB flushing. Exclude PVH. Use type safe local variables.

Comments

Andrew Cooper July 4, 2019, 1:10 p.m. UTC | #1
On 03/07/2019 13:18, Jan Beulich wrote:
> There are currently three more or less different checks:
> - _get_page_type() adjusts the IOMMU mappings according to the new type
>    alone,
> - arch_iommu_populate_page_table() wants just the type to be
>    PGT_writable_page,
> - iommu_hwdom_init() additionally permits all other types with a type
>    refcount of zero.
> The canonical one is in _get_page_type(), and as of XSA-288
> arch_iommu_populate_page_table() also has no need anymore to deal with
> PGT_none pages. In the PV Dom0 case, however, PGT_none pages are still
> necessary to consider, since in that case pages don't get handed to
> guest_physmap_add_entry(). Furthermore, the function so far also
> established r/o mappings, which is not in line with the rules set forth
> by the XSA-288 change.
>
> For arch_iommu_populate_page_table() to not encounter PGT_none pages
> anymore even in cases where the IOMMU gets enabled for a domain only
> when it is already running, replace the IOMMU dependency in
> guest_physmap_add_entry()'s handling of PV guests to check just the
> system wide state instead of the domain property.

And this is the problem.  We have an enormous amount of complexity, and
a hypercall which even after an XSA, we could only reduce to "will
livelock under adversarial conditions (inc. the toolstack thread which
started it)", so support a corner case which doesn't (AFACIT) look like
it can work correctly, and surely isn't used in practice.

IMO, the only sane thing to do is to have a "create an IOMMU context"
flag in domaincreate (and a shared vs split while we're at it, for the
EPT case), and have the IOMMU context properly known from the very start
of the domain.

Even if this does end up restricting a corner case which is believed to
work, I do not see it as an inconvenience or problem to require a domain
config file to specify whether it wants an IOMMU context (directly, or
indirectly via things like PCI=), and the reduction in complexity in Xen
would be massive.

How many security bugs have we already found here?  How may are still
lurking, or liable to be re-introduced because this code is just too
damn complicated for you, me and George to review sensibly?  Reducing
the complexity is the responsible thing to do.

I'm afraid that I don't view anything other than moving towards an
up-front declaration as making the situation better.

~Andrew
Jan Beulich July 4, 2019, 1:34 p.m. UTC | #2
On 04.07.2019 15:10, Andrew Cooper wrote:
> On 03/07/2019 13:18, Jan Beulich wrote:
>> There are currently three more or less different checks:
>> - _get_page_type() adjusts the IOMMU mappings according to the new type
>>     alone,
>> - arch_iommu_populate_page_table() wants just the type to be
>>     PGT_writable_page,
>> - iommu_hwdom_init() additionally permits all other types with a type
>>     refcount of zero.
>> The canonical one is in _get_page_type(), and as of XSA-288
>> arch_iommu_populate_page_table() also has no need anymore to deal with
>> PGT_none pages. In the PV Dom0 case, however, PGT_none pages are still
>> necessary to consider, since in that case pages don't get handed to
>> guest_physmap_add_entry(). Furthermore, the function so far also
>> established r/o mappings, which is not in line with the rules set forth
>> by the XSA-288 change.
>>
>> For arch_iommu_populate_page_table() to not encounter PGT_none pages
>> anymore even in cases where the IOMMU gets enabled for a domain only
>> when it is already running, replace the IOMMU dependency in
>> guest_physmap_add_entry()'s handling of PV guests to check just the
>> system wide state instead of the domain property.
> 
> And this is the problem.  We have an enormous amount of complexity, and
> a hypercall which even after an XSA, we could only reduce to "will
> livelock under adversarial conditions (inc. the toolstack thread which
> started it)", so support a corner case which doesn't (AFACIT) look like
> it can work correctly, and surely isn't used in practice.
> 
> IMO, the only sane thing to do is to have a "create an IOMMU context"
> flag in domaincreate (and a shared vs split while we're at it, for the
> EPT case), and have the IOMMU context properly known from the very start
> of the domain.

Irrespective of me disagreeing here (which by no means suggests that
what you describe may not get done anyway), ...

> Even if this does end up restricting a corner case which is believed to
> work, I do not see it as an inconvenience or problem to require a domain
> config file to specify whether it wants an IOMMU context (directly, or
> indirectly via things like PCI=), and the reduction in complexity in Xen
> would be massive.
> 
> How many security bugs have we already found here?  How may are still
> lurking, or liable to be re-introduced because this code is just too
> damn complicated for you, me and George to review sensibly?  Reducing
> the complexity is the responsible thing to do.
> 
> I'm afraid that I don't view anything other than moving towards an
> up-front declaration as making the situation better.

... I'm pretty puzzled by this, in particular seeing that the bulk of
the change in this patch is affecting Dom0 only, i.e. is in no way
related to the point in time at which we declare a DomU to need IOMMU
page tables. The change affecting DomU-s here is a single line, plus
some comment adjustment.

Jan
Roger Pau Monné Sept. 6, 2019, 9:37 a.m. UTC | #3
On Wed, Jul 03, 2019 at 12:18:45PM +0000, Jan Beulich wrote:
> There are currently three more or less different checks:
> - _get_page_type() adjusts the IOMMU mappings according to the new type
>    alone,
> - arch_iommu_populate_page_table() wants just the type to be
>    PGT_writable_page,
> - iommu_hwdom_init() additionally permits all other types with a type
>    refcount of zero.
> The canonical one is in _get_page_type(), and as of XSA-288
> arch_iommu_populate_page_table() also has no need anymore to deal with
> PGT_none pages. In the PV Dom0 case, however, PGT_none pages are still
> necessary to consider, since in that case pages don't get handed to
> guest_physmap_add_entry(). Furthermore, the function so far also
> established r/o mappings, which is not in line with the rules set forth
> by the XSA-288 change.
> 
> For arch_iommu_populate_page_table() to not encounter PGT_none pages
> anymore even in cases where the IOMMU gets enabled for a domain only
> when it is already running, replace the IOMMU dependency in
> guest_physmap_add_entry()'s handling of PV guests to check just the
> system wide state instead of the domain property.
> 
> Unfortunately (partially) replacing the iommu_map() call in
> iommu_hwdom_init() implies resurrecting the flush suppression that got
> previously eliminated. Note that the call to iommu_map() can't be
> removed at this point in time - Dom0's initial allocation gets its page
> types set before iommu_hwdom_init() runs.
> 
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> ---
> v3: Re-base.
> v2: Fix IOTLB flushing. Exclude PVH. Use type safe local variables.
> 
> --- a/xen/arch/x86/mm/p2m.c
> +++ b/xen/arch/x86/mm/p2m.c
> @@ -829,13 +829,13 @@ guest_physmap_add_page(struct domain *d,
>            *
>            * Retain this property by grabbing a writable type ref and then
>            * dropping it immediately.  The result will be pages that have a
> -         * writable type (and an IOMMU entry), but a count of 0 (such that
> -         * any guest-requested type changes succeed and remove the IOMMU
> -         * entry).
> +         * writable type (and an IOMMU entry if necessary), but a count of 0
> +         * (such that any guest-requested type changes succeed and remove the
> +         * IOMMU entry).
>            */
>           for ( i = 0; i < (1UL << page_order); ++i, ++page )
>           {
> -            if ( !need_iommu_pt_sync(d) )
> +            if ( !iommu_enabled )

That's kind of a strong check for a domain that might never use the
iommu at all. Isn't it fine to just rely on
arch_iommu_populate_page_table finding non-writable pages and thus not
adding them to the iommu page-tables?

>                   /* nothing */;
>               else if ( get_page_and_type(page, d, PGT_writable_page) )
>                   put_page_and_type(page);
> --- a/xen/drivers/passthrough/iommu.c
> +++ b/xen/drivers/passthrough/iommu.c
> @@ -192,28 +192,46 @@ void __hwdom_init iommu_hwdom_init(struc
>           unsigned int i = 0, flush_flags = 0;
>           int rc = 0;
>   
> +        this_cpu(iommu_dont_flush_iotlb) = true;
> +
>           page_list_for_each ( page, &d->page_list )
>           {
> -            unsigned long mfn = mfn_x(page_to_mfn(page));
> -            unsigned long dfn = mfn_to_gmfn(d, mfn);
> -            unsigned int mapping = IOMMUF_readable;
> -            int ret;
> -
> -            if ( ((page->u.inuse.type_info & PGT_count_mask) == 0) ||
> -                 ((page->u.inuse.type_info & PGT_type_mask)
> -                  == PGT_writable_page) )
> -                mapping |= IOMMUF_writable;
> -
> -            ret = iommu_map(d, _dfn(dfn), _mfn(mfn), 0, mapping,
> -                            &flush_flags);
> -
> -            if ( !rc )
> -                rc = ret;
> +            switch ( page->u.inuse.type_info & PGT_type_mask )
> +            {
> +            case PGT_none:
> +                if ( is_pv_domain(d) )
> +                {
> +                    ASSERT(!(page->u.inuse.type_info & PGT_count_mask));
> +                    if ( get_page_and_type(page, d, PGT_writable_page) )

Could you add a comment that get_page_and_type will add an iommu
entry if succeeding?

> +                    {
> +                        put_page_and_type(page);
> +                        flush_flags |= IOMMUF_readable | IOMMUF_writable;
> +                    }
> +                    else if ( !rc )
> +                        rc = -EBUSY;

Is it fine to return an error here? AFAICT you could have a RO page
shared with Xen with PGT_none, and not having an iommu mapping for it
would be expected, hence returning an error seems wrong?

Thanks, Roger.
Jan Beulich Sept. 6, 2019, 10:52 a.m. UTC | #4
On 06.09.2019 11:37, Roger Pau Monné  wrote:
> On Wed, Jul 03, 2019 at 12:18:45PM +0000, Jan Beulich wrote:
>> --- a/xen/arch/x86/mm/p2m.c
>> +++ b/xen/arch/x86/mm/p2m.c
>> @@ -829,13 +829,13 @@ guest_physmap_add_page(struct domain *d,
>>            *
>>            * Retain this property by grabbing a writable type ref and then
>>            * dropping it immediately.  The result will be pages that have a
>> -         * writable type (and an IOMMU entry), but a count of 0 (such that
>> -         * any guest-requested type changes succeed and remove the IOMMU
>> -         * entry).
>> +         * writable type (and an IOMMU entry if necessary), but a count of 0
>> +         * (such that any guest-requested type changes succeed and remove the
>> +         * IOMMU entry).
>>            */
>>           for ( i = 0; i < (1UL << page_order); ++i, ++page )
>>           {
>> -            if ( !need_iommu_pt_sync(d) )
>> +            if ( !iommu_enabled )
> 
> That's kind of a strong check for a domain that might never use the
> iommu at all. Isn't it fine to just rely on
> arch_iommu_populate_page_table finding non-writable pages and thus not
> adding them to the iommu page-tables?

No - the code change here is to take care of page additions to
the domain after it has booted.

>> --- a/xen/drivers/passthrough/iommu.c
>> +++ b/xen/drivers/passthrough/iommu.c
>> @@ -192,28 +192,46 @@ void __hwdom_init iommu_hwdom_init(struc
>>           unsigned int i = 0, flush_flags = 0;
>>           int rc = 0;
>>   
>> +        this_cpu(iommu_dont_flush_iotlb) = true;
>> +
>>           page_list_for_each ( page, &d->page_list )
>>           {
>> -            unsigned long mfn = mfn_x(page_to_mfn(page));
>> -            unsigned long dfn = mfn_to_gmfn(d, mfn);
>> -            unsigned int mapping = IOMMUF_readable;
>> -            int ret;
>> -
>> -            if ( ((page->u.inuse.type_info & PGT_count_mask) == 0) ||
>> -                 ((page->u.inuse.type_info & PGT_type_mask)
>> -                  == PGT_writable_page) )
>> -                mapping |= IOMMUF_writable;
>> -
>> -            ret = iommu_map(d, _dfn(dfn), _mfn(mfn), 0, mapping,
>> -                            &flush_flags);
>> -
>> -            if ( !rc )
>> -                rc = ret;
>> +            switch ( page->u.inuse.type_info & PGT_type_mask )
>> +            {
>> +            case PGT_none:
>> +                if ( is_pv_domain(d) )
>> +                {
>> +                    ASSERT(!(page->u.inuse.type_info & PGT_count_mask));
>> +                    if ( get_page_and_type(page, d, PGT_writable_page) )
> 
> Could you add a comment that get_page_and_type will add an iommu
> entry if succeeding?

Well, yes, I could, but this would just re-state what the respective
section of the big comment at the top of mm.c effectively says.

Anyway - as long as Paul's "remove late (on-demand) construction of
IOMMU page tables" would go in any time soon, this will all become
unnecessary anyway.

>> +                    {
>> +                        put_page_and_type(page);
>> +                        flush_flags |= IOMMUF_readable | IOMMUF_writable;
>> +                    }
>> +                    else if ( !rc )
>> +                        rc = -EBUSY;
> 
> Is it fine to return an error here? AFAICT you could have a RO page
> shared with Xen with PGT_none, and not having an iommu mapping for it
> would be expected, hence returning an error seems wrong?

No, pages shared with Xen don't live on d->page_list (which is what the
loop iterates over).

Jan
Roger Pau Monné Sept. 6, 2019, 11:45 a.m. UTC | #5
On Fri, Sep 06, 2019 at 12:52:11PM +0200, Jan Beulich wrote:
> On 06.09.2019 11:37, Roger Pau Monné  wrote:
> > On Wed, Jul 03, 2019 at 12:18:45PM +0000, Jan Beulich wrote:
> >> --- a/xen/arch/x86/mm/p2m.c
> >> +++ b/xen/arch/x86/mm/p2m.c
> >> @@ -829,13 +829,13 @@ guest_physmap_add_page(struct domain *d,
> >>            *
> >>            * Retain this property by grabbing a writable type ref and then
> >>            * dropping it immediately.  The result will be pages that have a
> >> -         * writable type (and an IOMMU entry), but a count of 0 (such that
> >> -         * any guest-requested type changes succeed and remove the IOMMU
> >> -         * entry).
> >> +         * writable type (and an IOMMU entry if necessary), but a count of 0
> >> +         * (such that any guest-requested type changes succeed and remove the
> >> +         * IOMMU entry).
> >>            */
> >>           for ( i = 0; i < (1UL << page_order); ++i, ++page )
> >>           {
> >> -            if ( !need_iommu_pt_sync(d) )
> >> +            if ( !iommu_enabled )
> > 
> > That's kind of a strong check for a domain that might never use the
> > iommu at all. Isn't it fine to just rely on
> > arch_iommu_populate_page_table finding non-writable pages and thus not
> > adding them to the iommu page-tables?
> 
> No - the code change here is to take care of page additions to
> the domain after it has booted.

Please bear with me, but AFAICT arch_iommu_populate_page_table could
be used after the domain has booted if a pci device is hot plugged.

If this is to deal with additions to domains having an iommu already
enabled, isn't it enough to use need_iommu_pt_sync?

That's going to return true for all PV domains, except for dom0 not
running in strict mode, which is expected because in that case dom0
already has the whole RAM mapped into the iommu page-tables?

> 
> >> --- a/xen/drivers/passthrough/iommu.c
> >> +++ b/xen/drivers/passthrough/iommu.c
> >> @@ -192,28 +192,46 @@ void __hwdom_init iommu_hwdom_init(struc
> >>           unsigned int i = 0, flush_flags = 0;
> >>           int rc = 0;
> >>   
> >> +        this_cpu(iommu_dont_flush_iotlb) = true;
> >> +
> >>           page_list_for_each ( page, &d->page_list )
> >>           {
> >> -            unsigned long mfn = mfn_x(page_to_mfn(page));
> >> -            unsigned long dfn = mfn_to_gmfn(d, mfn);
> >> -            unsigned int mapping = IOMMUF_readable;
> >> -            int ret;
> >> -
> >> -            if ( ((page->u.inuse.type_info & PGT_count_mask) == 0) ||
> >> -                 ((page->u.inuse.type_info & PGT_type_mask)
> >> -                  == PGT_writable_page) )
> >> -                mapping |= IOMMUF_writable;
> >> -
> >> -            ret = iommu_map(d, _dfn(dfn), _mfn(mfn), 0, mapping,
> >> -                            &flush_flags);
> >> -
> >> -            if ( !rc )
> >> -                rc = ret;
> >> +            switch ( page->u.inuse.type_info & PGT_type_mask )
> >> +            {
> >> +            case PGT_none:
> >> +                if ( is_pv_domain(d) )
> >> +                {
> >> +                    ASSERT(!(page->u.inuse.type_info & PGT_count_mask));
> >> +                    if ( get_page_and_type(page, d, PGT_writable_page) )
> > 
> > Could you add a comment that get_page_and_type will add an iommu
> > entry if succeeding?
> 
> Well, yes, I could, but this would just re-state what the respective
> section of the big comment at the top of mm.c effectively says.

Oh, never mind then, it just took me some time to figure out why you
set iommu_dont_flush_iotlb without doing any _legacy iommu calls. I
then realized get_page_and_type was doing such calls.

> Anyway - as long as Paul's "remove late (on-demand) construction of
> IOMMU page tables" would go in any time soon, this will all become
> unnecessary anyway.
> 
> >> +                    {
> >> +                        put_page_and_type(page);
> >> +                        flush_flags |= IOMMUF_readable | IOMMUF_writable;
> >> +                    }
> >> +                    else if ( !rc )
> >> +                        rc = -EBUSY;
> > 
> > Is it fine to return an error here? AFAICT you could have a RO page
> > shared with Xen with PGT_none, and not having an iommu mapping for it
> > would be expected, hence returning an error seems wrong?
> 
> No, pages shared with Xen don't live on d->page_list (which is what the
> loop iterates over).

So then there should be no PGT_none pages in d->page_list?

The only user I can find of PGT_none is share_xen_page_with_guest.

Thanks, Roger.
Jan Beulich Sept. 6, 2019, 12:08 p.m. UTC | #6
On 06.09.2019 13:45, Roger Pau Monné  wrote:
> On Fri, Sep 06, 2019 at 12:52:11PM +0200, Jan Beulich wrote:
>> On 06.09.2019 11:37, Roger Pau Monné  wrote:
>>> On Wed, Jul 03, 2019 at 12:18:45PM +0000, Jan Beulich wrote:
>>>> --- a/xen/arch/x86/mm/p2m.c
>>>> +++ b/xen/arch/x86/mm/p2m.c
>>>> @@ -829,13 +829,13 @@ guest_physmap_add_page(struct domain *d,
>>>>            *
>>>>            * Retain this property by grabbing a writable type ref and then
>>>>            * dropping it immediately.  The result will be pages that have a
>>>> -         * writable type (and an IOMMU entry), but a count of 0 (such that
>>>> -         * any guest-requested type changes succeed and remove the IOMMU
>>>> -         * entry).
>>>> +         * writable type (and an IOMMU entry if necessary), but a count of 0
>>>> +         * (such that any guest-requested type changes succeed and remove the
>>>> +         * IOMMU entry).
>>>>            */
>>>>           for ( i = 0; i < (1UL << page_order); ++i, ++page )
>>>>           {
>>>> -            if ( !need_iommu_pt_sync(d) )
>>>> +            if ( !iommu_enabled )
>>>
>>> That's kind of a strong check for a domain that might never use the
>>> iommu at all. Isn't it fine to just rely on
>>> arch_iommu_populate_page_table finding non-writable pages and thus not
>>> adding them to the iommu page-tables?
>>
>> No - the code change here is to take care of page additions to
>> the domain after it has booted.
> 
> Please bear with me, but AFAICT arch_iommu_populate_page_table could
> be used after the domain has booted if a pci device is hot plugged.
> 
> If this is to deal with additions to domains having an iommu already
> enabled, isn't it enough to use need_iommu_pt_sync?
> 
> That's going to return true for all PV domains, except for dom0 not
> running in strict mode, which is expected because in that case dom0
> already has the whole RAM mapped into the iommu page-tables?

Well, my previous reply wasn't precise enough, I guess. The change
really is about both cases: If the domain is already using an IOMMU,
need_iommu_pt_sync() would be enough indeed. If IOMMU use _may_ be
enabled later on, we need to transition pages out of their initial
PGT_none state right away. If we deferred this until the point
where the IOMMU gets enabled for the domain, we'd have to watch out
for PGT_none pages there, which would be extra hassle.

>>>> +                    {
>>>> +                        put_page_and_type(page);
>>>> +                        flush_flags |= IOMMUF_readable | IOMMUF_writable;
>>>> +                    }
>>>> +                    else if ( !rc )
>>>> +                        rc = -EBUSY;
>>>
>>> Is it fine to return an error here? AFAICT you could have a RO page
>>> shared with Xen with PGT_none, and not having an iommu mapping for it
>>> would be expected, hence returning an error seems wrong?
>>
>> No, pages shared with Xen don't live on d->page_list (which is what the
>> loop iterates over).
> 
> So then there should be no PGT_none pages in d->page_list?
> 
> The only user I can find of PGT_none is share_xen_page_with_guest.

Plus the implicit use when a page gets first added to a domain (by
setting ->u.inuse.type_info to zero).

Jan
Roger Pau Monné Sept. 6, 2019, 2:08 p.m. UTC | #7
On Fri, Sep 06, 2019 at 02:08:09PM +0200, Jan Beulich wrote:
> On 06.09.2019 13:45, Roger Pau Monné  wrote:
> > On Fri, Sep 06, 2019 at 12:52:11PM +0200, Jan Beulich wrote:
> >> On 06.09.2019 11:37, Roger Pau Monné  wrote:
> >>> On Wed, Jul 03, 2019 at 12:18:45PM +0000, Jan Beulich wrote:
> >>>> --- a/xen/arch/x86/mm/p2m.c
> >>>> +++ b/xen/arch/x86/mm/p2m.c
> >>>> @@ -829,13 +829,13 @@ guest_physmap_add_page(struct domain *d,
> >>>>            *
> >>>>            * Retain this property by grabbing a writable type ref and then
> >>>>            * dropping it immediately.  The result will be pages that have a
> >>>> -         * writable type (and an IOMMU entry), but a count of 0 (such that
> >>>> -         * any guest-requested type changes succeed and remove the IOMMU
> >>>> -         * entry).
> >>>> +         * writable type (and an IOMMU entry if necessary), but a count of 0
> >>>> +         * (such that any guest-requested type changes succeed and remove the
> >>>> +         * IOMMU entry).
> >>>>            */
> >>>>           for ( i = 0; i < (1UL << page_order); ++i, ++page )
> >>>>           {
> >>>> -            if ( !need_iommu_pt_sync(d) )
> >>>> +            if ( !iommu_enabled )
> >>>
> >>> That's kind of a strong check for a domain that might never use the
> >>> iommu at all. Isn't it fine to just rely on
> >>> arch_iommu_populate_page_table finding non-writable pages and thus not
> >>> adding them to the iommu page-tables?
> >>
> >> No - the code change here is to take care of page additions to
> >> the domain after it has booted.
> > 
> > Please bear with me, but AFAICT arch_iommu_populate_page_table could
> > be used after the domain has booted if a pci device is hot plugged.
> > 
> > If this is to deal with additions to domains having an iommu already
> > enabled, isn't it enough to use need_iommu_pt_sync?
> > 
> > That's going to return true for all PV domains, except for dom0 not
> > running in strict mode, which is expected because in that case dom0
> > already has the whole RAM mapped into the iommu page-tables?
> 
> Well, my previous reply wasn't precise enough, I guess. The change
> really is about both cases: If the domain is already using an IOMMU,
> need_iommu_pt_sync() would be enough indeed. If IOMMU use _may_ be
> enabled later on, we need to transition pages out of their initial
> PGT_none state right away. If we deferred this until the point
> where the IOMMU gets enabled for the domain, we'd have to watch out
> for PGT_none pages there, which would be extra hassle.

I still think a relaxed PV dom0 should avoid the overhead of
get_page_and_type, and the iommu flush done afterwards, as it already
has all host RAM into it's iommu page-tables.

Ie: I think the check should be something like:

if ( !iommu_enabled ||
     (is_hardware_domain(d) && !need_iommu_pt_sync(d) )

> >>>> +                    {
> >>>> +                        put_page_and_type(page);
> >>>> +                        flush_flags |= IOMMUF_readable | IOMMUF_writable;
> >>>> +                    }
> >>>> +                    else if ( !rc )
> >>>> +                        rc = -EBUSY;
> >>>
> >>> Is it fine to return an error here? AFAICT you could have a RO page
> >>> shared with Xen with PGT_none, and not having an iommu mapping for it
> >>> would be expected, hence returning an error seems wrong?
> >>
> >> No, pages shared with Xen don't live on d->page_list (which is what the
> >> loop iterates over).
> > 
> > So then there should be no PGT_none pages in d->page_list?
> > 
> > The only user I can find of PGT_none is share_xen_page_with_guest.
> 
> Plus the implicit use when a page gets first added to a domain (by
> setting ->u.inuse.type_info to zero).

Ack, thanks for the clarification.

Roger.
Jan Beulich Sept. 6, 2019, 2:19 p.m. UTC | #8
On 06.09.2019 16:08, Roger Pau Monné  wrote:
> On Fri, Sep 06, 2019 at 02:08:09PM +0200, Jan Beulich wrote:
>> On 06.09.2019 13:45, Roger Pau Monné  wrote:
>>> On Fri, Sep 06, 2019 at 12:52:11PM +0200, Jan Beulich wrote:
>>>> On 06.09.2019 11:37, Roger Pau Monné  wrote:
>>>>> On Wed, Jul 03, 2019 at 12:18:45PM +0000, Jan Beulich wrote:
>>>>>> --- a/xen/arch/x86/mm/p2m.c
>>>>>> +++ b/xen/arch/x86/mm/p2m.c
>>>>>> @@ -829,13 +829,13 @@ guest_physmap_add_page(struct domain *d,
>>>>>>            *
>>>>>>            * Retain this property by grabbing a writable type ref and then
>>>>>>            * dropping it immediately.  The result will be pages that have a
>>>>>> -         * writable type (and an IOMMU entry), but a count of 0 (such that
>>>>>> -         * any guest-requested type changes succeed and remove the IOMMU
>>>>>> -         * entry).
>>>>>> +         * writable type (and an IOMMU entry if necessary), but a count of 0
>>>>>> +         * (such that any guest-requested type changes succeed and remove the
>>>>>> +         * IOMMU entry).
>>>>>>            */
>>>>>>           for ( i = 0; i < (1UL << page_order); ++i, ++page )
>>>>>>           {
>>>>>> -            if ( !need_iommu_pt_sync(d) )
>>>>>> +            if ( !iommu_enabled )
>>>>>
>>>>> That's kind of a strong check for a domain that might never use the
>>>>> iommu at all. Isn't it fine to just rely on
>>>>> arch_iommu_populate_page_table finding non-writable pages and thus not
>>>>> adding them to the iommu page-tables?
>>>>
>>>> No - the code change here is to take care of page additions to
>>>> the domain after it has booted.
>>>
>>> Please bear with me, but AFAICT arch_iommu_populate_page_table could
>>> be used after the domain has booted if a pci device is hot plugged.
>>>
>>> If this is to deal with additions to domains having an iommu already
>>> enabled, isn't it enough to use need_iommu_pt_sync?
>>>
>>> That's going to return true for all PV domains, except for dom0 not
>>> running in strict mode, which is expected because in that case dom0
>>> already has the whole RAM mapped into the iommu page-tables?
>>
>> Well, my previous reply wasn't precise enough, I guess. The change
>> really is about both cases: If the domain is already using an IOMMU,
>> need_iommu_pt_sync() would be enough indeed. If IOMMU use _may_ be
>> enabled later on, we need to transition pages out of their initial
>> PGT_none state right away. If we deferred this until the point
>> where the IOMMU gets enabled for the domain, we'd have to watch out
>> for PGT_none pages there, which would be extra hassle.
> 
> I still think a relaxed PV dom0 should avoid the overhead of
> get_page_and_type, and the iommu flush done afterwards, as it already
> has all host RAM into it's iommu page-tables.
> 
> Ie: I think the check should be something like:
> 
> if ( !iommu_enabled ||
>      (is_hardware_domain(d) && !need_iommu_pt_sync(d) )

Ah, yes, I can certainly do this (if the patch doesn't become
unnecessary anyway).

Jan
Roger Pau Monné Sept. 6, 2019, 2:32 p.m. UTC | #9
On Fri, Sep 06, 2019 at 04:19:50PM +0200, Jan Beulich wrote:
> On 06.09.2019 16:08, Roger Pau Monné  wrote:
> > On Fri, Sep 06, 2019 at 02:08:09PM +0200, Jan Beulich wrote:
> >> On 06.09.2019 13:45, Roger Pau Monné  wrote:
> >>> On Fri, Sep 06, 2019 at 12:52:11PM +0200, Jan Beulich wrote:
> >>>> On 06.09.2019 11:37, Roger Pau Monné  wrote:
> >>>>> On Wed, Jul 03, 2019 at 12:18:45PM +0000, Jan Beulich wrote:
> >>>>>> --- a/xen/arch/x86/mm/p2m.c
> >>>>>> +++ b/xen/arch/x86/mm/p2m.c
> >>>>>> @@ -829,13 +829,13 @@ guest_physmap_add_page(struct domain *d,
> >>>>>>            *
> >>>>>>            * Retain this property by grabbing a writable type ref and then
> >>>>>>            * dropping it immediately.  The result will be pages that have a
> >>>>>> -         * writable type (and an IOMMU entry), but a count of 0 (such that
> >>>>>> -         * any guest-requested type changes succeed and remove the IOMMU
> >>>>>> -         * entry).
> >>>>>> +         * writable type (and an IOMMU entry if necessary), but a count of 0
> >>>>>> +         * (such that any guest-requested type changes succeed and remove the
> >>>>>> +         * IOMMU entry).
> >>>>>>            */
> >>>>>>           for ( i = 0; i < (1UL << page_order); ++i, ++page )
> >>>>>>           {
> >>>>>> -            if ( !need_iommu_pt_sync(d) )
> >>>>>> +            if ( !iommu_enabled )
> >>>>>
> >>>>> That's kind of a strong check for a domain that might never use the
> >>>>> iommu at all. Isn't it fine to just rely on
> >>>>> arch_iommu_populate_page_table finding non-writable pages and thus not
> >>>>> adding them to the iommu page-tables?
> >>>>
> >>>> No - the code change here is to take care of page additions to
> >>>> the domain after it has booted.
> >>>
> >>> Please bear with me, but AFAICT arch_iommu_populate_page_table could
> >>> be used after the domain has booted if a pci device is hot plugged.
> >>>
> >>> If this is to deal with additions to domains having an iommu already
> >>> enabled, isn't it enough to use need_iommu_pt_sync?
> >>>
> >>> That's going to return true for all PV domains, except for dom0 not
> >>> running in strict mode, which is expected because in that case dom0
> >>> already has the whole RAM mapped into the iommu page-tables?
> >>
> >> Well, my previous reply wasn't precise enough, I guess. The change
> >> really is about both cases: If the domain is already using an IOMMU,
> >> need_iommu_pt_sync() would be enough indeed. If IOMMU use _may_ be
> >> enabled later on, we need to transition pages out of their initial
> >> PGT_none state right away. If we deferred this until the point
> >> where the IOMMU gets enabled for the domain, we'd have to watch out
> >> for PGT_none pages there, which would be extra hassle.
> > 
> > I still think a relaxed PV dom0 should avoid the overhead of
> > get_page_and_type, and the iommu flush done afterwards, as it already
> > has all host RAM into it's iommu page-tables.
> > 
> > Ie: I think the check should be something like:
> > 
> > if ( !iommu_enabled ||
> >      (is_hardware_domain(d) && !need_iommu_pt_sync(d) )
> 
> Ah, yes, I can certainly do this (if the patch doesn't become
> unnecessary anyway).

With that please add my Reviewed-by: Roger Pau Monné
<roger.pau@citrix.com>, albeit as you say I'm not sure whether this is
really needed in light of the upcoming iommu changes.

Thanks, Roger.
diff mbox series

Patch

--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -829,13 +829,13 @@  guest_physmap_add_page(struct domain *d,
           *
           * Retain this property by grabbing a writable type ref and then
           * dropping it immediately.  The result will be pages that have a
-         * writable type (and an IOMMU entry), but a count of 0 (such that
-         * any guest-requested type changes succeed and remove the IOMMU
-         * entry).
+         * writable type (and an IOMMU entry if necessary), but a count of 0
+         * (such that any guest-requested type changes succeed and remove the
+         * IOMMU entry).
           */
          for ( i = 0; i < (1UL << page_order); ++i, ++page )
          {
-            if ( !need_iommu_pt_sync(d) )
+            if ( !iommu_enabled )
                  /* nothing */;
              else if ( get_page_and_type(page, d, PGT_writable_page) )
                  put_page_and_type(page);
--- a/xen/drivers/passthrough/iommu.c
+++ b/xen/drivers/passthrough/iommu.c
@@ -192,28 +192,46 @@  void __hwdom_init iommu_hwdom_init(struc
          unsigned int i = 0, flush_flags = 0;
          int rc = 0;
  
+        this_cpu(iommu_dont_flush_iotlb) = true;
+
          page_list_for_each ( page, &d->page_list )
          {
-            unsigned long mfn = mfn_x(page_to_mfn(page));
-            unsigned long dfn = mfn_to_gmfn(d, mfn);
-            unsigned int mapping = IOMMUF_readable;
-            int ret;
-
-            if ( ((page->u.inuse.type_info & PGT_count_mask) == 0) ||
-                 ((page->u.inuse.type_info & PGT_type_mask)
-                  == PGT_writable_page) )
-                mapping |= IOMMUF_writable;
-
-            ret = iommu_map(d, _dfn(dfn), _mfn(mfn), 0, mapping,
-                            &flush_flags);
-
-            if ( !rc )
-                rc = ret;
+            switch ( page->u.inuse.type_info & PGT_type_mask )
+            {
+            case PGT_none:
+                if ( is_pv_domain(d) )
+                {
+                    ASSERT(!(page->u.inuse.type_info & PGT_count_mask));
+                    if ( get_page_and_type(page, d, PGT_writable_page) )
+                    {
+                        put_page_and_type(page);
+                        flush_flags |= IOMMUF_readable | IOMMUF_writable;
+                    }
+                    else if ( !rc )
+                        rc = -EBUSY;
+                    break;
+                }
+                /* fall through */
+            case PGT_writable_page:
+            {
+                mfn_t mfn = page_to_mfn(page);
+                dfn_t dfn = _dfn(mfn_to_gmfn(d, mfn_x(mfn)));
+                int ret = iommu_map(d, dfn, mfn, 0,
+                                    IOMMUF_readable | IOMMUF_writable,
+                                    &flush_flags);
+
+                if ( !rc )
+                    rc = ret;
+                break;
+            }
+            }
  
              if ( !(i++ & 0xfffff) )
                  process_pending_softirqs();
          }
  
+        this_cpu(iommu_dont_flush_iotlb) = false;
+
          /* Use while-break to avoid compiler warning */
          while ( iommu_iotlb_flush_all(d, flush_flags) )
              break;