Message ID | ac7a20e8-9e5e-f664-1d7f-d59105f90223@suse.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [v3] IOMMU/x86: make page type checks consistent when mapping pages | expand |
On 03/07/2019 13:18, Jan Beulich wrote: > There are currently three more or less different checks: > - _get_page_type() adjusts the IOMMU mappings according to the new type > alone, > - arch_iommu_populate_page_table() wants just the type to be > PGT_writable_page, > - iommu_hwdom_init() additionally permits all other types with a type > refcount of zero. > The canonical one is in _get_page_type(), and as of XSA-288 > arch_iommu_populate_page_table() also has no need anymore to deal with > PGT_none pages. In the PV Dom0 case, however, PGT_none pages are still > necessary to consider, since in that case pages don't get handed to > guest_physmap_add_entry(). Furthermore, the function so far also > established r/o mappings, which is not in line with the rules set forth > by the XSA-288 change. > > For arch_iommu_populate_page_table() to not encounter PGT_none pages > anymore even in cases where the IOMMU gets enabled for a domain only > when it is already running, replace the IOMMU dependency in > guest_physmap_add_entry()'s handling of PV guests to check just the > system wide state instead of the domain property. And this is the problem. We have an enormous amount of complexity, and a hypercall which even after an XSA, we could only reduce to "will livelock under adversarial conditions (inc. the toolstack thread which started it)", so support a corner case which doesn't (AFACIT) look like it can work correctly, and surely isn't used in practice. IMO, the only sane thing to do is to have a "create an IOMMU context" flag in domaincreate (and a shared vs split while we're at it, for the EPT case), and have the IOMMU context properly known from the very start of the domain. Even if this does end up restricting a corner case which is believed to work, I do not see it as an inconvenience or problem to require a domain config file to specify whether it wants an IOMMU context (directly, or indirectly via things like PCI=), and the reduction in complexity in Xen would be massive. How many security bugs have we already found here? How may are still lurking, or liable to be re-introduced because this code is just too damn complicated for you, me and George to review sensibly? Reducing the complexity is the responsible thing to do. I'm afraid that I don't view anything other than moving towards an up-front declaration as making the situation better. ~Andrew
On 04.07.2019 15:10, Andrew Cooper wrote: > On 03/07/2019 13:18, Jan Beulich wrote: >> There are currently three more or less different checks: >> - _get_page_type() adjusts the IOMMU mappings according to the new type >> alone, >> - arch_iommu_populate_page_table() wants just the type to be >> PGT_writable_page, >> - iommu_hwdom_init() additionally permits all other types with a type >> refcount of zero. >> The canonical one is in _get_page_type(), and as of XSA-288 >> arch_iommu_populate_page_table() also has no need anymore to deal with >> PGT_none pages. In the PV Dom0 case, however, PGT_none pages are still >> necessary to consider, since in that case pages don't get handed to >> guest_physmap_add_entry(). Furthermore, the function so far also >> established r/o mappings, which is not in line with the rules set forth >> by the XSA-288 change. >> >> For arch_iommu_populate_page_table() to not encounter PGT_none pages >> anymore even in cases where the IOMMU gets enabled for a domain only >> when it is already running, replace the IOMMU dependency in >> guest_physmap_add_entry()'s handling of PV guests to check just the >> system wide state instead of the domain property. > > And this is the problem. We have an enormous amount of complexity, and > a hypercall which even after an XSA, we could only reduce to "will > livelock under adversarial conditions (inc. the toolstack thread which > started it)", so support a corner case which doesn't (AFACIT) look like > it can work correctly, and surely isn't used in practice. > > IMO, the only sane thing to do is to have a "create an IOMMU context" > flag in domaincreate (and a shared vs split while we're at it, for the > EPT case), and have the IOMMU context properly known from the very start > of the domain. Irrespective of me disagreeing here (which by no means suggests that what you describe may not get done anyway), ... > Even if this does end up restricting a corner case which is believed to > work, I do not see it as an inconvenience or problem to require a domain > config file to specify whether it wants an IOMMU context (directly, or > indirectly via things like PCI=), and the reduction in complexity in Xen > would be massive. > > How many security bugs have we already found here? How may are still > lurking, or liable to be re-introduced because this code is just too > damn complicated for you, me and George to review sensibly? Reducing > the complexity is the responsible thing to do. > > I'm afraid that I don't view anything other than moving towards an > up-front declaration as making the situation better. ... I'm pretty puzzled by this, in particular seeing that the bulk of the change in this patch is affecting Dom0 only, i.e. is in no way related to the point in time at which we declare a DomU to need IOMMU page tables. The change affecting DomU-s here is a single line, plus some comment adjustment. Jan
On Wed, Jul 03, 2019 at 12:18:45PM +0000, Jan Beulich wrote: > There are currently three more or less different checks: > - _get_page_type() adjusts the IOMMU mappings according to the new type > alone, > - arch_iommu_populate_page_table() wants just the type to be > PGT_writable_page, > - iommu_hwdom_init() additionally permits all other types with a type > refcount of zero. > The canonical one is in _get_page_type(), and as of XSA-288 > arch_iommu_populate_page_table() also has no need anymore to deal with > PGT_none pages. In the PV Dom0 case, however, PGT_none pages are still > necessary to consider, since in that case pages don't get handed to > guest_physmap_add_entry(). Furthermore, the function so far also > established r/o mappings, which is not in line with the rules set forth > by the XSA-288 change. > > For arch_iommu_populate_page_table() to not encounter PGT_none pages > anymore even in cases where the IOMMU gets enabled for a domain only > when it is already running, replace the IOMMU dependency in > guest_physmap_add_entry()'s handling of PV guests to check just the > system wide state instead of the domain property. > > Unfortunately (partially) replacing the iommu_map() call in > iommu_hwdom_init() implies resurrecting the flush suppression that got > previously eliminated. Note that the call to iommu_map() can't be > removed at this point in time - Dom0's initial allocation gets its page > types set before iommu_hwdom_init() runs. > > Signed-off-by: Jan Beulich <jbeulich@suse.com> > --- > v3: Re-base. > v2: Fix IOTLB flushing. Exclude PVH. Use type safe local variables. > > --- a/xen/arch/x86/mm/p2m.c > +++ b/xen/arch/x86/mm/p2m.c > @@ -829,13 +829,13 @@ guest_physmap_add_page(struct domain *d, > * > * Retain this property by grabbing a writable type ref and then > * dropping it immediately. The result will be pages that have a > - * writable type (and an IOMMU entry), but a count of 0 (such that > - * any guest-requested type changes succeed and remove the IOMMU > - * entry). > + * writable type (and an IOMMU entry if necessary), but a count of 0 > + * (such that any guest-requested type changes succeed and remove the > + * IOMMU entry). > */ > for ( i = 0; i < (1UL << page_order); ++i, ++page ) > { > - if ( !need_iommu_pt_sync(d) ) > + if ( !iommu_enabled ) That's kind of a strong check for a domain that might never use the iommu at all. Isn't it fine to just rely on arch_iommu_populate_page_table finding non-writable pages and thus not adding them to the iommu page-tables? > /* nothing */; > else if ( get_page_and_type(page, d, PGT_writable_page) ) > put_page_and_type(page); > --- a/xen/drivers/passthrough/iommu.c > +++ b/xen/drivers/passthrough/iommu.c > @@ -192,28 +192,46 @@ void __hwdom_init iommu_hwdom_init(struc > unsigned int i = 0, flush_flags = 0; > int rc = 0; > > + this_cpu(iommu_dont_flush_iotlb) = true; > + > page_list_for_each ( page, &d->page_list ) > { > - unsigned long mfn = mfn_x(page_to_mfn(page)); > - unsigned long dfn = mfn_to_gmfn(d, mfn); > - unsigned int mapping = IOMMUF_readable; > - int ret; > - > - if ( ((page->u.inuse.type_info & PGT_count_mask) == 0) || > - ((page->u.inuse.type_info & PGT_type_mask) > - == PGT_writable_page) ) > - mapping |= IOMMUF_writable; > - > - ret = iommu_map(d, _dfn(dfn), _mfn(mfn), 0, mapping, > - &flush_flags); > - > - if ( !rc ) > - rc = ret; > + switch ( page->u.inuse.type_info & PGT_type_mask ) > + { > + case PGT_none: > + if ( is_pv_domain(d) ) > + { > + ASSERT(!(page->u.inuse.type_info & PGT_count_mask)); > + if ( get_page_and_type(page, d, PGT_writable_page) ) Could you add a comment that get_page_and_type will add an iommu entry if succeeding? > + { > + put_page_and_type(page); > + flush_flags |= IOMMUF_readable | IOMMUF_writable; > + } > + else if ( !rc ) > + rc = -EBUSY; Is it fine to return an error here? AFAICT you could have a RO page shared with Xen with PGT_none, and not having an iommu mapping for it would be expected, hence returning an error seems wrong? Thanks, Roger.
On 06.09.2019 11:37, Roger Pau Monné wrote: > On Wed, Jul 03, 2019 at 12:18:45PM +0000, Jan Beulich wrote: >> --- a/xen/arch/x86/mm/p2m.c >> +++ b/xen/arch/x86/mm/p2m.c >> @@ -829,13 +829,13 @@ guest_physmap_add_page(struct domain *d, >> * >> * Retain this property by grabbing a writable type ref and then >> * dropping it immediately. The result will be pages that have a >> - * writable type (and an IOMMU entry), but a count of 0 (such that >> - * any guest-requested type changes succeed and remove the IOMMU >> - * entry). >> + * writable type (and an IOMMU entry if necessary), but a count of 0 >> + * (such that any guest-requested type changes succeed and remove the >> + * IOMMU entry). >> */ >> for ( i = 0; i < (1UL << page_order); ++i, ++page ) >> { >> - if ( !need_iommu_pt_sync(d) ) >> + if ( !iommu_enabled ) > > That's kind of a strong check for a domain that might never use the > iommu at all. Isn't it fine to just rely on > arch_iommu_populate_page_table finding non-writable pages and thus not > adding them to the iommu page-tables? No - the code change here is to take care of page additions to the domain after it has booted. >> --- a/xen/drivers/passthrough/iommu.c >> +++ b/xen/drivers/passthrough/iommu.c >> @@ -192,28 +192,46 @@ void __hwdom_init iommu_hwdom_init(struc >> unsigned int i = 0, flush_flags = 0; >> int rc = 0; >> >> + this_cpu(iommu_dont_flush_iotlb) = true; >> + >> page_list_for_each ( page, &d->page_list ) >> { >> - unsigned long mfn = mfn_x(page_to_mfn(page)); >> - unsigned long dfn = mfn_to_gmfn(d, mfn); >> - unsigned int mapping = IOMMUF_readable; >> - int ret; >> - >> - if ( ((page->u.inuse.type_info & PGT_count_mask) == 0) || >> - ((page->u.inuse.type_info & PGT_type_mask) >> - == PGT_writable_page) ) >> - mapping |= IOMMUF_writable; >> - >> - ret = iommu_map(d, _dfn(dfn), _mfn(mfn), 0, mapping, >> - &flush_flags); >> - >> - if ( !rc ) >> - rc = ret; >> + switch ( page->u.inuse.type_info & PGT_type_mask ) >> + { >> + case PGT_none: >> + if ( is_pv_domain(d) ) >> + { >> + ASSERT(!(page->u.inuse.type_info & PGT_count_mask)); >> + if ( get_page_and_type(page, d, PGT_writable_page) ) > > Could you add a comment that get_page_and_type will add an iommu > entry if succeeding? Well, yes, I could, but this would just re-state what the respective section of the big comment at the top of mm.c effectively says. Anyway - as long as Paul's "remove late (on-demand) construction of IOMMU page tables" would go in any time soon, this will all become unnecessary anyway. >> + { >> + put_page_and_type(page); >> + flush_flags |= IOMMUF_readable | IOMMUF_writable; >> + } >> + else if ( !rc ) >> + rc = -EBUSY; > > Is it fine to return an error here? AFAICT you could have a RO page > shared with Xen with PGT_none, and not having an iommu mapping for it > would be expected, hence returning an error seems wrong? No, pages shared with Xen don't live on d->page_list (which is what the loop iterates over). Jan
On Fri, Sep 06, 2019 at 12:52:11PM +0200, Jan Beulich wrote: > On 06.09.2019 11:37, Roger Pau Monné wrote: > > On Wed, Jul 03, 2019 at 12:18:45PM +0000, Jan Beulich wrote: > >> --- a/xen/arch/x86/mm/p2m.c > >> +++ b/xen/arch/x86/mm/p2m.c > >> @@ -829,13 +829,13 @@ guest_physmap_add_page(struct domain *d, > >> * > >> * Retain this property by grabbing a writable type ref and then > >> * dropping it immediately. The result will be pages that have a > >> - * writable type (and an IOMMU entry), but a count of 0 (such that > >> - * any guest-requested type changes succeed and remove the IOMMU > >> - * entry). > >> + * writable type (and an IOMMU entry if necessary), but a count of 0 > >> + * (such that any guest-requested type changes succeed and remove the > >> + * IOMMU entry). > >> */ > >> for ( i = 0; i < (1UL << page_order); ++i, ++page ) > >> { > >> - if ( !need_iommu_pt_sync(d) ) > >> + if ( !iommu_enabled ) > > > > That's kind of a strong check for a domain that might never use the > > iommu at all. Isn't it fine to just rely on > > arch_iommu_populate_page_table finding non-writable pages and thus not > > adding them to the iommu page-tables? > > No - the code change here is to take care of page additions to > the domain after it has booted. Please bear with me, but AFAICT arch_iommu_populate_page_table could be used after the domain has booted if a pci device is hot plugged. If this is to deal with additions to domains having an iommu already enabled, isn't it enough to use need_iommu_pt_sync? That's going to return true for all PV domains, except for dom0 not running in strict mode, which is expected because in that case dom0 already has the whole RAM mapped into the iommu page-tables? > > >> --- a/xen/drivers/passthrough/iommu.c > >> +++ b/xen/drivers/passthrough/iommu.c > >> @@ -192,28 +192,46 @@ void __hwdom_init iommu_hwdom_init(struc > >> unsigned int i = 0, flush_flags = 0; > >> int rc = 0; > >> > >> + this_cpu(iommu_dont_flush_iotlb) = true; > >> + > >> page_list_for_each ( page, &d->page_list ) > >> { > >> - unsigned long mfn = mfn_x(page_to_mfn(page)); > >> - unsigned long dfn = mfn_to_gmfn(d, mfn); > >> - unsigned int mapping = IOMMUF_readable; > >> - int ret; > >> - > >> - if ( ((page->u.inuse.type_info & PGT_count_mask) == 0) || > >> - ((page->u.inuse.type_info & PGT_type_mask) > >> - == PGT_writable_page) ) > >> - mapping |= IOMMUF_writable; > >> - > >> - ret = iommu_map(d, _dfn(dfn), _mfn(mfn), 0, mapping, > >> - &flush_flags); > >> - > >> - if ( !rc ) > >> - rc = ret; > >> + switch ( page->u.inuse.type_info & PGT_type_mask ) > >> + { > >> + case PGT_none: > >> + if ( is_pv_domain(d) ) > >> + { > >> + ASSERT(!(page->u.inuse.type_info & PGT_count_mask)); > >> + if ( get_page_and_type(page, d, PGT_writable_page) ) > > > > Could you add a comment that get_page_and_type will add an iommu > > entry if succeeding? > > Well, yes, I could, but this would just re-state what the respective > section of the big comment at the top of mm.c effectively says. Oh, never mind then, it just took me some time to figure out why you set iommu_dont_flush_iotlb without doing any _legacy iommu calls. I then realized get_page_and_type was doing such calls. > Anyway - as long as Paul's "remove late (on-demand) construction of > IOMMU page tables" would go in any time soon, this will all become > unnecessary anyway. > > >> + { > >> + put_page_and_type(page); > >> + flush_flags |= IOMMUF_readable | IOMMUF_writable; > >> + } > >> + else if ( !rc ) > >> + rc = -EBUSY; > > > > Is it fine to return an error here? AFAICT you could have a RO page > > shared with Xen with PGT_none, and not having an iommu mapping for it > > would be expected, hence returning an error seems wrong? > > No, pages shared with Xen don't live on d->page_list (which is what the > loop iterates over). So then there should be no PGT_none pages in d->page_list? The only user I can find of PGT_none is share_xen_page_with_guest. Thanks, Roger.
On 06.09.2019 13:45, Roger Pau Monné wrote: > On Fri, Sep 06, 2019 at 12:52:11PM +0200, Jan Beulich wrote: >> On 06.09.2019 11:37, Roger Pau Monné wrote: >>> On Wed, Jul 03, 2019 at 12:18:45PM +0000, Jan Beulich wrote: >>>> --- a/xen/arch/x86/mm/p2m.c >>>> +++ b/xen/arch/x86/mm/p2m.c >>>> @@ -829,13 +829,13 @@ guest_physmap_add_page(struct domain *d, >>>> * >>>> * Retain this property by grabbing a writable type ref and then >>>> * dropping it immediately. The result will be pages that have a >>>> - * writable type (and an IOMMU entry), but a count of 0 (such that >>>> - * any guest-requested type changes succeed and remove the IOMMU >>>> - * entry). >>>> + * writable type (and an IOMMU entry if necessary), but a count of 0 >>>> + * (such that any guest-requested type changes succeed and remove the >>>> + * IOMMU entry). >>>> */ >>>> for ( i = 0; i < (1UL << page_order); ++i, ++page ) >>>> { >>>> - if ( !need_iommu_pt_sync(d) ) >>>> + if ( !iommu_enabled ) >>> >>> That's kind of a strong check for a domain that might never use the >>> iommu at all. Isn't it fine to just rely on >>> arch_iommu_populate_page_table finding non-writable pages and thus not >>> adding them to the iommu page-tables? >> >> No - the code change here is to take care of page additions to >> the domain after it has booted. > > Please bear with me, but AFAICT arch_iommu_populate_page_table could > be used after the domain has booted if a pci device is hot plugged. > > If this is to deal with additions to domains having an iommu already > enabled, isn't it enough to use need_iommu_pt_sync? > > That's going to return true for all PV domains, except for dom0 not > running in strict mode, which is expected because in that case dom0 > already has the whole RAM mapped into the iommu page-tables? Well, my previous reply wasn't precise enough, I guess. The change really is about both cases: If the domain is already using an IOMMU, need_iommu_pt_sync() would be enough indeed. If IOMMU use _may_ be enabled later on, we need to transition pages out of their initial PGT_none state right away. If we deferred this until the point where the IOMMU gets enabled for the domain, we'd have to watch out for PGT_none pages there, which would be extra hassle. >>>> + { >>>> + put_page_and_type(page); >>>> + flush_flags |= IOMMUF_readable | IOMMUF_writable; >>>> + } >>>> + else if ( !rc ) >>>> + rc = -EBUSY; >>> >>> Is it fine to return an error here? AFAICT you could have a RO page >>> shared with Xen with PGT_none, and not having an iommu mapping for it >>> would be expected, hence returning an error seems wrong? >> >> No, pages shared with Xen don't live on d->page_list (which is what the >> loop iterates over). > > So then there should be no PGT_none pages in d->page_list? > > The only user I can find of PGT_none is share_xen_page_with_guest. Plus the implicit use when a page gets first added to a domain (by setting ->u.inuse.type_info to zero). Jan
On Fri, Sep 06, 2019 at 02:08:09PM +0200, Jan Beulich wrote: > On 06.09.2019 13:45, Roger Pau Monné wrote: > > On Fri, Sep 06, 2019 at 12:52:11PM +0200, Jan Beulich wrote: > >> On 06.09.2019 11:37, Roger Pau Monné wrote: > >>> On Wed, Jul 03, 2019 at 12:18:45PM +0000, Jan Beulich wrote: > >>>> --- a/xen/arch/x86/mm/p2m.c > >>>> +++ b/xen/arch/x86/mm/p2m.c > >>>> @@ -829,13 +829,13 @@ guest_physmap_add_page(struct domain *d, > >>>> * > >>>> * Retain this property by grabbing a writable type ref and then > >>>> * dropping it immediately. The result will be pages that have a > >>>> - * writable type (and an IOMMU entry), but a count of 0 (such that > >>>> - * any guest-requested type changes succeed and remove the IOMMU > >>>> - * entry). > >>>> + * writable type (and an IOMMU entry if necessary), but a count of 0 > >>>> + * (such that any guest-requested type changes succeed and remove the > >>>> + * IOMMU entry). > >>>> */ > >>>> for ( i = 0; i < (1UL << page_order); ++i, ++page ) > >>>> { > >>>> - if ( !need_iommu_pt_sync(d) ) > >>>> + if ( !iommu_enabled ) > >>> > >>> That's kind of a strong check for a domain that might never use the > >>> iommu at all. Isn't it fine to just rely on > >>> arch_iommu_populate_page_table finding non-writable pages and thus not > >>> adding them to the iommu page-tables? > >> > >> No - the code change here is to take care of page additions to > >> the domain after it has booted. > > > > Please bear with me, but AFAICT arch_iommu_populate_page_table could > > be used after the domain has booted if a pci device is hot plugged. > > > > If this is to deal with additions to domains having an iommu already > > enabled, isn't it enough to use need_iommu_pt_sync? > > > > That's going to return true for all PV domains, except for dom0 not > > running in strict mode, which is expected because in that case dom0 > > already has the whole RAM mapped into the iommu page-tables? > > Well, my previous reply wasn't precise enough, I guess. The change > really is about both cases: If the domain is already using an IOMMU, > need_iommu_pt_sync() would be enough indeed. If IOMMU use _may_ be > enabled later on, we need to transition pages out of their initial > PGT_none state right away. If we deferred this until the point > where the IOMMU gets enabled for the domain, we'd have to watch out > for PGT_none pages there, which would be extra hassle. I still think a relaxed PV dom0 should avoid the overhead of get_page_and_type, and the iommu flush done afterwards, as it already has all host RAM into it's iommu page-tables. Ie: I think the check should be something like: if ( !iommu_enabled || (is_hardware_domain(d) && !need_iommu_pt_sync(d) ) > >>>> + { > >>>> + put_page_and_type(page); > >>>> + flush_flags |= IOMMUF_readable | IOMMUF_writable; > >>>> + } > >>>> + else if ( !rc ) > >>>> + rc = -EBUSY; > >>> > >>> Is it fine to return an error here? AFAICT you could have a RO page > >>> shared with Xen with PGT_none, and not having an iommu mapping for it > >>> would be expected, hence returning an error seems wrong? > >> > >> No, pages shared with Xen don't live on d->page_list (which is what the > >> loop iterates over). > > > > So then there should be no PGT_none pages in d->page_list? > > > > The only user I can find of PGT_none is share_xen_page_with_guest. > > Plus the implicit use when a page gets first added to a domain (by > setting ->u.inuse.type_info to zero). Ack, thanks for the clarification. Roger.
On 06.09.2019 16:08, Roger Pau Monné wrote: > On Fri, Sep 06, 2019 at 02:08:09PM +0200, Jan Beulich wrote: >> On 06.09.2019 13:45, Roger Pau Monné wrote: >>> On Fri, Sep 06, 2019 at 12:52:11PM +0200, Jan Beulich wrote: >>>> On 06.09.2019 11:37, Roger Pau Monné wrote: >>>>> On Wed, Jul 03, 2019 at 12:18:45PM +0000, Jan Beulich wrote: >>>>>> --- a/xen/arch/x86/mm/p2m.c >>>>>> +++ b/xen/arch/x86/mm/p2m.c >>>>>> @@ -829,13 +829,13 @@ guest_physmap_add_page(struct domain *d, >>>>>> * >>>>>> * Retain this property by grabbing a writable type ref and then >>>>>> * dropping it immediately. The result will be pages that have a >>>>>> - * writable type (and an IOMMU entry), but a count of 0 (such that >>>>>> - * any guest-requested type changes succeed and remove the IOMMU >>>>>> - * entry). >>>>>> + * writable type (and an IOMMU entry if necessary), but a count of 0 >>>>>> + * (such that any guest-requested type changes succeed and remove the >>>>>> + * IOMMU entry). >>>>>> */ >>>>>> for ( i = 0; i < (1UL << page_order); ++i, ++page ) >>>>>> { >>>>>> - if ( !need_iommu_pt_sync(d) ) >>>>>> + if ( !iommu_enabled ) >>>>> >>>>> That's kind of a strong check for a domain that might never use the >>>>> iommu at all. Isn't it fine to just rely on >>>>> arch_iommu_populate_page_table finding non-writable pages and thus not >>>>> adding them to the iommu page-tables? >>>> >>>> No - the code change here is to take care of page additions to >>>> the domain after it has booted. >>> >>> Please bear with me, but AFAICT arch_iommu_populate_page_table could >>> be used after the domain has booted if a pci device is hot plugged. >>> >>> If this is to deal with additions to domains having an iommu already >>> enabled, isn't it enough to use need_iommu_pt_sync? >>> >>> That's going to return true for all PV domains, except for dom0 not >>> running in strict mode, which is expected because in that case dom0 >>> already has the whole RAM mapped into the iommu page-tables? >> >> Well, my previous reply wasn't precise enough, I guess. The change >> really is about both cases: If the domain is already using an IOMMU, >> need_iommu_pt_sync() would be enough indeed. If IOMMU use _may_ be >> enabled later on, we need to transition pages out of their initial >> PGT_none state right away. If we deferred this until the point >> where the IOMMU gets enabled for the domain, we'd have to watch out >> for PGT_none pages there, which would be extra hassle. > > I still think a relaxed PV dom0 should avoid the overhead of > get_page_and_type, and the iommu flush done afterwards, as it already > has all host RAM into it's iommu page-tables. > > Ie: I think the check should be something like: > > if ( !iommu_enabled || > (is_hardware_domain(d) && !need_iommu_pt_sync(d) ) Ah, yes, I can certainly do this (if the patch doesn't become unnecessary anyway). Jan
On Fri, Sep 06, 2019 at 04:19:50PM +0200, Jan Beulich wrote: > On 06.09.2019 16:08, Roger Pau Monné wrote: > > On Fri, Sep 06, 2019 at 02:08:09PM +0200, Jan Beulich wrote: > >> On 06.09.2019 13:45, Roger Pau Monné wrote: > >>> On Fri, Sep 06, 2019 at 12:52:11PM +0200, Jan Beulich wrote: > >>>> On 06.09.2019 11:37, Roger Pau Monné wrote: > >>>>> On Wed, Jul 03, 2019 at 12:18:45PM +0000, Jan Beulich wrote: > >>>>>> --- a/xen/arch/x86/mm/p2m.c > >>>>>> +++ b/xen/arch/x86/mm/p2m.c > >>>>>> @@ -829,13 +829,13 @@ guest_physmap_add_page(struct domain *d, > >>>>>> * > >>>>>> * Retain this property by grabbing a writable type ref and then > >>>>>> * dropping it immediately. The result will be pages that have a > >>>>>> - * writable type (and an IOMMU entry), but a count of 0 (such that > >>>>>> - * any guest-requested type changes succeed and remove the IOMMU > >>>>>> - * entry). > >>>>>> + * writable type (and an IOMMU entry if necessary), but a count of 0 > >>>>>> + * (such that any guest-requested type changes succeed and remove the > >>>>>> + * IOMMU entry). > >>>>>> */ > >>>>>> for ( i = 0; i < (1UL << page_order); ++i, ++page ) > >>>>>> { > >>>>>> - if ( !need_iommu_pt_sync(d) ) > >>>>>> + if ( !iommu_enabled ) > >>>>> > >>>>> That's kind of a strong check for a domain that might never use the > >>>>> iommu at all. Isn't it fine to just rely on > >>>>> arch_iommu_populate_page_table finding non-writable pages and thus not > >>>>> adding them to the iommu page-tables? > >>>> > >>>> No - the code change here is to take care of page additions to > >>>> the domain after it has booted. > >>> > >>> Please bear with me, but AFAICT arch_iommu_populate_page_table could > >>> be used after the domain has booted if a pci device is hot plugged. > >>> > >>> If this is to deal with additions to domains having an iommu already > >>> enabled, isn't it enough to use need_iommu_pt_sync? > >>> > >>> That's going to return true for all PV domains, except for dom0 not > >>> running in strict mode, which is expected because in that case dom0 > >>> already has the whole RAM mapped into the iommu page-tables? > >> > >> Well, my previous reply wasn't precise enough, I guess. The change > >> really is about both cases: If the domain is already using an IOMMU, > >> need_iommu_pt_sync() would be enough indeed. If IOMMU use _may_ be > >> enabled later on, we need to transition pages out of their initial > >> PGT_none state right away. If we deferred this until the point > >> where the IOMMU gets enabled for the domain, we'd have to watch out > >> for PGT_none pages there, which would be extra hassle. > > > > I still think a relaxed PV dom0 should avoid the overhead of > > get_page_and_type, and the iommu flush done afterwards, as it already > > has all host RAM into it's iommu page-tables. > > > > Ie: I think the check should be something like: > > > > if ( !iommu_enabled || > > (is_hardware_domain(d) && !need_iommu_pt_sync(d) ) > > Ah, yes, I can certainly do this (if the patch doesn't become > unnecessary anyway). With that please add my Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>, albeit as you say I'm not sure whether this is really needed in light of the upcoming iommu changes. Thanks, Roger.
--- a/xen/arch/x86/mm/p2m.c +++ b/xen/arch/x86/mm/p2m.c @@ -829,13 +829,13 @@ guest_physmap_add_page(struct domain *d, * * Retain this property by grabbing a writable type ref and then * dropping it immediately. The result will be pages that have a - * writable type (and an IOMMU entry), but a count of 0 (such that - * any guest-requested type changes succeed and remove the IOMMU - * entry). + * writable type (and an IOMMU entry if necessary), but a count of 0 + * (such that any guest-requested type changes succeed and remove the + * IOMMU entry). */ for ( i = 0; i < (1UL << page_order); ++i, ++page ) { - if ( !need_iommu_pt_sync(d) ) + if ( !iommu_enabled ) /* nothing */; else if ( get_page_and_type(page, d, PGT_writable_page) ) put_page_and_type(page); --- a/xen/drivers/passthrough/iommu.c +++ b/xen/drivers/passthrough/iommu.c @@ -192,28 +192,46 @@ void __hwdom_init iommu_hwdom_init(struc unsigned int i = 0, flush_flags = 0; int rc = 0; + this_cpu(iommu_dont_flush_iotlb) = true; + page_list_for_each ( page, &d->page_list ) { - unsigned long mfn = mfn_x(page_to_mfn(page)); - unsigned long dfn = mfn_to_gmfn(d, mfn); - unsigned int mapping = IOMMUF_readable; - int ret; - - if ( ((page->u.inuse.type_info & PGT_count_mask) == 0) || - ((page->u.inuse.type_info & PGT_type_mask) - == PGT_writable_page) ) - mapping |= IOMMUF_writable; - - ret = iommu_map(d, _dfn(dfn), _mfn(mfn), 0, mapping, - &flush_flags); - - if ( !rc ) - rc = ret; + switch ( page->u.inuse.type_info & PGT_type_mask ) + { + case PGT_none: + if ( is_pv_domain(d) ) + { + ASSERT(!(page->u.inuse.type_info & PGT_count_mask)); + if ( get_page_and_type(page, d, PGT_writable_page) ) + { + put_page_and_type(page); + flush_flags |= IOMMUF_readable | IOMMUF_writable; + } + else if ( !rc ) + rc = -EBUSY; + break; + } + /* fall through */ + case PGT_writable_page: + { + mfn_t mfn = page_to_mfn(page); + dfn_t dfn = _dfn(mfn_to_gmfn(d, mfn_x(mfn))); + int ret = iommu_map(d, dfn, mfn, 0, + IOMMUF_readable | IOMMUF_writable, + &flush_flags); + + if ( !rc ) + rc = ret; + break; + } + } if ( !(i++ & 0xfffff) ) process_pending_softirqs(); } + this_cpu(iommu_dont_flush_iotlb) = false; + /* Use while-break to avoid compiler warning */ while ( iommu_iotlb_flush_all(d, flush_flags) ) break;
There are currently three more or less different checks: - _get_page_type() adjusts the IOMMU mappings according to the new type alone, - arch_iommu_populate_page_table() wants just the type to be PGT_writable_page, - iommu_hwdom_init() additionally permits all other types with a type refcount of zero. The canonical one is in _get_page_type(), and as of XSA-288 arch_iommu_populate_page_table() also has no need anymore to deal with PGT_none pages. In the PV Dom0 case, however, PGT_none pages are still necessary to consider, since in that case pages don't get handed to guest_physmap_add_entry(). Furthermore, the function so far also established r/o mappings, which is not in line with the rules set forth by the XSA-288 change. For arch_iommu_populate_page_table() to not encounter PGT_none pages anymore even in cases where the IOMMU gets enabled for a domain only when it is already running, replace the IOMMU dependency in guest_physmap_add_entry()'s handling of PV guests to check just the system wide state instead of the domain property. Unfortunately (partially) replacing the iommu_map() call in iommu_hwdom_init() implies resurrecting the flush suppression that got previously eliminated. Note that the call to iommu_map() can't be removed at this point in time - Dom0's initial allocation gets its page types set before iommu_hwdom_init() runs. Signed-off-by: Jan Beulich <jbeulich@suse.com> --- v3: Re-base. v2: Fix IOTLB flushing. Exclude PVH. Use type safe local variables.