Message ID | 1458197676-60696-2-git-send-email-quan.xu@intel.com (mailing list archive)
---|---
State | New, archived
> From: Xu, Quan
> Sent: Thursday, March 17, 2016 2:55 PM
>
> diff --git a/xen/drivers/passthrough/vtd/x86/vtd.c b/xen/drivers/passthrough/vtd/x86/vtd.c
> index c0d6aab..e5ab10a 100644
> --- a/xen/drivers/passthrough/vtd/x86/vtd.c
> +++ b/xen/drivers/passthrough/vtd/x86/vtd.c
> @@ -140,8 +140,11 @@ void __hwdom_init vtd_set_hwdom_mapping(struct domain *d)
>
>          tmp = 1 << (PAGE_SHIFT - PAGE_SHIFT_4K);
>          for ( j = 0; j < tmp; j++ )
> -            iommu_map_page(d, pfn * tmp + j, pfn * tmp + j,
> -                           IOMMUF_readable|IOMMUF_writable);
> +            if ( iommu_map_page(d, pfn * tmp + j, pfn * tmp + j,
> +                                IOMMUF_readable|IOMMUF_writable) )
> +                printk(XENLOG_G_ERR
> +                       "IOMMU: Map page gfn: 0x%lx(mfn: 0x%lx) failed.\n",
> +                       pfn * tmp + j, pfn * tmp + j);
>
>          if (!(i & (0xfffff >> (PAGE_SHIFT - PAGE_SHIFT_4K))))
>              process_pending_softirqs();

Hi Quan, this patch looks good to me in general. Just want to double-confirm
one thing: the code above doesn't handle the error from iommu_map_page(); it
only throws out a warning. I'm not sure whether this has been discussed and
agreed on before.

Thanks
Kevin
>>> On 17.03.16 at 08:32, <kevin.tian@intel.com> wrote:
>> From: Xu, Quan
>> Sent: Thursday, March 17, 2016 2:55 PM
>> --- a/xen/drivers/passthrough/vtd/x86/vtd.c
>> +++ b/xen/drivers/passthrough/vtd/x86/vtd.c
>> @@ -140,8 +140,11 @@ void __hwdom_init vtd_set_hwdom_mapping(struct domain *d)
>>
>>          tmp = 1 << (PAGE_SHIFT - PAGE_SHIFT_4K);
>>          for ( j = 0; j < tmp; j++ )
>> -            iommu_map_page(d, pfn * tmp + j, pfn * tmp + j,
>> -                           IOMMUF_readable|IOMMUF_writable);
>> +            if ( iommu_map_page(d, pfn * tmp + j, pfn * tmp + j,
>> +                                IOMMUF_readable|IOMMUF_writable) )
>> +                printk(XENLOG_G_ERR
>> +                       "IOMMU: Map page gfn: 0x%lx(mfn: 0x%lx) failed.\n",
>> +                       pfn * tmp + j, pfn * tmp + j);
>>
>>          if (!(i & (0xfffff >> (PAGE_SHIFT - PAGE_SHIFT_4K))))
>>              process_pending_softirqs();
>
> Hi Quan, this patch looks good to me in general. Just double confirm one
> thing. Above doesn't handle error from iommu_map_page, while only
> throwing out warning. Not sure whether it has been discussed before
> as the agreement or not.

For code paths involved only in preparing the hardware domain, I had
specifically asked that such errors be handled in a best-effort manner,
instead of explicitly rendering the system unbootable.

Jan
> From: Jan Beulich [mailto:JBeulich@suse.com] > Sent: Thursday, March 17, 2016 3:59 PM > > >>> On 17.03.16 at 08:32, <kevin.tian@intel.com> wrote: > >> From: Xu, Quan > >> Sent: Thursday, March 17, 2016 2:55 PM > >> --- a/xen/drivers/passthrough/vtd/x86/vtd.c > >> +++ b/xen/drivers/passthrough/vtd/x86/vtd.c > >> @@ -140,8 +140,11 @@ void __hwdom_init vtd_set_hwdom_mapping(struct domain > *d) > >> > >> tmp = 1 << (PAGE_SHIFT - PAGE_SHIFT_4K); > >> for ( j = 0; j < tmp; j++ ) > >> - iommu_map_page(d, pfn * tmp + j, pfn * tmp + j, > >> - IOMMUF_readable|IOMMUF_writable); > >> + if ( iommu_map_page(d, pfn * tmp + j, pfn * tmp + j, > >> + IOMMUF_readable|IOMMUF_writable) ) > >> + printk(XENLOG_G_ERR > >> + "IOMMU: Map page gfn: 0x%lx(mfn: 0x%lx) failed.\n", > >> + pfn * tmp + j, pfn * tmp + j); > >> > >> if (!(i & (0xfffff >> (PAGE_SHIFT - PAGE_SHIFT_4K)))) > >> process_pending_softirqs(); > > > > Hi, Quan, this patch looks good to me in general. Just double confirm one > > thing. Above doesn't handle error from iommu_map_page, while only > > throwing out warning. Not sure whether it has been discussed before > > as the agreement or not. > > For code paths involved in preparing the hardware domain only > I had specifically asked to handle such in a best effort manner, > instead of explicitly rendering a system unbootable. > OK, good to know that.
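The best-effort policy Jan describes (log the failure and keep going while
preparing the hardware domain) can be reduced to a small stand-alone sketch.
Everything below, fake_map() and map_best_effort() included, is hypothetical
scaffolding for illustration, not the real Xen API:

```c
#include <assert.h>
#include <stdio.h>

/* Pretend mapping backend: fails only for pfn 2, to simulate one bad page. */
static int fake_map(unsigned long pfn)
{
    return pfn == 2 ? -1 : 0;
}

/* Map [start, start + count) on a best-effort basis: a failure is logged
 * and counted, but the loop continues, so a single bad mapping cannot
 * render the whole setup (and hence the system) unusable. */
static int map_best_effort(unsigned long start, unsigned long count)
{
    int failures = 0;

    for ( unsigned long j = 0; j < count; j++ )
        if ( fake_map(start + j) )
        {
            fprintf(stderr, "IOMMU: map of pfn 0x%lx failed\n", start + j);
            failures++;
        }

    return failures;
}
```

With this shape the setup loop never aborts early; the caller can still
inspect the failure count and decide whether anything further is warranted.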
On Thu, Mar 17, 2016 at 6:54 AM, Quan Xu <quan.xu@intel.com> wrote: > diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c > index c997b53..526548e 100644 > --- a/xen/arch/x86/mm.c > +++ b/xen/arch/x86/mm.c > @@ -2467,7 +2467,7 @@ static int __get_page_type(struct page_info *page, unsigned long type, > int preemptible) > { > unsigned long nx, x, y = page->u.inuse.type_info; > - int rc = 0; > + int rc = 0, ret = 0; > > ASSERT(!(type & ~(PGT_type_mask | PGT_pae_xen_l2))); > > @@ -2578,11 +2578,11 @@ static int __get_page_type(struct page_info *page, unsigned long type, > if ( d && is_pv_domain(d) && unlikely(need_iommu(d)) ) > { > if ( (x & PGT_type_mask) == PGT_writable_page ) > - iommu_unmap_page(d, mfn_to_gmfn(d, page_to_mfn(page))); > + ret = iommu_unmap_page(d, mfn_to_gmfn(d, page_to_mfn(page))); > else if ( type == PGT_writable_page ) > - iommu_map_page(d, mfn_to_gmfn(d, page_to_mfn(page)), > - page_to_mfn(page), > - IOMMUF_readable|IOMMUF_writable); > + ret = iommu_map_page(d, mfn_to_gmfn(d, page_to_mfn(page)), > + page_to_mfn(page), > + IOMMUF_readable|IOMMUF_writable); > } > } > > @@ -2599,6 +2599,9 @@ static int __get_page_type(struct page_info *page, unsigned long type, > if ( (x & PGT_partial) && !(nx & PGT_partial) ) > put_page(page); > > + if ( !rc ) > + rc = ret; > + What's this about? If the iommu_[un]map_page() operation times out, we still go through with calling alloc_page_type(); and if alloc_page_type() fails we return its failure value, but if it succeeds, we return the error from iommu_[un]map_page()? 
> return rc; > } > > diff --git a/xen/arch/x86/mm/p2m-ept.c b/xen/arch/x86/mm/p2m-ept.c > index 3cb6868..f9bcce7 100644 > --- a/xen/arch/x86/mm/p2m-ept.c > +++ b/xen/arch/x86/mm/p2m-ept.c > @@ -830,7 +830,15 @@ out: > { > if ( iommu_flags ) > for ( i = 0; i < (1 << order); i++ ) > - iommu_map_page(d, gfn + i, mfn_x(mfn) + i, iommu_flags); > + { > + rc = iommu_map_page(d, gfn + i, mfn_x(mfn) + i, iommu_flags); > + if ( rc ) > + { > + while ( i-- > 0 ) > + iommu_unmap_page(d, gfn + i); This won't unmap gfn+0 (since it will break out when i == 0 without calling unmap). If i were signed, we could make the conditional ">= 0"; but unfortunately it's unsigned, so you'll have to do something more complicated. While we're at it, is it worth adding "unlikely()" to the if() condition here? > + break; > + } > + } > else > for ( i = 0; i < (1 << order); i++ ) > iommu_unmap_page(d, gfn + i); > diff --git a/xen/arch/x86/mm/p2m-pt.c b/xen/arch/x86/mm/p2m-pt.c > index 3d80612..c33b753 100644 > --- a/xen/arch/x86/mm/p2m-pt.c > +++ b/xen/arch/x86/mm/p2m-pt.c > @@ -680,8 +680,16 @@ p2m_pt_set_entry(struct p2m_domain *p2m, unsigned long gfn, mfn_t mfn, > } > else if ( iommu_pte_flags ) > for ( i = 0; i < (1UL << page_order); i++ ) > - iommu_map_page(p2m->domain, gfn + i, mfn_x(mfn) + i, > - iommu_pte_flags); > + { > + rc = iommu_map_page(p2m->domain, gfn + i, mfn_x(mfn) + i, > + iommu_pte_flags); > + if ( rc ) > + { > + while ( i-- > 0 ) > + iommu_unmap_page(p2m->domain, gfn + i); Same with gfn+0. -George
On Thu, Mar 17, 2016 at 12:30 PM, George Dunlap <George.Dunlap@eu.citrix.com> wrote: > On Thu, Mar 17, 2016 at 6:54 AM, Quan Xu <quan.xu@intel.com> wrote: >> diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c >> index c997b53..526548e 100644 >> --- a/xen/arch/x86/mm.c >> +++ b/xen/arch/x86/mm.c >> @@ -2467,7 +2467,7 @@ static int __get_page_type(struct page_info *page, unsigned long type, >> int preemptible) >> { >> unsigned long nx, x, y = page->u.inuse.type_info; >> - int rc = 0; >> + int rc = 0, ret = 0; >> >> ASSERT(!(type & ~(PGT_type_mask | PGT_pae_xen_l2))); >> >> @@ -2578,11 +2578,11 @@ static int __get_page_type(struct page_info *page, unsigned long type, >> if ( d && is_pv_domain(d) && unlikely(need_iommu(d)) ) >> { >> if ( (x & PGT_type_mask) == PGT_writable_page ) >> - iommu_unmap_page(d, mfn_to_gmfn(d, page_to_mfn(page))); >> + ret = iommu_unmap_page(d, mfn_to_gmfn(d, page_to_mfn(page))); >> else if ( type == PGT_writable_page ) >> - iommu_map_page(d, mfn_to_gmfn(d, page_to_mfn(page)), >> - page_to_mfn(page), >> - IOMMUF_readable|IOMMUF_writable); >> + ret = iommu_map_page(d, mfn_to_gmfn(d, page_to_mfn(page)), >> + page_to_mfn(page), >> + IOMMUF_readable|IOMMUF_writable); >> } >> } >> >> @@ -2599,6 +2599,9 @@ static int __get_page_type(struct page_info *page, unsigned long type, >> if ( (x & PGT_partial) && !(nx & PGT_partial) ) >> put_page(page); >> >> + if ( !rc ) >> + rc = ret; >> + > > What's this about? If the iommu_[un]map_page() operation times out, > we still go through with calling alloc_page_type(); and if > alloc_page_type() fails we return its failure value, but if it > succeeds, we return the error from iommu_[un]map_page()? 
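The `if ( !rc ) rc = ret;` idiom George is asking about boils down to "the
primary error wins; the IOMMU error is only surfaced when the main operation
itself succeeded". A minimal illustration (the function name is invented for
this sketch):

```c
#include <assert.h>

/* Combine a primary status (rc) with a secondary one (ret): keep the
 * primary error if there is one, otherwise report the secondary error. */
static int combine_status(int rc, int ret)
{
    if ( !rc )
        rc = ret;
    return rc;
}
```

So an alloc_page_type() failure is still reported as before, and only a
previously successful path picks up the IOMMU map/unmap error.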
>
>> return rc;
>> }
>>
>> diff --git a/xen/arch/x86/mm/p2m-ept.c b/xen/arch/x86/mm/p2m-ept.c
>> index 3cb6868..f9bcce7 100644
>> --- a/xen/arch/x86/mm/p2m-ept.c
>> +++ b/xen/arch/x86/mm/p2m-ept.c
>> @@ -830,7 +830,15 @@ out:
>>      {
>>          if ( iommu_flags )
>>              for ( i = 0; i < (1 << order); i++ )
>> -                iommu_map_page(d, gfn + i, mfn_x(mfn) + i, iommu_flags);
>> +            {
>> +                rc = iommu_map_page(d, gfn + i, mfn_x(mfn) + i, iommu_flags);
>> +                if ( rc )
>> +                {
>> +                    while ( i-- > 0 )
>> +                        iommu_unmap_page(d, gfn + i);
>
> This won't unmap gfn+0 (since it will break out when i == 0 without
> calling unmap).

Oh, no it won't, because the decrement is postfix.

For us mere mortals, I'd appreciate a comment here like this:

/* Postfix operator means we will call unmap with i == 0 */

Thanks,
 -George
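George's correction can be verified mechanically: in `while ( i-- > 0 )` the
comparison uses the old value of i, and the decrement happens before the body
runs, so the body does execute with i == 0. A stand-alone sketch
(lowest_visited() is invented for illustration):

```c
#include <assert.h>

/* Return the smallest index visited by the "while ( i-- > 0 )" rollback
 * loop, or (unsigned long)-1 if the body never runs.  The assignment
 * stands in for iommu_unmap_page(d, gfn + i). */
static unsigned long lowest_visited(unsigned long i)
{
    unsigned long lowest = (unsigned long)-1;

    while ( i-- > 0 )
        lowest = i;

    return lowest;
}
```

So a rollback after a failure at index i undoes indices i-1 down to and
including 0, and is safe even though i is unsigned: when i is already 0 the
test fails immediately and the body never runs.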
>>> On 17.03.16 at 07:54, <quan.xu@intel.com> wrote:
> @@ -53,11 +55,21 @@ static int device_power_down(void)
>
>      ioapic_suspend();
>
> -    iommu_suspend();
> +    err = iommu_suspend();
> +    if ( err )
> +        goto iommu_suspend_error;
>
>      lapic_suspend();
>
>      return 0;
> +
> +iommu_suspend_error:

Labels indented by at least one space please.

> --- a/xen/arch/x86/mm/p2m-ept.c
> +++ b/xen/arch/x86/mm/p2m-ept.c
> @@ -830,7 +830,15 @@ out:
>      {
>          if ( iommu_flags )
>              for ( i = 0; i < (1 << order); i++ )
> -                iommu_map_page(d, gfn + i, mfn_x(mfn) + i, iommu_flags);
> +            {
> +                rc = iommu_map_page(d, gfn + i, mfn_x(mfn) + i, iommu_flags);
> +                if ( rc )
> +                {
> +                    while ( i-- > 0 )
> +                        iommu_unmap_page(d, gfn + i);
> +                    break;
> +                }
> +            }
>          else
>              for ( i = 0; i < (1 << order); i++ )
>                  iommu_unmap_page(d, gfn + i);

Earlier on in the PV mm code you also checked iommu_unmap_page()'s return
code - why not here (and also in p2m-pt.c)? Also I'm quite unhappy about the
inconsistent state you leave things in: You unmap from the IOMMU, return an
error, but leave the EPT entry in place.

> --- a/xen/common/grant_table.c
> +++ b/xen/common/grant_table.c
> @@ -932,8 +932,9 @@ __gnttab_map_grant_ref(
>      {
>          nr_gets++;
>          (void)get_page(pg, rd);
> -        if ( !(op->flags & GNTMAP_readonly) )
> -            get_page_type(pg, PGT_writable_page);
> +        if ( !(op->flags & GNTMAP_readonly) &&
> +             !get_page_type(pg, PGT_writable_page) )
> +            goto could_not_pin;

This needs explanation, as it doesn't look related to what your actual goal
is: If an error was possible here, I think this would be a security issue.
However, as also kind of documented by the explicitly ignored return value
from get_page(), it is my understanding that here we only obtain an _extra_
reference.
> --- a/xen/common/memory.c
> +++ b/xen/common/memory.c
> @@ -678,8 +678,9 @@ static int xenmem_add_to_physmap(struct domain *d,
>      if ( need_iommu(d) )
>      {
>          this_cpu(iommu_dont_flush_iotlb) = 0;
> -        iommu_iotlb_flush(d, xatp->idx - done, done);
> -        iommu_iotlb_flush(d, xatp->gpfn - done, done);
> +        rc = iommu_iotlb_flush(d, xatp->idx - done, done);
> +        if ( !rc )
> +            rc = iommu_iotlb_flush(d, xatp->gpfn - done, done);
>      }

And the pattern repeats - you now return an error, but you don't roll back
the now failed operation. But wait - maybe that's intended: Are you meaning
to crash the guest in such cases (somewhere deep in the flush code)? If so,
I think that's fine, but you absolutely would need to say so in the commit
message.

> --- a/xen/drivers/passthrough/x86/iommu.c
> +++ b/xen/drivers/passthrough/x86/iommu.c
> @@ -104,7 +104,11 @@ int arch_iommu_populate_page_table(struct domain *d)
>      this_cpu(iommu_dont_flush_iotlb) = 0;
>
>      if ( !rc )
> -        iommu_iotlb_flush_all(d);
> +    {
> +        rc = iommu_iotlb_flush_all(d);
> +        if ( rc )
> +            iommu_teardown(d);
> +    }
>      else if ( rc != -ERESTART )
>          iommu_teardown(d);

Why can't you just use the existing call to iommu_teardown(), by simply
deleting the "else"?

Jan
On March 17, 2016 8:34pm, George Dunlap <George.Dunlap@eu.citrix.com> wrote: > On Thu, Mar 17, 2016 at 12:30 PM, George Dunlap > <George.Dunlap@eu.citrix.com> wrote: > > On Thu, Mar 17, 2016 at 6:54 AM, Quan Xu <quan.xu@intel.com> wrote: > >> diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c index > >> c997b53..526548e 100644 > >> --- a/xen/arch/x86/mm.c > >> +++ b/xen/arch/x86/mm.c > >> @@ -2467,7 +2467,7 @@ static int __get_page_type(struct page_info > *page, unsigned long type, > >> int preemptible) { > >> unsigned long nx, x, y = page->u.inuse.type_info; > >> - int rc = 0; > >> + int rc = 0, ret = 0; > >> > >> ASSERT(!(type & ~(PGT_type_mask | PGT_pae_xen_l2))); > >> > >> @@ -2578,11 +2578,11 @@ static int __get_page_type(struct page_info > *page, unsigned long type, > >> if ( d && is_pv_domain(d) && unlikely(need_iommu(d)) ) > >> { > >> if ( (x & PGT_type_mask) == PGT_writable_page ) > >> - iommu_unmap_page(d, mfn_to_gmfn(d, > page_to_mfn(page))); > >> + ret = iommu_unmap_page(d, mfn_to_gmfn(d, > page_to_mfn(page))); > >> else if ( type == PGT_writable_page ) > >> - iommu_map_page(d, mfn_to_gmfn(d, > page_to_mfn(page)), > >> - page_to_mfn(page), > >> - > IOMMUF_readable|IOMMUF_writable); > >> + ret = iommu_map_page(d, mfn_to_gmfn(d, > page_to_mfn(page)), > >> + page_to_mfn(page), > >> + > IOMMUF_readable|IOMMUF_writable); > >> } > >> } > >> > >> @@ -2599,6 +2599,9 @@ static int __get_page_type(struct page_info > *page, unsigned long type, > >> if ( (x & PGT_partial) && !(nx & PGT_partial) ) > >> put_page(page); > >> > >> + if ( !rc ) > >> + rc = ret; > >> + > > > > What's this about? If the iommu_[un]map_page() operation times out, > > we still go through with calling alloc_page_type(); and if > > alloc_page_type() fails we return its failure value, but if it > > succeeds, we return the error from iommu_[un]map_page()? 
> > > >> return rc; > >> } > >> > >> diff --git a/xen/arch/x86/mm/p2m-ept.c b/xen/arch/x86/mm/p2m-ept.c > >> index 3cb6868..f9bcce7 100644 > >> --- a/xen/arch/x86/mm/p2m-ept.c > >> +++ b/xen/arch/x86/mm/p2m-ept.c > >> @@ -830,7 +830,15 @@ out: > >> { > >> if ( iommu_flags ) > >> for ( i = 0; i < (1 << order); i++ ) > >> - iommu_map_page(d, gfn + i, mfn_x(mfn) + i, > iommu_flags); > >> + { > >> + rc = iommu_map_page(d, gfn + i, mfn_x(mfn) + i, > iommu_flags); > >> + if ( rc ) > >> + { > >> + while ( i-- > 0 ) > >> + iommu_unmap_page(d, gfn + i); > > > > This won't unmap gfn+0 (since it will break out when i == 0 without > > calling unmap). > > Oh, no it won't, because the decrement is postfix. > > For us mere mortals, I'd appreciate a comment here like this: > > /* Postfix operator means we will call unmap with i == 0 */ > Agreed. For these 2 points, to summarize: - adding "unlikely()" to the if() condition, e.g. if ( unlikely(rc) ) - adding a comment: /* Postfix operator means we will call unmap with i == 0 */ Right? Quan
On March 17, 2016 8:30pm, <dunlapg@gmail.com> wrote: > On Thu, Mar 17, 2016 at 6:54 AM, Quan Xu <quan.xu@intel.com> wrote: > > diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c index > > c997b53..526548e 100644 > > --- a/xen/arch/x86/mm.c > > +++ b/xen/arch/x86/mm.c > > @@ -2467,7 +2467,7 @@ static int __get_page_type(struct page_info *page, > unsigned long type, > > int preemptible) { > > unsigned long nx, x, y = page->u.inuse.type_info; > > - int rc = 0; > > + int rc = 0, ret = 0; > > > > ASSERT(!(type & ~(PGT_type_mask | PGT_pae_xen_l2))); > > > > @@ -2578,11 +2578,11 @@ static int __get_page_type(struct page_info > *page, unsigned long type, > > if ( d && is_pv_domain(d) && unlikely(need_iommu(d)) ) > > { > > if ( (x & PGT_type_mask) == PGT_writable_page ) > > - iommu_unmap_page(d, mfn_to_gmfn(d, > page_to_mfn(page))); > > + ret = iommu_unmap_page(d, mfn_to_gmfn(d, > page_to_mfn(page))); > > else if ( type == PGT_writable_page ) > > - iommu_map_page(d, mfn_to_gmfn(d, > page_to_mfn(page)), > > - page_to_mfn(page), > > - > IOMMUF_readable|IOMMUF_writable); > > + ret = iommu_map_page(d, mfn_to_gmfn(d, > page_to_mfn(page)), > > + page_to_mfn(page), > > + > IOMMUF_readable|IOMMUF_writable); > > } > > } > > > > @@ -2599,6 +2599,9 @@ static int __get_page_type(struct page_info *page, > unsigned long type, > > if ( (x & PGT_partial) && !(nx & PGT_partial) ) > > put_page(page); > > > > + if ( !rc ) > > + rc = ret; > > + > > What's this about? If the iommu_[un]map_page() operation times out, > we still go through with calling alloc_page_type(); and if > alloc_page_type() fails we return its failure value, but if it > succeeds, we return the error from iommu_[un]map_page()? > Yes. To be honest, to me, it is tricky too. I found it is not right to return directly if the iommu_[un]map_page() operation times out. """if ( d && is_pv_domain(d) && unlikely(need_iommu(d)) )""" Does IOMMU support pv domain? If not, we'd better remove the " if(...){...}" Quan
>>> On 18.03.16 at 04:19, <quan.xu@intel.com> wrote: > On March 17, 2016 8:34pm, George Dunlap <George.Dunlap@eu.citrix.com> wrote: >> On Thu, Mar 17, 2016 at 12:30 PM, George Dunlap >> <George.Dunlap@eu.citrix.com> wrote: >> > On Thu, Mar 17, 2016 at 6:54 AM, Quan Xu <quan.xu@intel.com> wrote: >> >> --- a/xen/arch/x86/mm/p2m-ept.c >> >> +++ b/xen/arch/x86/mm/p2m-ept.c >> >> @@ -830,7 +830,15 @@ out: >> >> { >> >> if ( iommu_flags ) >> >> for ( i = 0; i < (1 << order); i++ ) >> >> - iommu_map_page(d, gfn + i, mfn_x(mfn) + i, >> iommu_flags); >> >> + { >> >> + rc = iommu_map_page(d, gfn + i, mfn_x(mfn) + i, >> iommu_flags); >> >> + if ( rc ) >> >> + { >> >> + while ( i-- > 0 ) >> >> + iommu_unmap_page(d, gfn + i); >> > >> > This won't unmap gfn+0 (since it will break out when i == 0 without >> > calling unmap). >> >> Oh, no it won't, because the decrement is postfix. >> >> For us mere mortals, I'd appreciate a comment here like this: >> >> /* Postfix operator means we will call unmap with i == 0 */ >> > Agreed. > For these 2 points, to summarize: > - adding "unlikely()" to the if() condition, e.g. if ( unlikely(rc) ) > - adding a comment: > /* Postfix operator means we will call unmap with i == 0 */ To be honest, I'm opposed to the addition of such comments. See also the parallel discussion rooted at http://lists.xenproject.org/archives/html/xen-devel/2016-03/msg01779.html Jan
>>> On 18.03.16 at 08:54, <quan.xu@intel.com> wrote: > On March 17, 2016 8:30pm, <dunlapg@gmail.com> wrote: >> On Thu, Mar 17, 2016 at 6:54 AM, Quan Xu <quan.xu@intel.com> wrote: >> > diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c index >> > c997b53..526548e 100644 >> > --- a/xen/arch/x86/mm.c >> > +++ b/xen/arch/x86/mm.c >> > @@ -2467,7 +2467,7 @@ static int __get_page_type(struct page_info *page, >> unsigned long type, >> > int preemptible) { >> > unsigned long nx, x, y = page->u.inuse.type_info; >> > - int rc = 0; >> > + int rc = 0, ret = 0; >> > >> > ASSERT(!(type & ~(PGT_type_mask | PGT_pae_xen_l2))); >> > >> > @@ -2578,11 +2578,11 @@ static int __get_page_type(struct page_info >> *page, unsigned long type, >> > if ( d && is_pv_domain(d) && unlikely(need_iommu(d)) ) >> > { >> > if ( (x & PGT_type_mask) == PGT_writable_page ) >> > - iommu_unmap_page(d, mfn_to_gmfn(d, >> page_to_mfn(page))); >> > + ret = iommu_unmap_page(d, mfn_to_gmfn(d, >> page_to_mfn(page))); >> > else if ( type == PGT_writable_page ) >> > - iommu_map_page(d, mfn_to_gmfn(d, >> page_to_mfn(page)), >> > - page_to_mfn(page), >> > - >> IOMMUF_readable|IOMMUF_writable); >> > + ret = iommu_map_page(d, mfn_to_gmfn(d, >> page_to_mfn(page)), >> > + page_to_mfn(page), >> > + >> IOMMUF_readable|IOMMUF_writable); >> > } >> > } >> > >> > @@ -2599,6 +2599,9 @@ static int __get_page_type(struct page_info *page, >> unsigned long type, >> > if ( (x & PGT_partial) && !(nx & PGT_partial) ) >> > put_page(page); >> > >> > + if ( !rc ) >> > + rc = ret; >> > + >> >> What's this about? If the iommu_[un]map_page() operation times out, >> we still go through with calling alloc_page_type(); and if >> alloc_page_type() fails we return its failure value, but if it >> succeeds, we return the error from iommu_[un]map_page()? >> > Yes. > To be honest, to me, it is tricky too. > I found it is not right to return directly if the iommu_[un]map_page() > operation times out. 
> > """if ( d && is_pv_domain(d) && unlikely(need_iommu(d)) )""" > Does IOMMU support pv domain? If not, we'd better remove the " if(...){...}" What strange a question: Of course it does. As you can infer form the reply I sent yesterday, you first need to settle on an abstract model: What do failures mean? If, in the flush case, a timeout is going to kill the domain anyway, then error handling can be lighter weight than if you mean to try to keep the domain running. Of course in this context you also should not forget that iommu_map_page() could already return errors prior to your changes (most notably -ENOMEM, but at least the AMD side also produces others, with -EFAULT generally being accompanied by domain_crash()). As mentioned elsewhere - it seems extremely bogus that these errors weren't check for from the beginning. Jan
On March 18, 2016 4:20pm, <JBeulich@suse.com> wrote: > >>> On 18.03.16 at 08:54, <quan.xu@intel.com> wrote: > > On March 17, 2016 8:30pm, <dunlapg@gmail.com> wrote: > >> On Thu, Mar 17, 2016 at 6:54 AM, Quan Xu <quan.xu@intel.com> wrote: > >> > diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c index > >> > c997b53..526548e 100644 > >> > --- a/xen/arch/x86/mm.c > >> > +++ b/xen/arch/x86/mm.c > >> > @@ -2467,7 +2467,7 @@ static int __get_page_type(struct page_info > >> > *page, > >> unsigned long type, > >> > int preemptible) { > >> > unsigned long nx, x, y = page->u.inuse.type_info; > >> > - int rc = 0; > >> > + int rc = 0, ret = 0; > >> > > >> > ASSERT(!(type & ~(PGT_type_mask | PGT_pae_xen_l2))); > >> > > >> > @@ -2578,11 +2578,11 @@ static int __get_page_type(struct page_info > >> *page, unsigned long type, > >> > if ( d && is_pv_domain(d) && unlikely(need_iommu(d)) ) > >> > { > >> > if ( (x & PGT_type_mask) == PGT_writable_page ) > >> > - iommu_unmap_page(d, mfn_to_gmfn(d, > >> page_to_mfn(page))); > >> > + ret = iommu_unmap_page(d, mfn_to_gmfn(d, > >> page_to_mfn(page))); > >> > else if ( type == PGT_writable_page ) > >> > - iommu_map_page(d, mfn_to_gmfn(d, > >> page_to_mfn(page)), > >> > - page_to_mfn(page), > >> > - > >> IOMMUF_readable|IOMMUF_writable); > >> > + ret = iommu_map_page(d, mfn_to_gmfn(d, > >> page_to_mfn(page)), > >> > + page_to_mfn(page), > >> > + > >> IOMMUF_readable|IOMMUF_writable); > >> > } > >> > } > >> > > >> > @@ -2599,6 +2599,9 @@ static int __get_page_type(struct page_info > >> > *page, > >> unsigned long type, > >> > if ( (x & PGT_partial) && !(nx & PGT_partial) ) > >> > put_page(page); > >> > > >> > + if ( !rc ) > >> > + rc = ret; > >> > + > >> > >> What's this about? If the iommu_[un]map_page() operation times out, > >> we still go through with calling alloc_page_type(); and if > >> alloc_page_type() fails we return its failure value, but if it > >> succeeds, we return the error from iommu_[un]map_page()? > >> > > Yes. 
> > To be honest, to me, it is tricky too.
> > I found it is not right to return directly if the iommu_[un]map_page()
> > operation times out.
> >
> > """if ( d && is_pv_domain(d) && unlikely(need_iommu(d)) )"""
> > Does IOMMU support pv domain? If not, we'd better remove the " if(...){...}"
>
> What a strange question: Of course it does.
>
Jan, thanks. To be honest, this stupid question was always in my mind.

> As you can infer from the reply I sent yesterday, you first need to settle
> on an abstract model: What do failures mean? If, in the flush case, a
> timeout is going to kill the domain anyway, then error handling can be
> lighter weight than if you mean to try to keep the domain running. Of
> course in this context you also should not forget that iommu_map_page()
> could already return errors prior to your changes (most notably -ENOMEM,
> but at least the AMD side also produces others, with -EFAULT generally
> being accompanied by domain_crash()). As mentioned elsewhere - it seems
> extremely bogus that these errors weren't checked for from the beginning.
>
Jan, I am not familiar/confident enough with this single point; could you
tell me more about how to fix it?

Quan
>>> On 18.03.16 at 10:09, <quan.xu@intel.com> wrote:
> On March 18, 2016 4:20pm, <JBeulich@suse.com> wrote:
>> As you can infer from the reply I sent yesterday, you first need to settle
>> on an abstract model: What do failures mean? If, in the flush case, a
>> timeout is going to kill the domain anyway, then error handling can be
>> lighter weight than if you mean to try to keep the domain running. Of
>> course in this context you also should not forget that iommu_map_page()
>> could already return errors prior to your changes (most notably -ENOMEM,
>> but at least the AMD side also produces others, with -EFAULT generally
>> being accompanied by domain_crash()). As mentioned elsewhere - it seems
>> extremely bogus that these errors weren't checked for from the beginning.
>>
> Jan, I am not familiar/confident enough with this single point; could you
> tell me more about how to fix it?

Not sure what exactly you're asking for: As said, we first need to settle
on an abstract model. Do we want IOMMU mapping failures to be fatal to the
domain (perhaps with the exception of the hardware one)? I think we do, and
for the hardware domain we'd do things on a best-effort basis (always
erring on the side of unmapping). Which would probably mean crashing the
domain could be centralized in iommu_{,un}map_page(). How much roll back
would then still be needed in callers of these functions for the hardware
domain's sake would need to be seen.

So before you start coding, give others (namely but not limited to VT-d,
AMD IOMMU, other x86, and x86/mm maintainers) a chance to voice differing
opinions.

Jan
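The model sketched above, with mapping failures fatal to ordinary domains,
the policy centralized in the map wrapper, and the hardware domain handled
best-effort, might look roughly like this. All types and names below are
simplified stand-ins for illustration, not the real Xen declarations:

```c
#include <assert.h>
#include <stdbool.h>

struct domain {
    bool is_hardware_domain;
    bool crashed;
};

/* Pretend arch-level mapping backend: fails (with a -EFAULT-like -14)
 * only for gfn 7, to simulate one bad mapping. */
static int arch_iommu_map(struct domain *d, unsigned long gfn)
{
    (void)d;
    return gfn == 7 ? -14 : 0;
}

/* Centralized policy: a mapping failure crashes ordinary domains, while
 * the hardware domain is left running (best effort) and the caller merely
 * sees the error code. */
static int iommu_map_page_checked(struct domain *d, unsigned long gfn)
{
    int rc = arch_iommu_map(d, gfn);

    if ( rc && !d->is_hardware_domain )
        d->crashed = true;      /* stand-in for domain_crash(d) */

    return rc;
}
```

With the crash decision inside the wrapper, callers for ordinary domains
need little or no rollback; only hardware-domain paths still have to decide
what to do with a nonzero return value.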
On Fri, 2016-03-18 at 03:29 -0600, Jan Beulich wrote: > > > Not sure what exactly you're asking for: As said, we first need to > settle on an abstract model. Do we want IOMMU mapping failures > to be fatal to the domain (perhaps with the exception of the > hardware one)? I think we do, and for the hardware domain we'd > do things on a best effort basis (always erring on the side of > unmapping). Which would probably mean crashing the domain > could be centralized in iommu_{,un}map_page(). How much roll > back would then still be needed in callers of these functions for > the hardware domain's sake would need to be seen. > > So before you start coing, give others (namely but not limited to > VT-d, AMD IOMMU, other x86, and x86/mm maintainers) a chance > to voice differing opinions. > I'm nothing of the above but, FWIW, the behavior Jan described (crashing the domain for all domains but the hardware domain) was indeed the intended plan for this series, as far as I understood from talking to people and looking at previous email conversations and submissions. And it looks to me like it is a sane plan. Regards, Dario
>>> On 18.03.16 at 10:38, <dario.faggioli@citrix.com> wrote: > On Fri, 2016-03-18 at 03:29 -0600, Jan Beulich wrote: >> > >> Not sure what exactly you're asking for: As said, we first need to >> settle on an abstract model. Do we want IOMMU mapping failures >> to be fatal to the domain (perhaps with the exception of the >> hardware one)? I think we do, and for the hardware domain we'd >> do things on a best effort basis (always erring on the side of >> unmapping). Which would probably mean crashing the domain >> could be centralized in iommu_{,un}map_page(). How much roll >> back would then still be needed in callers of these functions for >> the hardware domain's sake would need to be seen. >> >> So before you start coing, give others (namely but not limited to >> VT-d, AMD IOMMU, other x86, and x86/mm maintainers) a chance >> to voice differing opinions. >> > I'm nothing of the above but, Don't you fall under "but not limited to" ;-) ? > FWIW, the behavior Jan described > (crashing the domain for all domains but the hardware domain) was > indeed the intended plan for this series, as far as I understood from > talking to people and looking at previous email conversations and > submissions. That was taking only the flush timeout as an error source into account. Now that we see that the lack of error handling pre-exists, we can't just extend that intended model to also cover those other error reasons without at least having given people a chance to object. Jan
> From: Jan Beulich [mailto:JBeulich@suse.com] > Sent: Friday, March 18, 2016 5:49 PM > > >>> On 18.03.16 at 10:38, <dario.faggioli@citrix.com> wrote: > > On Fri, 2016-03-18 at 03:29 -0600, Jan Beulich wrote: > >> > > >> Not sure what exactly you're asking for: As said, we first need to > >> settle on an abstract model. Do we want IOMMU mapping failures > >> to be fatal to the domain (perhaps with the exception of the > >> hardware one)? I think we do, and for the hardware domain we'd > >> do things on a best effort basis (always erring on the side of > >> unmapping). Which would probably mean crashing the domain > >> could be centralized in iommu_{,un}map_page(). How much roll > >> back would then still be needed in callers of these functions for > >> the hardware domain's sake would need to be seen. > >> > >> So before you start coing, give others (namely but not limited to > >> VT-d, AMD IOMMU, other x86, and x86/mm maintainers) a chance > >> to voice differing opinions. > >> > > I'm nothing of the above but, > > Don't you fall under "but not limited to" ;-) ? > > > FWIW, the behavior Jan described > > (crashing the domain for all domains but the hardware domain) was > > indeed the intended plan for this series, as far as I understood from > > talking to people and looking at previous email conversations and > > submissions. > > That was taking only the flush timeout as an error source into account. > Now that we see that the lack of error handling pre-exists, we can't > just extend that intended model to also cover those other error > reasons without at least having given people a chance to object. > It makes sense so I'm OK with this behavior change. 
Then regarding to Quan's next version (if nobody disagrees), is it better to first describe related callers and then reach agreement which caller still needs error handling for hardware domain, before Quan starts to change actual code (at least based on current discussion Quan didn't have thorough understanding of such necessity yet)? Thanks Kevin
>>> On 21.03.16 at 07:18, <kevin.tian@intel.com> wrote:
>> From: Jan Beulich [mailto:JBeulich@suse.com]
>> Sent: Friday, March 18, 2016 5:49 PM
>>
>> >>> On 18.03.16 at 10:38, <dario.faggioli@citrix.com> wrote:
>> > On Fri, 2016-03-18 at 03:29 -0600, Jan Beulich wrote:
>> >> Not sure what exactly you're asking for: As said, we first need to
>> >> settle on an abstract model. Do we want IOMMU mapping failures
>> >> to be fatal to the domain (perhaps with the exception of the
>> >> hardware one)? I think we do, and for the hardware domain we'd
>> >> do things on a best effort basis (always erring on the side of
>> >> unmapping). Which would probably mean crashing the domain
>> >> could be centralized in iommu_{,un}map_page(). How much roll
>> >> back would then still be needed in callers of these functions for
>> >> the hardware domain's sake would need to be seen.
>> >>
>> >> So before you start coding, give others (namely but not limited to
>> >> VT-d, AMD IOMMU, other x86, and x86/mm maintainers) a chance
>> >> to voice differing opinions.
>> >>
>> > I'm nothing of the above but,
>>
>> Don't you fall under "but not limited to" ;-) ?
>>
>> > FWIW, the behavior Jan described
>> > (crashing the domain for all domains but the hardware domain) was
>> > indeed the intended plan for this series, as far as I understood from
>> > talking to people and looking at previous email conversations and
>> > submissions.
>>
>> That was taking only the flush timeout as an error source into account.
>> Now that we see that the lack of error handling pre-exists, we can't
>> just extend that intended model to also cover those other error
>> reasons without at least having given people a chance to object.
>>
> It makes sense, so I'm OK with this behavior change.
>
> Then regarding Quan's next version (if nobody disagrees), is it better
> to first describe the related callers and then reach agreement on which
> callers still need error handling for the hardware domain, before Quan
> starts to change the actual code (at least based on the current
> discussion Quan doesn't yet have a thorough understanding of that
> necessity)?

Well, ideally we'd indeed reach agreement before starting to write or
change code. However, in a case like this where error handling was just
ignored in pre-existing code, I think the easiest way to get a complete
picture is to see what places / paths need changing. I.e. here it may
indeed be better to start from the patches. Whether that means posting
the patches in patch form, or - prior to posting them - extracting what
was learned into a textual description, is a different aspect (and I'd
leave it to Quan to pick the route more suitable to him).

Jan
On March 18, 2016 4:10pm, <JBeulich@suse.com> wrote:
> >>> On 18.03.16 at 04:19, <quan.xu@intel.com> wrote:
> > On March 17, 2016 8:34pm, George Dunlap <George.Dunlap@eu.citrix.com> wrote:
> >> On Thu, Mar 17, 2016 at 12:30 PM, George Dunlap
> >> <George.Dunlap@eu.citrix.com> wrote:
> >> > On Thu, Mar 17, 2016 at 6:54 AM, Quan Xu <quan.xu@intel.com> wrote:
> >> >> --- a/xen/arch/x86/mm/p2m-ept.c
> >> >> +++ b/xen/arch/x86/mm/p2m-ept.c
> >> >> @@ -830,7 +830,15 @@ out:
> >> >>      {
> >> >>          if ( iommu_flags )
> >> >>              for ( i = 0; i < (1 << order); i++ )
> >> >> -                iommu_map_page(d, gfn + i, mfn_x(mfn) + i, iommu_flags);
> >> >> +            {
> >> >> +                rc = iommu_map_page(d, gfn + i, mfn_x(mfn) + i, iommu_flags);
> >> >> +                if ( rc )
> >> >> +                {
> >> >> +                    while ( i-- > 0 )
> >> >> +                        iommu_unmap_page(d, gfn + i);
> >> >
> >> > This won't unmap gfn+0 (since it will break out when i == 0 without
> >> > calling unmap).
> >>
> >> Oh, no it won't, because the decrement is postfix.
> >>
> >> For us mere mortals, I'd appreciate a comment here like this:
> >>
> >>     /* Postfix operator means we will call unmap with i == 0 */
> >>
> > Agreed.
> > For these 2 points, to summarize:
> >   - adding "unlikely()" to the if() condition, e.g. if ( unlikely(rc) )
> >   - adding a comment:
> >     /* Postfix operator means we will call unmap with i == 0 */
>
> To be honest, I'm opposed to the addition of such comments.
> See also the parallel discussion rooted at
> http://lists.xenproject.org/archives/html/xen-devel/2016-03/msg01779.html
>

Reading the follow-up emails, it looks like a pretty common cleanup
pattern. I don't fully get this point yet, but I will try to follow this
pattern.

Quan
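The subtlety George and Quan discuss above - that `while ( i-- > 0 )` does still unmap index 0, because the decrement is postfix - can be checked in isolation. The following is a standalone toy model of the rollback loop, not the real Xen `iommu_map_page()`/`iommu_unmap_page()` (all names here are illustrative):

```c
#include <assert.h>

#define NR_PAGES 8

static int mapped[NR_PAGES];
static int fail_at = -1;          /* index at which toy_map() fails; test knob */

static int toy_map(unsigned long gfn)
{
    if ( (int)gfn == fail_at )
        return -1;                /* stands in for e.g. -ENOMEM */
    mapped[gfn] = 1;
    return 0;
}

static void toy_unmap(unsigned long gfn)
{
    mapped[gfn] = 0;
}

/* Map 2^order pages; on failure, undo everything mapped so far. */
static int map_range(unsigned int order)
{
    unsigned long i;
    int rc = 0;

    for ( i = 0; i < (1UL << order); i++ )
    {
        rc = toy_map(i);
        if ( rc )
        {
            /* Postfix operator means we still call unmap with i == 0. */
            while ( i-- > 0 )
                toy_unmap(i);
            break;
        }
    }

    return rc;
}
```

When `toy_map()` fails at index 5, the loop enters the rollback with i == 5; each iteration tests i > 0 and then decrements, so the final pass unmaps index 0 before the (now wrapped) counter terminates the loop.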
On March 18, 2016 5:49pm, <JBeulich@suse.com> wrote:
> >>> On 18.03.16 at 10:38, <dario.faggioli@citrix.com> wrote:
> > On Fri, 2016-03-18 at 03:29 -0600, Jan Beulich wrote:
> >> Not sure what exactly you're asking for: As said, we first need to
> >> settle on an abstract model. Do we want IOMMU mapping failures to be
> >> fatal to the domain (perhaps with the exception of the hardware one)?
> >> I think we do, and for the hardware domain we'd do things on a best
> >> effort basis (always erring on the side of unmapping). Which would
> >> probably mean crashing the domain could be centralized in
> >> iommu_{,un}map_page(). How much roll back would then still be needed
> >> in callers of these functions for the hardware domain's sake would
> >> need to be seen.
> >>
> >> So before you start coding, give others (namely but not limited to
> >> VT-d, AMD IOMMU, other x86, and x86/mm maintainers) a chance to voice
> >> differing opinions.
> >>
> > FWIW, the behavior Jan described (crashing the domain for all domains
> > but the hardware domain) was indeed the intended plan for this series,
> > as far as I understood from talking to people and looking at previous
> > email conversations and submissions.
>
> That was taking only the flush timeout as an error source into account.
> Now that we see that the lack of error handling pre-exists, we can't just
> extend that intended model to also cover those other error reasons
> without at least having given people a chance to object.
>

For this abstract model, I assume we are on the same page about the
precondition: if a Device-TLB flush times out, we hide the target ATS
device and crash the domain owning this ATS device. If the impacted
domain is the hardware domain, we just throw out a warning.

Then, IMO:
 1. Try our best to return an error code.
 2. Log the error, but don't return an error value, during hardware
    domain initialization or crashed-system shutdown.
 3. For iommu_{,un}map_page(), we'd better treat the failure as a normal
    error, as the error is not only from the IOMMU flush, e.g. -ENOMEM.
    So we need to {,un}map from the IOMMU, return an error, and roll
    back the failed operation (e.g. unmap the EPT entry).
 4. For the rest, we may return an error but not roll back the failed
    operation; we need to analyze the different conditions.

Quan
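The policy sketched in this thread - mapping errors fatal to ordinary domains, but handled best-effort (log and continue) for the hardware domain - could be centralized roughly as follows. This is a toy model of the idea only; none of these names are real Xen interfaces:

```c
#include <assert.h>
#include <stdio.h>

struct toy_domain {
    int is_hardware_domain;
    int crashed;
};

static void toy_domain_crash(struct toy_domain *d)
{
    d->crashed = 1;
}

/*
 * Centralized error policy, as suggested for iommu_{,un}map_page():
 * the hardware domain gets a warning and keeps running; any other
 * domain is crashed.  The error code is propagated either way.
 */
static int handle_iommu_error(struct toy_domain *d, int rc)
{
    if ( !rc )
        return 0;

    if ( d->is_hardware_domain )
        fprintf(stderr,
                "IOMMU: mapping failed (%d), continuing best-effort\n", rc);
    else
        toy_domain_crash(d);

    return rc;
}
```

Centralizing the decision in one helper is what would let most callers simply propagate `rc` without each of them re-deciding whether the failure is fatal.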
>>> On 24.03.16 at 10:02, <quan.xu@intel.com> wrote:
> On March 18, 2016 5:49pm, <JBeulich@suse.com> wrote:
>> That was taking only the flush timeout as an error source into account.
>> Now that we see that the lack of error handling pre-exists, we can't just
>> extend that intended model to also cover those other error reasons
>> without at least having given people a chance to object.
>>
>
> For this abstract model, I assume we are on the same page about the
> precondition: if a Device-TLB flush times out, we hide the target ATS
> device and crash the domain owning this ATS device. If the impacted
> domain is the hardware domain, we just throw out a warning.
>
> Then, IMO:
> 1. Try our best to return an error code.
> 2. Log the error, but don't return an error value, during hardware
>    domain initialization or crashed-system shutdown.
> 3. For iommu_{,un}map_page(), we'd better treat the failure as a normal
>    error, as the error is not only from the IOMMU flush, e.g. -ENOMEM.
>    So we need to {,un}map from the IOMMU, return an error, and roll
>    back the failed operation (e.g. unmap the EPT entry).

Well, if that's possible in a provably correct way, then sure. But be
clear - when the failure occurs while unmapping, unmapping the EPT entry
obviously can't be the solution; you'd need a true roll back. And of
course you should keep in mind what happens to the guest if such an
operation fails: if you can be certain it'll crash because of this later
on anyway, you're likely better off crashing it right away (such that
the reason for the crash is at least obvious).

Jan

> 4. For the rest, we may return an error but not roll back the failed
>    operation; we need to analyze the different conditions.
>
> Quan
On March 24, 2016 5:59pm, <JBeulich@suse.com> wrote:
> >>> On 24.03.16 at 10:02, <quan.xu@intel.com> wrote:
> > On March 18, 2016 5:49pm, <JBeulich@suse.com> wrote:
> > 3. For iommu_{,un}map_page(), we'd better treat the failure as a
> >    normal error, as the error is not only from the IOMMU flush,
> >    e.g. -ENOMEM. So we need to {,un}map from the IOMMU, return an
> >    error, and roll back the failed operation (e.g. unmap the EPT
> >    entry).
>
> Well, if that's possible in a provably correct way, then sure. But be
> clear - when the failure occurs while unmapping, unmapping the EPT
> entry obviously can't be the solution,

I hope we are discussing the same point, as below:

    ept_set_entry()
    {
        ....
        if ( iommu_flags )
            for ( i = 0; i < (1 << order); i++ )
                iommu_map_page(d, gfn + i, mfn_x(mfn) + i, iommu_flags);
        else
            for ( i = 0; i < (1 << order); i++ )
                iommu_unmap_page(d, gfn + i);
        ....
    }

> you'd need a true roll back.

Does that mean: if the old entry is present, we need to write back the
old entry?

> And of course you should keep in mind what happens to the guest if such
> an operation fails: if you can be certain it'll crash because of this
> later on anyway, you're likely better off crashing it right away (such
> that the reason for the crash is at least obvious).
>

I think, for iommu_{,un}map_page(), it would not be known there that the
domain is going to crash. As mentioned, the error is not only from the
IOMMU flush. We need to fix the callers one by one; fortunately, they
are limited in number.

Quan
>>> On 24.03.16 at 15:12, <quan.xu@intel.com> wrote:
> On March 24, 2016 5:59pm, <JBeulich@suse.com> wrote:
>> >>> On 24.03.16 at 10:02, <quan.xu@intel.com> wrote:
>> > On March 18, 2016 5:49pm, <JBeulich@suse.com> wrote:
>> > 3. For iommu_{,un}map_page(), we'd better treat the failure as a
>> >    normal error, as the error is not only from the IOMMU flush,
>> >    e.g. -ENOMEM. So we need to {,un}map from the IOMMU, return an
>> >    error, and roll back the failed operation (e.g. unmap the EPT
>> >    entry).
>>
>> Well, if that's possible in a provably correct way, then sure. But be
>> clear - when the failure occurs while unmapping, unmapping the EPT
>> entry obviously can't be the solution,
>
> I hope we are discussing the same point, as below:
>
>     ept_set_entry()
>     {
>         ....
>         if ( iommu_flags )
>             for ( i = 0; i < (1 << order); i++ )
>                 iommu_map_page(d, gfn + i, mfn_x(mfn) + i, iommu_flags);
>         else
>             for ( i = 0; i < (1 << order); i++ )
>                 iommu_unmap_page(d, gfn + i);
>         ....
>     }
>
>> you'd need a true roll back.
>
> Does that mean: if the old entry is present, we need to write back the
> old entry?

Yes, but that's not as simple as it sounds: You also need to take care
of the splitting of super pages which may have occurred.

Jan
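The "true roll back" Jan describes - writing the old entry back when installing the new one fails, rather than leaving the slot empty - can be illustrated with a deliberately minimal toy (a single entry, hypothetical helper names). Note a real implementation would also have to undo any superpage split, which this sketch ignores:

```c
#include <assert.h>

static unsigned long pte;         /* a single toy page-table entry */
static int install_should_fail;   /* test knob, not a real Xen flag */

static int toy_install(unsigned long new)
{
    if ( install_should_fail )
        return -1;                /* e.g. -ENOMEM allocating a page table */
    pte = new;
    return 0;
}

static int update_entry(unsigned long new)
{
    unsigned long old = pte;      /* saved for roll back */
    int rc;

    pte = 0;                      /* the old mapping is torn down first */
    rc = toy_install(new);
    if ( rc )
        pte = old;                /* roll back: write the old entry back */

    return rc;
}
```

The point of the sketch is that on failure the caller observes the *previous* mapping, not a cleared one; for 2M/1G entries the saved state would additionally have to cover the pre-split superpage, which is where the provability problem Jan raises comes in.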
diff --git a/xen/arch/x86/acpi/power.c b/xen/arch/x86/acpi/power.c
index 2885e31..50edf3f 100644
--- a/xen/arch/x86/acpi/power.c
+++ b/xen/arch/x86/acpi/power.c
@@ -45,6 +45,8 @@ void do_suspend_lowlevel(void);
 
 static int device_power_down(void)
 {
+    int err;
+
     console_suspend();
 
     time_suspend();
@@ -53,11 +55,21 @@ static int device_power_down(void)
 
     ioapic_suspend();
 
-    iommu_suspend();
+    err = iommu_suspend();
+    if ( err )
+        goto iommu_suspend_error;
 
     lapic_suspend();
 
     return 0;
+
+iommu_suspend_error:
+    ioapic_resume();
+    i8259A_resume();
+    time_resume();
+    console_resume();
+
+    return err;
 }
 
 static void device_power_up(void)
diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index c997b53..526548e 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -2467,7 +2467,7 @@ static int __get_page_type(struct page_info *page, unsigned long type,
                            int preemptible)
 {
     unsigned long nx, x, y = page->u.inuse.type_info;
-    int rc = 0;
+    int rc = 0, ret = 0;
 
     ASSERT(!(type & ~(PGT_type_mask | PGT_pae_xen_l2)));
 
@@ -2578,11 +2578,11 @@ static int __get_page_type(struct page_info *page, unsigned long type,
         if ( d && is_pv_domain(d) && unlikely(need_iommu(d)) )
         {
             if ( (x & PGT_type_mask) == PGT_writable_page )
-                iommu_unmap_page(d, mfn_to_gmfn(d, page_to_mfn(page)));
+                ret = iommu_unmap_page(d, mfn_to_gmfn(d, page_to_mfn(page)));
             else if ( type == PGT_writable_page )
-                iommu_map_page(d, mfn_to_gmfn(d, page_to_mfn(page)),
-                               page_to_mfn(page),
-                               IOMMUF_readable|IOMMUF_writable);
+                ret = iommu_map_page(d, mfn_to_gmfn(d, page_to_mfn(page)),
+                                     page_to_mfn(page),
+                                     IOMMUF_readable|IOMMUF_writable);
         }
     }
 
@@ -2599,6 +2599,9 @@ static int __get_page_type(struct page_info *page, unsigned long type,
     if ( (x & PGT_partial) && !(nx & PGT_partial) )
         put_page(page);
 
+    if ( !rc )
+        rc = ret;
+
     return rc;
 }
 
diff --git a/xen/arch/x86/mm/p2m-ept.c b/xen/arch/x86/mm/p2m-ept.c
index 3cb6868..f9bcce7 100644
--- a/xen/arch/x86/mm/p2m-ept.c
+++ b/xen/arch/x86/mm/p2m-ept.c
@@ -830,7 +830,15 @@ out:
     {
         if ( iommu_flags )
             for ( i = 0; i < (1 << order); i++ )
-                iommu_map_page(d, gfn + i, mfn_x(mfn) + i, iommu_flags);
+            {
+                rc = iommu_map_page(d, gfn + i, mfn_x(mfn) + i, iommu_flags);
+                if ( rc )
+                {
+                    while ( i-- > 0 )
+                        iommu_unmap_page(d, gfn + i);
+                    break;
+                }
+            }
         else
             for ( i = 0; i < (1 << order); i++ )
                 iommu_unmap_page(d, gfn + i);
diff --git a/xen/arch/x86/mm/p2m-pt.c b/xen/arch/x86/mm/p2m-pt.c
index 3d80612..c33b753 100644
--- a/xen/arch/x86/mm/p2m-pt.c
+++ b/xen/arch/x86/mm/p2m-pt.c
@@ -680,8 +680,16 @@ p2m_pt_set_entry(struct p2m_domain *p2m, unsigned long gfn, mfn_t mfn,
     }
     else if ( iommu_pte_flags )
         for ( i = 0; i < (1UL << page_order); i++ )
-            iommu_map_page(p2m->domain, gfn + i, mfn_x(mfn) + i,
-                           iommu_pte_flags);
+        {
+            rc = iommu_map_page(p2m->domain, gfn + i, mfn_x(mfn) + i,
+                                iommu_pte_flags);
+            if ( rc )
+            {
+                while ( i-- > 0 )
+                    iommu_unmap_page(p2m->domain, gfn + i);
+                break;
+            }
+        }
     else
         for ( i = 0; i < (1UL << page_order); i++ )
             iommu_unmap_page(p2m->domain, gfn + i);
diff --git a/xen/common/grant_table.c b/xen/common/grant_table.c
index 8b22299..b410ffc 100644
--- a/xen/common/grant_table.c
+++ b/xen/common/grant_table.c
@@ -932,8 +932,9 @@ __gnttab_map_grant_ref(
         {
             nr_gets++;
             (void)get_page(pg, rd);
-            if ( !(op->flags & GNTMAP_readonly) )
-                get_page_type(pg, PGT_writable_page);
+            if ( !(op->flags & GNTMAP_readonly) &&
+                 !get_page_type(pg, PGT_writable_page) )
+                goto could_not_pin;
         }
     }
 }
diff --git a/xen/common/memory.c b/xen/common/memory.c
index ef57219..543647d 100644
--- a/xen/common/memory.c
+++ b/xen/common/memory.c
@@ -678,8 +678,9 @@ static int xenmem_add_to_physmap(struct domain *d,
     if ( need_iommu(d) )
     {
         this_cpu(iommu_dont_flush_iotlb) = 0;
-        iommu_iotlb_flush(d, xatp->idx - done, done);
-        iommu_iotlb_flush(d, xatp->gpfn - done, done);
+        rc = iommu_iotlb_flush(d, xatp->idx - done, done);
+        if ( !rc )
+            rc = iommu_iotlb_flush(d, xatp->gpfn - done, done);
     }
 #endif
 
diff --git a/xen/drivers/passthrough/iommu.c b/xen/drivers/passthrough/iommu.c
index b64676f..29efbfe 100644
--- a/xen/drivers/passthrough/iommu.c
+++ b/xen/drivers/passthrough/iommu.c
@@ -277,24 +277,28 @@ static void iommu_free_pagetables(unsigned long unused)
                             cpumask_cycle(smp_processor_id(), &cpu_online_map));
 }
 
-void iommu_iotlb_flush(struct domain *d, unsigned long gfn, unsigned int page_count)
+int iommu_iotlb_flush(struct domain *d, unsigned long gfn, unsigned int page_count)
 {
     struct hvm_iommu *hd = domain_hvm_iommu(d);
 
     if ( !iommu_enabled || !hd->platform_ops || !hd->platform_ops->iotlb_flush )
-        return;
+        return 0;
 
     hd->platform_ops->iotlb_flush(d, gfn, page_count);
+
+    return 0;
 }
 
-void iommu_iotlb_flush_all(struct domain *d)
+int iommu_iotlb_flush_all(struct domain *d)
 {
     struct hvm_iommu *hd = domain_hvm_iommu(d);
 
     if ( !iommu_enabled || !hd->platform_ops || !hd->platform_ops->iotlb_flush_all )
-        return;
+        return 0;
 
     hd->platform_ops->iotlb_flush_all(d);
+
+    return 0;
 }
 
 int __init iommu_setup(void)
@@ -368,11 +372,13 @@ int iommu_do_domctl(
     return ret;
 }
 
-void iommu_suspend()
+int iommu_suspend()
 {
     const struct iommu_ops *ops = iommu_get_ops();
 
     if ( iommu_enabled )
         ops->suspend();
+
+    return 0;
 }
 
 void iommu_share_p2m_table(struct domain* d)
diff --git a/xen/drivers/passthrough/vtd/x86/vtd.c b/xen/drivers/passthrough/vtd/x86/vtd.c
index c0d6aab..e5ab10a 100644
--- a/xen/drivers/passthrough/vtd/x86/vtd.c
+++ b/xen/drivers/passthrough/vtd/x86/vtd.c
@@ -140,8 +140,11 @@ void __hwdom_init vtd_set_hwdom_mapping(struct domain *d)
 
         tmp = 1 << (PAGE_SHIFT - PAGE_SHIFT_4K);
         for ( j = 0; j < tmp; j++ )
-            iommu_map_page(d, pfn * tmp + j, pfn * tmp + j,
-                           IOMMUF_readable|IOMMUF_writable);
+            if ( iommu_map_page(d, pfn * tmp + j, pfn * tmp + j,
+                                IOMMUF_readable|IOMMUF_writable) )
+                printk(XENLOG_G_ERR
+                       "IOMMU: Map page gfn: 0x%lx(mfn: 0x%lx) failed.\n",
+                       pfn * tmp + j, pfn * tmp + j);
 
         if (!(i & (0xfffff >> (PAGE_SHIFT - PAGE_SHIFT_4K))))
             process_pending_softirqs();
diff --git a/xen/drivers/passthrough/x86/iommu.c b/xen/drivers/passthrough/x86/iommu.c
index 8cbb655..d8e3c8f 100644
--- a/xen/drivers/passthrough/x86/iommu.c
+++ b/xen/drivers/passthrough/x86/iommu.c
@@ -104,7 +104,11 @@ int arch_iommu_populate_page_table(struct domain *d)
     this_cpu(iommu_dont_flush_iotlb) = 0;
 
     if ( !rc )
-        iommu_iotlb_flush_all(d);
+    {
+        rc = iommu_iotlb_flush_all(d);
+        if ( rc )
+            iommu_teardown(d);
+    }
     else if ( rc != -ERESTART )
         iommu_teardown(d);
 
diff --git a/xen/include/xen/iommu.h b/xen/include/xen/iommu.h
index 8217cb7..d6d489a 100644
--- a/xen/include/xen/iommu.h
+++ b/xen/include/xen/iommu.h
@@ -167,7 +167,7 @@ struct iommu_ops {
     void (*dump_p2m_table)(struct domain *d);
 };
 
-void iommu_suspend(void);
+int iommu_suspend(void);
 void iommu_resume(void);
 void iommu_crash_shutdown(void);
 int iommu_get_reserved_device_memory(iommu_grdm_t *, void *);
@@ -182,8 +182,8 @@ int iommu_do_pci_domctl(struct xen_domctl *, struct domain *d,
 int iommu_do_domctl(struct xen_domctl *, struct domain *d,
                     XEN_GUEST_HANDLE_PARAM(xen_domctl_t));
 
-void iommu_iotlb_flush(struct domain *d, unsigned long gfn, unsigned int page_count);
-void iommu_iotlb_flush_all(struct domain *d);
+int iommu_iotlb_flush(struct domain *d, unsigned long gfn, unsigned int page_count);
+int iommu_iotlb_flush_all(struct domain *d);
 
 /*
  * The purpose of the iommu_dont_flush_iotlb optional cpu flag is to
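The `__get_page_type()` hunk in this patch keeps two error variables: `rc` from the main path and `ret` from the IOMMU update, reporting the IOMMU error only when the main path succeeded, so a late IOMMU failure never masks an earlier error. A minimal model of that aggregation pattern:

```c
#include <assert.h>

/*
 * First error wins: combine the main-path error (rc) with a secondary
 * error (ret, e.g. from an IOMMU map/unmap), preferring the earlier one.
 */
static int combine_errors(int rc, int ret)
{
    if ( !rc )
        rc = ret;
    return rc;
}
```

This matches the `if ( !rc ) rc = ret;` added just before `return rc;` in the patch; the alternative of unconditionally overwriting `rc` would hide the original failure from the caller.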
Current code would panic() when a VT-d Device-TLB flush timed out. The
panic() is going to be eliminated, so we must check all kinds of errors
all the way up the call trees.

Signed-off-by: Quan Xu <quan.xu@intel.com>

CC: Jan Beulich <jbeulich@suse.com>
CC: Liu Jinsong <jinsong.liu@alibaba-inc.com>
CC: Keir Fraser <keir@xen.org>
CC: Andrew Cooper <andrew.cooper3@citrix.com>
CC: Jun Nakajima <jun.nakajima@intel.com>
CC: Kevin Tian <kevin.tian@intel.com>
CC: George Dunlap <george.dunlap@eu.citrix.com>
CC: Feng Wu <feng.wu@intel.com>
CC: Dario Faggioli <dario.faggioli@citrix.com>
---
 xen/arch/x86/acpi/power.c             | 14 +++++++++++++-
 xen/arch/x86/mm.c                     | 13 ++++++++-----
 xen/arch/x86/mm/p2m-ept.c             | 10 +++++++++-
 xen/arch/x86/mm/p2m-pt.c              | 12 ++++++++++--
 xen/common/grant_table.c              |  5 +++--
 xen/common/memory.c                   |  5 +++--
 xen/drivers/passthrough/iommu.c       | 16 +++++++++++-----
 xen/drivers/passthrough/vtd/x86/vtd.c |  7 +++++--
 xen/drivers/passthrough/x86/iommu.c   |  6 +++++-
 xen/include/xen/iommu.h               |  6 +++---
 10 files changed, 70 insertions(+), 24 deletions(-)
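The `device_power_down()` change in this series is a classic unwind: when `iommu_suspend()` fails, everything already suspended (console, time, i8259A, ioapic) is resumed in reverse order before the error is returned. A compact toy model of that pattern (illustrative names, not the real Xen functions):

```c
#include <assert.h>

static int suspended[4];          /* console, time, i8259A, ioapic */
static int iommu_suspend_fails;   /* test knob */

static int toy_power_down(void)
{
    int i;

    /* Suspend each device in order. */
    for ( i = 0; i < 4; i++ )
        suspended[i] = 1;

    if ( iommu_suspend_fails )
    {
        /* Unwind: resume in reverse order of suspend, then report. */
        for ( i = 3; i >= 0; i-- )
            suspended[i] = 0;
        return -1;
    }

    return 0;
}
```

Resuming in reverse order matters when the later suspends depend on the earlier ones (here, e.g. the console being quiesced first); the patch's `iommu_suspend_error:` label implements exactly this ordering.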