From patchwork Thu May 5 06:53:41 2016
X-Patchwork-Submitter: Quan Xu
X-Patchwork-Id: 9021231
From: "Xu, Quan"
To: George Dunlap, "Tian, Kevin", Jan Beulich
Cc: Andrew Cooper, Dario Faggioli, Keir Fraser, "Nakajima, Jun",
 "xen-devel@lists.xen.org"
Subject: Re: [Xen-devel] [PATCH v3 03/10] IOMMU/MMU: enhance the call trees of IOMMU unmapping and mapping
Date: Thu, 5 May 2016 06:53:41 +0000
Message-ID: <945CA011AD5F084CBEA3E851C0AB28894B8A9316@SHSMSX101.ccr.corp.intel.com>
References: <1461921917-48394-1-git-send-email-quan.xu@intel.com>
 <1461921917-48394-4-git-send-email-quan.xu@intel.com>

On May 04, 2016 9:49 PM, George Dunlap wrote:
> On Fri, Apr 29, 2016 at 10:25 AM, Quan Xu wrote:
> > When IOMMU mapping fails, we issue a best-effort rollback: we stop the
> > IOMMU mapping, unmap the previously established IOMMU mappings, and
> > then report the error up the call trees. When rollback is not feasible
> > for the hardware domain (in the early initialization phase, or as a
> > trade-off against complexity), we do things on a best-effort basis and
> > only emit an error message.
> >
> > IOMMU unmapping should perhaps continue despite an error, in an
> > attempt to do best-effort cleanup.
> >
> > Signed-off-by: Quan Xu
> >
> > CC: Keir Fraser
> > CC: Jan Beulich
> > CC: Andrew Cooper
> > CC: Jun Nakajima
> > CC: Kevin Tian
> > CC: George Dunlap
> > ---
> >  xen/arch/x86/mm.c               | 13 ++++++++-----
> >  xen/arch/x86/mm/p2m-ept.c       | 27 +++++++++++++++++++++++++--
> >  xen/arch/x86/mm/p2m-pt.c        | 24 ++++++++++++++++++++----
> >  xen/arch/x86/mm/p2m.c           | 11 +++++++++--
> >  xen/drivers/passthrough/iommu.c | 14 +++++++++++++-
> >  5 files changed, 75 insertions(+), 14 deletions(-)
> >
> > diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
> > index a42097f..427097d 100644
> > --- a/xen/arch/x86/mm.c
> > +++ b/xen/arch/x86/mm.c
> > @@ -2467,7 +2467,7 @@ static int __get_page_type(struct page_info *page, unsigned long type,
> >                              int preemptible)
> >  {
> >      unsigned long nx, x, y = page->u.inuse.type_info;
> > -    int rc = 0;
> > +    int rc = 0, ret = 0;
> >
> >      ASSERT(!(type & ~(PGT_type_mask | PGT_pae_xen_l2)));
> >
> > @@ -2578,11 +2578,11 @@ static int __get_page_type(struct page_info *page, unsigned long type,
> >              if ( d && is_pv_domain(d) && unlikely(need_iommu(d)) )
> >              {
> >                  if ( (x & PGT_type_mask) == PGT_writable_page )
> > -                    iommu_unmap_page(d, mfn_to_gmfn(d, page_to_mfn(page)));
> > +                    ret = iommu_unmap_page(d, mfn_to_gmfn(d, page_to_mfn(page)));
> >                  else if ( type == PGT_writable_page )
> > -                    iommu_map_page(d, mfn_to_gmfn(d, page_to_mfn(page)),
> > -                                   page_to_mfn(page),
> > -                                   IOMMUF_readable|IOMMUF_writable);
> > +                    ret = iommu_map_page(d, mfn_to_gmfn(d, page_to_mfn(page)),
> > +                                         page_to_mfn(page),
> > +                                         IOMMUF_readable|IOMMUF_writable);
> >              }
> >          }
> >
> > @@ -2599,6 +2599,9 @@ static int __get_page_type(struct page_info *page, unsigned long type,
> >      if ( (x & PGT_partial) && !(nx & PGT_partial) )
> >          put_page(page);
> >
> > +    if ( !rc )
> > +        rc = ret;
> > +
> >      return rc;
> >  }
> >
> > diff --git a/xen/arch/x86/mm/p2m-ept.c b/xen/arch/x86/mm/p2m-ept.c
> > index 1ed5b47..df87944 100644
> > --- a/xen/arch/x86/mm/p2m-ept.c
> > +++ b/xen/arch/x86/mm/p2m-ept.c
> > @@ -821,6 +821,8 @@ out:
> >      if ( needs_sync )
> >          ept_sync_domain(p2m);
> >
> > +    ret = 0;
> > +
> >      /* For host p2m, may need to change VT-d page table.*/
> >      if ( rc == 0 && p2m_is_hostp2m(p2m) && need_iommu(d) &&
> >           need_modify_vtd_table )
> > @@ -831,11 +833,29 @@ out:
> >          {
> >              if ( iommu_flags )
> >                  for ( i = 0; i < (1 << order); i++ )
> > -                    iommu_map_page(d, gfn + i, mfn_x(mfn) + i, iommu_flags);
> > +                {
> > +                    rc = iommu_map_page(d, gfn + i, mfn_x(mfn) + i, iommu_flags);
> > +
> > +                    if ( !ret && unlikely(rc) )
> > +                    {
> > +                        while ( i-- )
> > +                            iommu_unmap_page(d, gfn + i);
> > +
> > +                        ret = rc;
> > +                        break;
> > +                    }
> > +                }
> >              else
> >                  for ( i = 0; i < (1 << order); i++ )
> > -                    iommu_unmap_page(d, gfn + i);
> > +                {
> > +                    rc = iommu_unmap_page(d, gfn + i);
> > +
> > +                    if ( !ret && unlikely(rc) )
> > +                        ret = rc;
> > +                }
> >          }
> > +
> > +        rc = 0;
> >      }
>
> So the reason for doing this thing with setting ret=0, then using rc,
> then setting rc to zero, is to make sure that the change is propagated
> to the altp2m if necessary?
>
> I'm not a fan of this sort of "implied signalling".  The check at the
> altp2m conditional is meant to say, "everything went fine, go ahead
> and propagate the change".  But with your changes, that's not what we
> want anymore -- we want, "If the change actually made it into the
> hostp2m, propagate it to the altp2m."
>
> As such, I think it would be a lot clearer to just make a new boolean
> variable, maybe "entry_written", which we initialize to false and then
> set to true when the entry is actually written; and then change the
> condition re the altp2m to "if ( entry_written && p2m_is_hostp2m(..) )".
>
> Then I think you could make the ret / rc use mirror what you do
> elsewhere, where ret is used to store the return value of the function
> call, and it's propagated to rc only if rc is not already set.
>

George,
Thanks for your detailed explanation. This seems to be another matter of
preference. Of course, I'd follow your suggestion for the p2m code, but I
need to hear the other maintainers' opinions as well (especially Kevin's
and Jan's, as they have spent a lot of time on this for me).

One small difference, IMO: an int 'entry_written' is better, as we also
need to propagate 'entry_written' up to the callers, i.e. the errors
returned from atomic_write_ept_entry() are '-EINVAL', '-ESRCH' and
'-EBUSY'. The modification at the end of this mail reflects this (feel
free to correct me):

> > @@ -680,11 +680,27 @@ p2m_pt_set_entry(struct p2m_domain *p2m, unsigned long gfn, mfn_t mfn,
> >          }
> >          else if ( iommu_pte_flags )
> >              for ( i = 0; i < (1UL << page_order); i++ )
> > -                iommu_map_page(p2m->domain, gfn + i, mfn_x(mfn) + i,
> > -                               iommu_pte_flags);
> > +            {
> > +                ret = iommu_map_page(p2m->domain, gfn + i, mfn_x(mfn) + i,
> > +                                     iommu_pte_flags);
> > +
> > +                if ( !rc && unlikely(ret) )
> > +                {
> > +                    while ( i-- )
> > +                        iommu_unmap_page(p2m->domain, gfn + i);
> > +
> > +                    rc = ret;
> > +                    break;
> > +                }
>
> Hmm, is this conditional here right?  What the code actually says to
> do is to always map pages, but to only unmap them on an error if there
> have been no other errors in the function so far.
>
> As it happens, at the moment rc cannot be non-zero here since any time
> it's non-zero the code jumps down to the 'out' label, skipping this.
> But if that ever changed, this would fail to unmap when it probably
> should.
>
> It seems like this should be:
>
>     if ( unlikely(ret) )
>     {
>         while ( i-- )
>             iommu_unmap_page(p2m->domain, gfn + i);
>         if ( !rc )
>             rc = ret;
>         break;
>     }
>

Looks good. Thanks.

> Or if you want to assume that rc will never be non-zero here, then put
> an ASSERT() just before the for loop here.
>
> Everything else looks good to me.  Thanks again for your work on this.
>

Quan

--- a/xen/arch/x86/mm/p2m-ept.c
+++ b/xen/arch/x86/mm/p2m-ept.c
@@ -666,7 +666,7 @@ ept_set_entry(struct p2m_domain *p2m, unsigned long gfn, mfn_t mfn,
     ept_entry_t *table, *ept_entry = NULL;
     unsigned long gfn_remainder = gfn;
     unsigned int i, target = order / EPT_TABLE_ORDER;
-    int ret, rc = 0;
+    int ret, rc = 0, entry_written = 0;
     bool_t direct_mmio = (p2mt == p2m_mmio_direct);
     uint8_t ipat = 0;
     bool_t need_modify_vtd_table = 1;
@@ -763,8 +763,8 @@ ept_set_entry(struct p2m_domain *p2m, unsigned long gfn, mfn_t mfn,
 
         /* now install the newly split ept sub-tree */
         /* NB: please make sure domian is paused and no in-fly VT-d DMA. */
-        rc = atomic_write_ept_entry(ept_entry, split_ept_entry, i);
-        ASSERT(rc == 0);
+        entry_written = atomic_write_ept_entry(ept_entry, split_ept_entry, i);
+        ASSERT(entry_written == 0);
 
         /* then move to the level we want to make real changes */
         for ( ; i > target; i-- )
@@ -809,9 +809,12 @@ ept_set_entry(struct p2m_domain *p2m, unsigned long gfn, mfn_t mfn,
     new_entry.suppress_ve = is_epte_valid(&old_entry) ?
                                 old_entry.suppress_ve : 1;
 
-    rc = atomic_write_ept_entry(ept_entry, new_entry, target);
-    if ( unlikely(rc) )
+    entry_written = atomic_write_ept_entry(ept_entry, new_entry, target);
+    if ( unlikely(entry_written) )
+    {
         old_entry.epte = 0;
+        rc = entry_written;
+    }
     else if ( p2mt != p2m_invalid &&
               (gfn + (1UL << order) - 1 > p2m->max_mapped_pfn) )
         /* Track the highest gfn for which we have ever had a valid mapping */
@@ -822,7 +825,7 @@ out:
         ept_sync_domain(p2m);
 
     /* For host p2m, may need to change VT-d page table.*/
-    if ( rc == 0 && p2m_is_hostp2m(p2m) && need_iommu(d) &&
+    if ( entry_written == 0 && p2m_is_hostp2m(p2m) && need_iommu(d) &&
          need_modify_vtd_table )
     {
         if ( iommu_hap_pt_share )
@@ -831,10 +834,28 @@
         {
             if ( iommu_flags )
                 for ( i = 0; i < (1 << order); i++ )
-                    iommu_map_page(d, gfn + i, mfn_x(mfn) + i, iommu_flags);
+                {
+                    ret = iommu_map_page(d, gfn + i, mfn_x(mfn) + i, iommu_flags);
+
+                    if ( unlikely(ret) )
+                    {
+                        while ( i-- )
+                            iommu_unmap_page(p2m->domain, gfn + i);
+
+                        if ( !rc )
+                            rc = ret;
+
+                        break;
+                    }
+                }
             else
                 for ( i = 0; i < (1 << order); i++ )
-                    iommu_unmap_page(d, gfn + i);
+                {
+                    ret = iommu_unmap_page(d, gfn + i);
+
+                    if ( !rc && unlikely(ret) )
+                        rc = ret;
+                }
         }
     }
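
[Editorial illustration, not part of the original thread: a minimal, self-contained
sketch of the map-then-roll-back pattern discussed above. map_one(), unmap_one(),
ORDER, FAIL_AT and the mapped[] array are made-up stand-ins for iommu_map_page(),
iommu_unmap_page() and the real page-table state; they are not Xen interfaces.]

#include <errno.h>
#include <stdio.h>

#define ORDER   2                 /* map 1 << ORDER frames              */
#define FAIL_AT 3                 /* pretend this frame cannot be mapped */

static int mapped[1 << ORDER];    /* fake "IOMMU page table" state      */

static int map_one(unsigned long gfn)
{
    if ( gfn == FAIL_AT )
        return -EFAULT;           /* simulate a mapping failure */
    mapped[gfn] = 1;
    return 0;
}

static int unmap_one(unsigned long gfn)
{
    mapped[gfn] = 0;
    return 0;
}

/* Map gfn 0 .. (1 << order) - 1; roll back already-mapped frames on failure. */
static int map_range(unsigned int order)
{
    int rc = 0, ret;
    unsigned long i;

    for ( i = 0; i < (1UL << order); i++ )
    {
        ret = map_one(i);
        if ( ret )
        {
            /* Best-effort rollback of everything mapped so far. */
            while ( i-- )
                unmap_one(i);

            if ( !rc )
                rc = ret;         /* report the first error, do not overwrite it */

            break;
        }
    }

    return rc;
}

int main(void)
{
    printf("map_range() returned %d\n", map_range(ORDER));
    return 0;
}

The key point mirrors George's suggestion: the rollback loop runs whenever the
current map attempt fails, while rc only records the first error so an earlier
failure is never overwritten by a later one.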