mbox series

[v2,0/2] AMD/IOMMU: re-work mode updating

Message ID 8e8866de-33a8-68c0-3352-d6dfeec4a9b6@suse.com (mailing list archive)
Headers show
Series AMD/IOMMU: re-work mode updating | expand

Message

Jan Beulich Nov. 14, 2019, 4:41 p.m. UTC
update_paging_mode() in the AMD IOMMU code expects to be invoked with
the PCI devices lock held. The check occurring only when the mode
actually needs updating, the violation of this rule by the majority
of callers did go unnoticed until per-domain IOMMU setup was changed
to do away with on-demand creation of IOMMU page tables.

Unfortunately the only half way reasonable fix to this that I could
come up with requires more re-work than would seem desirable at this
time of the release process, but addressing the issue seems
unavoidable to me as its manifestation is a regression from the
IOMMU page table setup re-work. The change also isn't without risk
of further regressions - if in patch 2 I've missed a code path that
would also need to invoke the new hook, then this might mean non-
working guests (with passed-through devices on AMD hardware).

1: introduce GFN notification for translated domains
2: AMD/IOMMU: use notify_dfn() hook to update paging mode

Jan

Comments

Paul Durrant Nov. 14, 2019, 5:29 p.m. UTC | #1
> -----Original Message-----
> From: Xen-devel <xen-devel-bounces@lists.xenproject.org> On Behalf Of Jan
> Beulich
> Sent: 14 November 2019 16:42
> To: xen-devel@lists.xenproject.org
> Cc: Juergen Gross <jgross@suse.com>; Sander Eikelenboom
> <linux@eikelenboom.it>; Andrew Cooper <andrew.cooper3@citrix.com>
> Subject: [Xen-devel] [PATCH v2 0/2] AMD/IOMMU: re-work mode updating
> 
> update_paging_mode() in the AMD IOMMU code expects to be invoked with
> the PCI devices lock held. The check occurring only when the mode
> actually needs updating, the violation of this rule by the majority
> of callers did go unnoticed until per-domain IOMMU setup was changed
> to do away with on-demand creation of IOMMU page tables.

Wouldn't it be safer to just get rid of update_paging_mode() and start with a reasonable number of levels?

  Paul

> 
> Unfortunately the only half way reasonable fix to this that I could
> come up with requires more re-work than would seem desirable at this
> time of the release process, but addressing the issue seems
> unavoidable to me as its manifestation is a regression from the
> IOMMU page table setup re-work. The change also isn't without risk
> of further regressions - if in patch 2 I've missed a code path that
> would also need to invoke the new hook, then this might mean non-
> working guests (with passed-through devices on AMD hardware).
> 
> 1: introduce GFN notification for translated domains
> 2: AMD/IOMMU: use notify_dfn() hook to update paging mode
> 
> Jan
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xenproject.org
> https://lists.xenproject.org/mailman/listinfo/xen-devel
Jan Beulich Nov. 15, 2019, 9:28 a.m. UTC | #2
On 14.11.2019 18:29,  Durrant, Paul  wrote:
>> -----Original Message-----
>> From: Xen-devel <xen-devel-bounces@lists.xenproject.org> On Behalf Of Jan
>> Beulich
>> Sent: 14 November 2019 16:42
>> To: xen-devel@lists.xenproject.org
>> Cc: Juergen Gross <jgross@suse.com>; Sander Eikelenboom
>> <linux@eikelenboom.it>; Andrew Cooper <andrew.cooper3@citrix.com>
>> Subject: [Xen-devel] [PATCH v2 0/2] AMD/IOMMU: re-work mode updating
>>
>> update_paging_mode() in the AMD IOMMU code expects to be invoked with
>> the PCI devices lock held. The check occurring only when the mode
>> actually needs updating, the violation of this rule by the majority
>> of callers did go unnoticed until per-domain IOMMU setup was changed
>> to do away with on-demand creation of IOMMU page tables.
> 
> Wouldn't it be safer to just get rid of update_paging_mode() and start
> with a reasonable number of levels?

Andrew did basically ask the same, but I continue to be unconvinced:
We can't pick a "reasonable" level, we have to pick the maximum a
guest may end up using. Yet why would we want to have all guests pay
the price of at least one unnecessary page walk level? I don't mean
to say I'm entirely opposed, but trading code simplicity for
performance is almost never an easy or obvious decision.

Jan
Paul Durrant Nov. 15, 2019, 11:33 a.m. UTC | #3
> -----Original Message-----
> From: Jan Beulich <jbeulich@suse.com>
> Sent: 15 November 2019 09:29
> To: Durrant, Paul <pdurrant@amazon.com>
> Cc: xen-devel@lists.xenproject.org; Andrew Cooper
> <andrew.cooper3@citrix.com>; Sander Eikelenboom <linux@eikelenboom.it>;
> Juergen Gross <jgross@suse.com>
> Subject: Re: [Xen-devel] [PATCH v2 0/2] AMD/IOMMU: re-work mode updating
> 
> On 14.11.2019 18:29,  Durrant, Paul  wrote:
> >> -----Original Message-----
> >> From: Xen-devel <xen-devel-bounces@lists.xenproject.org> On Behalf Of
> Jan
> >> Beulich
> >> Sent: 14 November 2019 16:42
> >> To: xen-devel@lists.xenproject.org
> >> Cc: Juergen Gross <jgross@suse.com>; Sander Eikelenboom
> >> <linux@eikelenboom.it>; Andrew Cooper <andrew.cooper3@citrix.com>
> >> Subject: [Xen-devel] [PATCH v2 0/2] AMD/IOMMU: re-work mode updating
> >>
> >> update_paging_mode() in the AMD IOMMU code expects to be invoked with
> >> the PCI devices lock held. The check occurring only when the mode
> >> actually needs updating, the violation of this rule by the majority
> >> of callers did go unnoticed until per-domain IOMMU setup was changed
> >> to do away with on-demand creation of IOMMU page tables.
> >
> > Wouldn't it be safer to just get rid of update_paging_mode() and start
> > with a reasonable number of levels?
> 
> Andrew did basically ask the same, but I continue to be unconvinced:
> We can't pick a "reasonable" level, we have to pick the maximum a
> guest may end up using. Yet why would we want to have all guests pay
> the price of at least one unnecessary page walk level? I don't mean
> to say I'm entirely opposed, but trading code simplicity for
> performance is almost never an easy or obvious decision.

I think in this case, versus the hoops your patches have to jump through just to save (possibly) a level of IOMMU page walk, the simplicity argument is quite compelling... particularly at this stage in the release cycle.
The fact that we don't know, at start of day, what the max gfn of the guest is going to be is also something that really ought to be fixed too... but that is another debate.

  Paul

> 
> Jan
Andrew Cooper Nov. 15, 2019, 11:53 a.m. UTC | #4
On 15/11/2019 11:33, Durrant, Paul wrote:
>> -----Original Message-----
>> From: Jan Beulich <jbeulich@suse.com>
>> Sent: 15 November 2019 09:29
>> To: Durrant, Paul <pdurrant@amazon.com>
>> Cc: xen-devel@lists.xenproject.org; Andrew Cooper
>> <andrew.cooper3@citrix.com>; Sander Eikelenboom <linux@eikelenboom.it>;
>> Juergen Gross <jgross@suse.com>
>> Subject: Re: [Xen-devel] [PATCH v2 0/2] AMD/IOMMU: re-work mode updating
>>
>> On 14.11.2019 18:29,  Durrant, Paul  wrote:
>>>> -----Original Message-----
>>>> From: Xen-devel <xen-devel-bounces@lists.xenproject.org> On Behalf Of
>> Jan
>>>> Beulich
>>>> Sent: 14 November 2019 16:42
>>>> To: xen-devel@lists.xenproject.org
>>>> Cc: Juergen Gross <jgross@suse.com>; Sander Eikelenboom
>>>> <linux@eikelenboom.it>; Andrew Cooper <andrew.cooper3@citrix.com>
>>>> Subject: [Xen-devel] [PATCH v2 0/2] AMD/IOMMU: re-work mode updating
>>>>
>>>> update_paging_mode() in the AMD IOMMU code expects to be invoked with
>>>> the PCI devices lock held. The check occurring only when the mode
>>>> actually needs updating, the violation of this rule by the majority
>>>> of callers did go unnoticed until per-domain IOMMU setup was changed
>>>> to do away with on-demand creation of IOMMU page tables.
>>> Wouldn't it be safer to just get rid of update_paging_mode() and start
>>> with a reasonable number of levels?
>> Andrew did basically ask the same, but I continue to be unconvinced:
>> We can't pick a "reasonable" level, we have to pick the maximum a
>> guest may end up using.

4, and it is a reasonable number.

>> Yet why would we want to have all guests pay
>> the price of at least one unnecessary page walk level?

To a first approximation, I don't care.  The PTE will be cached on first
read, and in the common case will have no need to be invalidated.  I
doubt there would be any visible effect.

>>  I don't mean
>> to say I'm entirely opposed, but trading code simplicity for
>> performance is almost never an easy or obvious decision.
> I think in this case, versus the hoops your patches have to jump through just to save (possibly) a level of IOMMU page walk, the simplicity argument is quite compelling... particularly at this stage in the release cycle.

I agree.  The walk length should not ever have been variable.

The cost of the added complexity in Xen far outweighs any perceived
benefit.  Furthermore, you're adding an invasive and confusing core api
change to common code to work around a bug in a piece of functionality
which shouldn't have ever existed.

The safe option for 4.13 is a one-liner defaulting to 4 levels.

> The fact that we don't know, at start of day, what the max gfn of the guest is going to be is also something that really ought to be fixed too... but that is another debate.

We need it for migration safety decisions, so is on my list of things
needing to include into domaincreate.

At a future point where this information is available, we could optimise
4 down to 3 in most common cases.

~Andrew