[v4,00/21] IOMMU: superpage support when not sharing pagetables

Message ID: b92e294e-7277-d977-bb96-7c28d60000c6@suse.com

Message

Jan Beulich April 25, 2022, 8:29 a.m. UTC
For a long time we've been rather inefficient with IOMMU page table
management when not sharing page tables, i.e. in particular for PV (and
further specifically also for PV Dom0) and AMD (where nowadays we never
share page tables). While up to about 2.5 years ago AMD code had logic
to un-shatter page mappings, that logic was ripped out for being buggy
(XSA-275 plus follow-on).

This series enables use of large pages in AMD and Intel (VT-d) code;
Arm is presently not in need of any enabling as pagetables are always
shared there. It also augments PV Dom0 creation with suitable explicit
IOMMU mapping calls to facilitate use of large pages there. Depending
on the amount of memory handed to Dom0 this improves booting time
(latency until Dom0 actually starts) quite a bit; subsequent shattering
of some of the large pages may of course consume some of the saved time.

Known fallout has been spelled out here:
https://lists.xen.org/archives/html/xen-devel/2021-08/msg00781.html

There's a dependency on 'PCI: replace "secondary" flavors of
PCI_{DEVFN,BDF,SBDF}()', in particular by patch 8. Its prereq patch
still lacks an Arm ack, so it couldn't go in yet.

I'm inclined to say "of course" there are also a few seemingly unrelated
changes included here, which I just came to consider necessary or at
least desirable (in part for having been in need of adjustment for a
long time) along the way. Some of these changes are likely independent
of the bulk of the work here, and hence may be fine to go in ahead of
earlier patches.

See individual patches for details on the v4 changes.

01: AMD/IOMMU: correct potentially-UB shifts
02: IOMMU: simplify unmap-on-error in iommu_map()
03: IOMMU: add order parameter to ->{,un}map_page() hooks
04: IOMMU: have iommu_{,un}map() split requests into largest possible chunks
05: IOMMU/x86: restrict IO-APIC mappings for PV Dom0
06: IOMMU/x86: perform PV Dom0 mappings in batches
07: IOMMU/x86: support freeing of pagetables
08: AMD/IOMMU: walk trees upon page fault
09: AMD/IOMMU: return old PTE from {set,clear}_iommu_pte_present()
10: AMD/IOMMU: allow use of superpage mappings
11: VT-d: allow use of superpage mappings
12: IOMMU: fold flush-all hook into "flush one"
13: IOMMU/x86: prefill newly allocate page tables
14: x86: introduce helper for recording degree of contiguity in page tables
15: AMD/IOMMU: free all-empty page tables
16: VT-d: free all-empty page tables
17: AMD/IOMMU: replace all-contiguous page tables by superpage mappings
18: VT-d: replace all-contiguous page tables by superpage mappings
19: IOMMU/x86: add perf counters for page table splitting / coalescing
20: VT-d: fold iommu_flush_iotlb{,_pages}()
21: VT-d: fold dma_pte_clear_one() into its only caller
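
To illustrate the idea behind patch 04 (splitting a request into the
largest possible chunks), here is a minimal standalone sketch. It is not
the code from the series: the names pick_order() and split_request(), the
fixed order set (4k, 2M, 1G) and the output array are all hypothetical,
chosen only to show the alignment-and-size check that makes a chunk
eligible for a given order.

```c
/* Hypothetical set of usable mapping orders: 1G, 2M, 4k (largest first). */
static const unsigned int orders[] = { 18, 9, 0 };

/*
 * Pick the largest order usable for the next chunk: both frame numbers
 * must be aligned to the chunk size, and enough pages must remain.
 * Order 0 always qualifies as long as count is non-zero.
 */
static unsigned int pick_order(unsigned long dfn, unsigned long mfn,
                               unsigned long count)
{
    for ( unsigned int i = 0; i < sizeof(orders) / sizeof(orders[0]); i++ )
    {
        unsigned long size = 1UL << orders[i];

        if ( !((dfn | mfn) & (size - 1)) && count >= size )
            return orders[i];
    }

    return 0;
}

/*
 * Walk a request of 'count' pages, recording the order chosen for each
 * chunk in out[] (at most 'max' entries are stored). Returns the total
 * number of chunks the request was split into.
 */
static unsigned int split_request(unsigned long dfn, unsigned long mfn,
                                  unsigned long count,
                                  unsigned int *out, unsigned int max)
{
    unsigned int n = 0;

    while ( count )
    {
        unsigned int order = pick_order(dfn, mfn, count);
        unsigned long size = 1UL << order;

        if ( n < max )
            out[n] = order;
        n++;

        /* A real implementation would invoke the map/unmap hook here. */
        dfn += size;
        mfn += size;
        count -= size;
    }

    return n;
}
```

With this sketch, a 515-page request starting at frame 0x1ff is split
into one 4k chunk, one 2M chunk, and two further 4k chunks, whereas a
request whose dfn and mfn are misaligned relative to one another never
gets beyond order 0.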

While not directly related (except that making this mode work properly
here was a fair part of the overall work), at this occasion I'd also
like to renew my proposal to make "iommu=dom0-strict" the default going
forward. It already is not only the default, but the only possible mode
for PVH Dom0.

Jan

Comments

Jan Beulich May 18, 2022, 12:50 p.m. UTC | #1
On 25.04.2022 10:29, Jan Beulich wrote:
> For a long time we've been rather inefficient with IOMMU page table
> management when not sharing page tables, i.e. in particular for PV (and
> further specifically also for PV Dom0) and AMD (where nowadays we never
> share page tables). While up to about 2.5 years ago AMD code had logic
> to un-shatter page mappings, that logic was ripped out for being buggy
> (XSA-275 plus follow-on).
> 
> This series enables use of large pages in AMD and Intel (VT-d) code;
> Arm is presently not in need of any enabling as pagetables are always
> shared there. It also augments PV Dom0 creation with suitable explicit
> IOMMU mapping calls to facilitate use of large pages there. Depending
> on the amount of memory handed to Dom0 this improves booting time
> (latency until Dom0 actually starts) quite a bit; subsequent shattering
> of some of the large pages may of course consume some of the saved time.
> 
> Known fallout has been spelled out here:
> https://lists.xen.org/archives/html/xen-devel/2021-08/msg00781.html
> 
> There's a dependency on 'PCI: replace "secondary" flavors of
> PCI_{DEVFN,BDF,SBDF}()', in particular by patch 8. Its prereq patch
> still lacks an Arm ack, so it couldn't go in yet.
> 
> I'm inclined to say "of course" there are also a few seemingly unrelated
> changes included here, which I just came to consider necessary or at
> least desirable (in part for having been in need of adjustment for a
> long time) along the way. Some of these changes are likely independent
> of the bulk of the work here, and hence may be fine to go in ahead of
> earlier patches.
> 
> See individual patches for details on the v4 changes.
> 
> 01: AMD/IOMMU: correct potentially-UB shifts
> 02: IOMMU: simplify unmap-on-error in iommu_map()
> 03: IOMMU: add order parameter to ->{,un}map_page() hooks
> 04: IOMMU: have iommu_{,un}map() split requests into largest possible chunks

These first 4 patches are in principle ready to go in. If only there
weren't (sadly once again) the unclear state of the comments you had
given on the first 2 on Apr 27. I did reply verbally, and hence I'm
intending to commit these 4 by the end of the week - on the assumption
that no response to my replies means I've sufficiently addressed the
concerns - unless I hear back otherwise.

Thanks for your understanding, Jan