[7/7] KVM: arm/arm64: Elide CMOs when unmapping a range

Message ID 20191213182503.14460-8-maz@kernel.org (mailing list archive)
State Not Applicable
Delegated to: Paul Burton
Series: KVM: arm/arm64: Help VMs dying quicker

Commit Message

Marc Zyngier Dec. 13, 2019, 6:25 p.m. UTC
If userspace issues a munmap() on a set of pages, there is no
expectation that the pages are cleaned to the PoC. So let's
not do more work than strictly necessary, and set the magic
flag that avoids CMOs in this case.

Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 virt/kvm/arm/mmu.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)
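
For context, a minimal sketch of how such a flag could gate the dcache CMO in
the stage-2 unmap path. KVM_UNMAP_ELIDE_CMO and its plumbing come from the
earlier patches in this series; the walker below is simplified for
illustration and is not the series' exact code:

    static void unmap_stage2_pte(struct kvm *kvm, pte_t *pte, phys_addr_t addr,
                                 unsigned long flags)
    {
            pte_t old_pte = *pte;

            /* Tear down the mapping and invalidate the TLB for this IPA */
            kvm_set_pte(pte, __pte(0));
            kvm_tlb_flush_vmid_ipa(kvm, addr);

            /* Clean+invalidate to the PoC unless the caller opted out */
            if (!(flags & KVM_UNMAP_ELIDE_CMO) &&
                !kvm_is_device_pfn(pte_pfn(old_pte)))
                    kvm_flush_dcache_pte(old_pte);
    }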

Comments

James Morse Dec. 18, 2019, 3:07 p.m. UTC | #1
Hi Marc,

On 13/12/2019 18:25, Marc Zyngier wrote:
> If userspace issues a munmap() on a set of pages, there is no
> expectation that the pages are cleaned to the PoC.

(Pedantry: Clean and invalidate. If the guest wrote through a device mapping, we ditch any
clean+stale lines with this path, meaning swapout saves the correct values)
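
As a reminder of the arm64 ops being distinguished here: "dc cvac" cleans a
line to the PoC, while "dc civac" cleans *and* invalidates it; the invalidate
half is what discards clean-but-stale lines. A one-line illustration (not the
kernel's actual helper):

    /* Clean+invalidate the cache line containing 'addr', by VA, to the PoC */
    asm volatile("dc civac, %0" : : "r" (addr) : "memory");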


> So let's
> not do more work than strictly necessary, and set the magic
> flag that avoids CMOs in this case.

I think this assumes the pages went from anonymous->free, so no-one cares about the contents.

If the pages are backed by a file, won't dirty pages still get written back before
the page is freed? (e.g. an EFI flash 'file' mmap()ed in)

What if this isn't the only mapping of the page? Can't it be swapped out from another VMA?
(tenuous example, poor man's memory mirroring?)


Thanks,

James
Marc Zyngier Dec. 18, 2019, 3:30 p.m. UTC | #2
Hi James,

On 2019-12-18 15:07, James Morse wrote:
> Hi Marc,
>
> On 13/12/2019 18:25, Marc Zyngier wrote:
>> If userspace issues a munmap() on a set of pages, there is no
>> expectation that the pages are cleaned to the PoC.
>
> (Pedantry: Clean and invalidate. If the guest wrote through a device
> mapping, we ditch any clean+stale lines with this path, meaning
> swapout saves the correct values)

Indeed.

>> So let's
>> not do more work than strictly necessary, and set the magic
>> flag that avoids CMOs in this case.
>
> I think this assumes the pages went from anonymous->free, so no-one
> cares about the contents.
>
> If the pages are backed by a file, won't dirty pages still get
> written back before the page is freed? (e.g. an EFI flash 'file'
> mmap()ed in)

I believe so. Is that a problem?

> What if this isn't the only mapping of the page? Can't it be swapped
> out from another VMA? (tenuous example, poor man's memory mirroring?)

Swap-out wouldn't trigger this code path, as it would use a different
MMU notifier event (MMU_NOTIFY_CLEAR vs MMU_NOTIFY_UNMAP), I believe.
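
For reference, the events the notifier can pass down, from this era's
include/linux/mmu_notifier.h; munmap()-driven teardown is, I believe, what
ends up tagged as UNMAP, while reclaim/swap uses CLEAR:

    enum mmu_notifier_event {
            MMU_NOTIFY_UNMAP = 0,
            MMU_NOTIFY_CLEAR,
            MMU_NOTIFY_PROTECTION_VMA,
            MMU_NOTIFY_PROTECTION_PAGE,
            MMU_NOTIFY_SOFT_DIRTY,
            MMU_NOTIFY_RELEASE,
    };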

Thanks,

         M.
James Morse Dec. 19, 2019, 1:46 p.m. UTC | #3
Hi Marc,

On 18/12/2019 15:30, Marc Zyngier wrote:
> On 2019-12-18 15:07, James Morse wrote:
>> On 13/12/2019 18:25, Marc Zyngier wrote:
>>> If userspace issues a munmap() on a set of pages, there is no
>>> expectation that the pages are cleaned to the PoC.

>>> So let's
>>> not do more work than strictly necessary, and set the magic
>>> flag that avoids CMOs in this case.
>>
>> I think this assumes the pages went from anonymous->free, so no-one
>> cares about the contents.
>>
>> If the pages are backed by a file, won't dirty pages still get
>> written back before the page is freed? (e.g. an EFI flash 'file' mmap()ed in)
> 
> I believe so. Is that a problem?

If we skipped the dcache maintenance on unmap, then when the dirty page is later
reclaimed, its clean+stale lines are written back to the file. File-backed dirty
pages stick around in the page cache in the hope that someone else needs them.

This would happen for a guest:device-mapping that is written to, but is actually backed by
a mmap()d file. I think the EFI flash emulation does exactly this.
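
A hypothetical VMM fragment for that case (names, sizes and addresses are
illustrative only): the flash image is a MAP_SHARED file mapping registered
with KVM as guest memory, so its dirty pages are written back to the file on
reclaim:

    int fd = open("flash.img", O_RDWR);
    void *flash = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);

    struct kvm_userspace_memory_region region = {
            .slot            = 1,
            .guest_phys_addr = 0x4000000,   /* where the guest sees the flash */
            .memory_size     = size,
            .userspace_addr  = (__u64)flash,
    };
    ioctl(vm_fd, KVM_SET_USER_MEMORY_REGION, &region);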


>> What if this isn't the only mapping of the page? Can't it be swapped
>> out from another VMA? (tenuous example, poor man's memory mirroring?)
> 
> Swap-out wouldn't trigger this code path, as it would use a different
> MMU notifier event (MMU_NOTIFY_CLEAR vs MMU_NOTIFY_UNMAP), I believe.

This was a half thought-through special case of the above. The sequence would be:

VM-A and VM-B both share a mapping of $page.

1. VM-A writes to $page through a device mapping
2. The kernel unmaps $page from VM-A for swap. KVM does the maintenance

3. VM-B writes to $page through a device mapping
4. VM-B exits, KVM skips the maintenance, $page may have clean+stale lines

5. Swap finds no further mappings, and writes $page and its clean+stale lines to disk.

Two VMs with a shared mapping is the 'easy' example. I think you just need a second
mapping for this to happen: it means the page isn't really free after the VM has exited.



Thanks,

James

Patch

diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
index c55022dbac89..6749be33d822 100644
--- a/virt/kvm/arm/mmu.c
+++ b/virt/kvm/arm/mmu.c
@@ -2056,7 +2056,13 @@  static int handle_hva_to_gpa(struct kvm *kvm,
 
 static int kvm_unmap_hva_handler(struct kvm *kvm, gpa_t gpa, u64 size, void *data)
 {
-	unmap_stage2_range(kvm, gpa, size, 0);
+	struct mmu_notifier_range *range = data;
+	unsigned long flags = 0;
+
+	if (range->event == MMU_NOTIFY_UNMAP)
+		flags = KVM_UNMAP_ELIDE_CMO;
+
+	unmap_stage2_range(kvm, gpa, size, flags);
 	return 0;
 }
 
@@ -2067,7 +2073,7 @@  int kvm_unmap_hva_range(struct kvm *kvm, const struct mmu_notifier_range *range)
 
 	trace_kvm_unmap_hva_range(range->start, range->end);
 	handle_hva_to_gpa(kvm, range->start, range->end,
-			  &kvm_unmap_hva_handler, NULL);
+			  &kvm_unmap_hva_handler, (void *)range);
 	return 0;
 }