
[RFC,V1,0/3] selftests: KVM: sev: selftests for fd-based approach of supporting private memory

Message ID 20220524205646.1798325-1-vannapurve@google.com

Message

Vishal Annapurve May 24, 2022, 8:56 p.m. UTC
This series implements selftests targeting the feature floated by Chao
via:
https://lore.kernel.org/linux-mm/20220519153713.819591-1-chao.p.peng@linux.intel.com/

Below changes aim to test the fd-based approach for guest private memory
in the context of SEV/SEV-ES VMs executing on AMD SEV/SEV-ES compatible
platforms.

This series depends on the following patch series:
1) V6 series patches from Chao mentioned above.
2) https://lore.kernel.org/all/20211210164620.11636-1-michael.roth@amd.com/T/
  - KVM: selftests: Add support for test-selectable ucall implementations
    series by Michael Roth
3) https://lore.kernel.org/kvm/20220104234129.dvpv3o3tihvzsqcr@amd.com/T/
  - KVM: selftests: Add tests for SEV and SEV-ES guests series by Michael Roth

And a few additional patches:
* https://github.com/vishals4gh/linux/commit/2cb215cb6b4dff7fdf703498165179626c0cdfc7
  - Confidential platforms, along with a confidentiality-aware software stack,
    support a notion of private/shared accesses from the confidential VMs.
    Generally, a bit in the GPA conveys whether an access is shared or private.
    The SEV/SEV-ES implementation doesn't expose the encryption-bit information
    to KVM via the fault address, so this hack is still needed to signal
    private/shared access ranges to KVM.
* https://github.com/vishals4gh/linux/commit/81a7d24231f6b8fb4174bbf97ed733688e8dbc0c

Github link for the patches posted as part of this series:
https://github.com/vishals4gh/linux/commits/sev_upm_selftests_rfc_v1

sev_priv_memfd_test.c adds a suite of selftests that access private memory
from SEV/SEV-ES guests via private/shared accesses and check whether the
contents can be leaked to, or accessed by, the VMM via the shared memory view.
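
For illustration, here is a minimal sketch of the guest-side flow such a test
case can follow (not taken from the posted test file; GUEST_SYNC()/GUEST_ASSERT()
are existing selftest ucall helpers, while the GPA, size and pattern values are
hypothetical placeholders):

#include <stdint.h>
#include <stddef.h>
#include "kvm_util.h"	/* assumed source of GUEST_SYNC()/GUEST_ASSERT() */

#define TEST_MEM_GPA	0xb0000000UL
#define TEST_MEM_SIZE	0x2000UL
#define GUEST_PATTERN	0xa5a5a5a5a5a5a5a5UL

static void guest_private_access_test(void)
{
        uint64_t *buf = (uint64_t *)TEST_MEM_GPA;
        size_t i;

        /* Populate the range through the guest's private (encrypted) mapping. */
        for (i = 0; i < TEST_MEM_SIZE / sizeof(*buf); i++)
                buf[i] = GUEST_PATTERN;

        /* Exit to the VMM so it can inspect the shared view of the same GPA range. */
        GUEST_SYNC(1);

        /* Guest data must be intact; the VMM should only have seen ciphertext. */
        for (i = 0; i < TEST_MEM_SIZE / sizeof(*buf); i++)
                GUEST_ASSERT(buf[i] == GUEST_PATTERN);
}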

To allow SEV/SEV-ES VMs to toggle the encryption bit during memory conversion,
support is added for mapping guest pagetables into guest VA ranges and passing
the mapping information to guests via shared pages.
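
Roughly speaking, the guest-side conversion this enables boils down to flipping
the encryption bit in the PTE covering the range. A sketch, assuming the guest
has been handed (via a shared page) the address of that PTE; the C-bit position
and the TLB-flush helper are placeholders (real code reads the C-bit position
from CPUID 0x8000001F):

#include <stdint.h>

/* Example only: the real C-bit position must be read from CPUID 0x8000001F. */
#define GUEST_ENC_BIT		(1UL << 51)

void guest_flush_tlb(void);	/* placeholder: assumed provided by the test library */

static void guest_set_range_shared(uint64_t *ptep)
{
        *ptep &= ~GUEST_ENC_BIT;	/* clear the C-bit: accesses become shared */
        guest_flush_tlb();		/* stale translations must be flushed */
}

static void guest_set_range_private(uint64_t *ptep)
{
        *ptep |= GUEST_ENC_BIT;		/* set the C-bit: accesses become private */
        guest_flush_tlb();
}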

Vishal Annapurve (3):
  selftests: kvm: x86_64: Add support for pagetable tracking
  selftests: kvm: sev: Handle hypercall exit
  selftests: kvm: sev: Port UPM selftests onto SEV/SEV-ES VMs

 tools/testing/selftests/kvm/.gitignore        |    1 +
 tools/testing/selftests/kvm/Makefile          |    1 +
 .../selftests/kvm/include/kvm_util_base.h     |   98 ++
 tools/testing/selftests/kvm/lib/kvm_util.c    |   81 +-
 .../selftests/kvm/lib/kvm_util_internal.h     |    9 +
 .../selftests/kvm/lib/x86_64/processor.c      |   36 +
 .../selftests/kvm/lib/x86_64/sev_exitlib.c    |   39 +-
 .../kvm/x86_64/sev_priv_memfd_test.c          | 1511 +++++++++++++++++
 8 files changed, 1770 insertions(+), 6 deletions(-)
 create mode 100644 tools/testing/selftests/kvm/x86_64/sev_priv_memfd_test.c

Comments

Michael Roth June 10, 2022, 1:05 a.m. UTC | #1
On Tue, May 24, 2022 at 08:56:43PM +0000, Vishal Annapurve wrote:
> This series implements selftests targeting the feature floated by Chao
> via:
> https://lore.kernel.org/linux-mm/20220519153713.819591-1-chao.p.peng@linux.intel.com/
> 
> Below changes aim to test the fd based approach for guest private memory
> in context of SEV/SEV-ES VMs executing on AMD SEV/SEV-ES compatible
> platforms.

Hi Vishal,

Thanks for posting this!

Nikunj and I have been working on a test tree with UPM support for SEV and
SEV-SNP. I hit some issues getting your selftests to work against our tree 
since some of the HC_MAP_GPA_RANGE handling for SEV was stepping on the kernel
handling you'd added for the UPM selftests.

I ended up adding a KVM_CAP_UNMAPPED_PRIVATE_MEM to distinguish between the
2 modes. With UPM-mode enabled it basically means KVM can/should enforce that
all private guest pages are backed by private memslots, and enable a couple
platform-specific hooks to handle MAP_GPA_RANGE, and queries from MMU on
whether or not an NPT fault is for a private page or not. SEV uses these hooks
to manage its encryption bitmap, and uses that bitmap as the authority on
whether or not a page is encrypted. SNP uses GHCB page-state-change requests
so MAP_GPA_RANGE is a no-op there, but uses the MMU hook to indicate whether a
fault is private based on the page fault flags.
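
For reference, a sketch of how userspace could opt a VM into this mode, assuming
the experimental KVM_CAP_UNMAPPED_PRIVATE_MEM from the tree linked below (the
capability and its number are tree-local, not mainline KVM):

#include <sys/ioctl.h>
#include <linux/kvm.h>

#ifndef KVM_CAP_UNMAPPED_PRIVATE_MEM
#define KVM_CAP_UNMAPPED_PRIVATE_MEM 240	/* placeholder number, tree-local */
#endif

static int enable_upm_mode(int vm_fd)
{
        struct kvm_enable_cap cap = {
                .cap = KVM_CAP_UNMAPPED_PRIVATE_MEM,
        };

        /* Per-VM capability, enabled on the VM fd like other KVM_ENABLE_CAP users. */
        return ioctl(vm_fd, KVM_ENABLE_CAP, &cap);
}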

When UPM-mode isn't enabled, MAP_GPA_RANGE just gets passed on to userspace
as before, and platform-specific hooks above are no-ops. That's the mode
your SEV self-tests ran in initially. I added a test that runs the
PrivateMemoryPrivateAccess in UPM-mode, where the guest's OS memory is also
backed by private memslot and the platform hooks are enabled, and things seem
to still work okay there. I only added a UPM-mode test for the
PrivateMemoryPrivateAccess one though so far. I suppose we'd want to make
sure it works exactly as it did with UPM-mode disabled, but I don't see why
it wouldn't. 

But probably worth having some discussion on how exactly we should define this
mode, and whether that meshes with what TDX folks are planning.

I've pushed my UPM-mode selftest additions here:
  https://github.com/mdroth/linux/commits/sev_upm_selftests_rfc_v1_upmmode

And the UPM SEV/SEV-SNP tree I'm running them against (DISCLAIMER: EXPERIMENTAL):
  https://github.com/mdroth/linux/commits/pfdv6-on-snpv6-upm1

Thanks!

-Mike
Vishal Annapurve June 10, 2022, 9:01 p.m. UTC | #2
....
>
> I ended up adding a KVM_CAP_UNMAPPED_PRIVATE_MEM to distinguish between the
> 2 modes. With UPM-mode enabled it basically means KVM can/should enforce that
> all private guest pages are backed by private memslots, and enable a couple
> platform-specific hooks to handle MAP_GPA_RANGE, and queries from MMU on
> whether or not an NPT fault is for a private page or not. SEV uses these hooks
> to manage its encryption bitmap, and uses that bitmap as the authority on
> whether or not a page is encrypted. SNP uses GHCB page-state-change requests
> so MAP_GPA_RANGE is a no-op there, but uses the MMU hook to indicate whether a
> fault is private based on the page fault flags.
>
> When UPM-mode isn't enabled, MAP_GPA_RANGE just gets passed on to userspace
> as before, and platform-specific hooks above are no-ops. That's the mode
> your SEV self-tests ran in initially. I added a test that runs the
> PrivateMemoryPrivateAccess in UPM-mode, where the guest's OS memory is also
> backed by private memslot and the platform hooks are enabled, and things seem
> to still work okay there. I only added a UPM-mode test for the
> PrivateMemoryPrivateAccess one though so far. I suppose we'd want to make
> sure it works exactly as it did with UPM-mode disabled, but I don't see why
> it wouldn't.

Thanks Michael for the update. Yeah, using the bitmap to track the
private/shared-ness of gfn ranges should be a better way to go compared
to the limited approach I used, which just tracks a single contiguous
pfn range.
I spent some time getting the SEV/SEV-ES priv memfd selftests to
execute from the private fd as well and ended up making similar changes
as part of this github tree:
https://github.com/vishals4gh/linux/commits/sev_upm_selftests_rfc_v2.

>
> But probably worth having some discussion on how exactly we should define this
> mode, and whether that meshes with what TDX folks are planning.
>
> I've pushed my UPM-mode selftest additions here:
>   https://github.com/mdroth/linux/commits/sev_upm_selftests_rfc_v1_upmmode
>
> And the UPM SEV/SEV-SNP tree I'm running them against (DISCLAIMER: EXPERIMENTAL):
>   https://github.com/mdroth/linux/commits/pfdv6-on-snpv6-upm1
>

Thanks for the references here. This helps get a clear picture of the
status of priv memfd integration with SEV-SNP VMs, and this work will be
the base of future SEV-specific priv memfd selftest patches as things
get more stable.

I see pwrite being used to populate the initial private memory contents.
Does it make sense to have SEV_VM_LAUNCH_UPDATE_DATA handle the
private fd population as well?
I tried to prototype it via:
https://github.com/vishals4gh/linux/commit/c85ee15c8bf9d5d43be9a34898176e8230a3b680#
following a suggestion from Erdem Aktas (erdemaktas@google) while
discussing executing guest code from the private fd.
Apart from aspects I might not be aware of, this can have a performance
overhead depending on the initial guest UEFI boot memory requirements.
But it would allow the userspace VMM to keep most of the guest VM boot
memory setup the same and avoid changing the host kernel to allow
private memfd writes from userspace.
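
For context, this is roughly how userspace drives LAUNCH_UPDATE_DATA through
the existing KVM_MEMORY_ENCRYPT_OP path today; the prototype above would
additionally have the kernel side of this command source the data from the
private fd backing the GPA range instead of (only) a user address, which is
the part that remains an assumption here:

#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

static int sev_launch_update_data(int vm_fd, int sev_fd, void *hva, uint32_t len)
{
        struct kvm_sev_launch_update_data update = {
                .uaddr = (uintptr_t)hva,	/* data to be encrypted/measured */
                .len   = len,
        };
        struct kvm_sev_cmd cmd = {
                .id     = KVM_SEV_LAUNCH_UPDATE_DATA,
                .data   = (uintptr_t)&update,
                .sev_fd = sev_fd,		/* fd for /dev/sev */
        };

        return ioctl(vm_fd, KVM_MEMORY_ENCRYPT_OP, &cmd);
}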

Regards,
Vishal
Michael Roth June 13, 2022, 5:49 p.m. UTC | #3
On Fri, Jun 10, 2022 at 02:01:41PM -0700, Vishal Annapurve wrote:
> ....
> >
> > I ended up adding a KVM_CAP_UNMAPPED_PRIVATE_MEM to distinguish between the
> > 2 modes. With UPM-mode enabled it basically means KVM can/should enforce that
> > all private guest pages are backed by private memslots, and enable a couple
> > platform-specific hooks to handle MAP_GPA_RANGE, and queries from MMU on
> > whether or not an NPT fault is for a private page or not. SEV uses these hooks
> > to manage its encryption bitmap, and uses that bitmap as the authority on
> > whether or not a page is encrypted. SNP uses GHCB page-state-change requests
> > so MAP_GPA_RANGE is a no-op there, but uses the MMU hook to indicate whether a
> > fault is private based on the page fault flags.
> >
> > When UPM-mode isn't enabled, MAP_GPA_RANGE just gets passed on to userspace
> > as before, and platform-specific hooks above are no-ops. That's the mode
> > your SEV self-tests ran in initially. I added a test that runs the
> > PrivateMemoryPrivateAccess in UPM-mode, where the guest's OS memory is also
> > backed by private memslot and the platform hooks are enabled, and things seem
> > to still work okay there. I only added a UPM-mode test for the
> > PrivateMemoryPrivateAccess one though so far. I suppose we'd want to make
> > sure it works exactly as it did with UPM-mode disabled, but I don't see why
> > it wouldn't.
> 
> Thanks Michael for the update. Yeah, using the bitmap to track
> private/shared-ness of gfn ranges should be the better way to go as
> compared to the limited approach I used to just track a single
> contiguous pfn range.
> I spent some time in getting the SEV/SEV-ES priv memfd selftests to
> execute from private fd as well and ended up doing similar changes as
> part of the github tree:
> https://github.com/vishals4gh/linux/commits/sev_upm_selftests_rfc_v2.
> 
> >
> > But probably worth having some discussion on how exactly we should define this
> > mode, and whether that meshes with what TDX folks are planning.
> >
> > I've pushed my UPM-mode selftest additions here:
> >   https://github.com/mdroth/linux/commits/sev_upm_selftests_rfc_v1_upmmode
> >
> > And the UPM SEV/SEV-SNP tree I'm running them against (DISCLAIMER: EXPERIMENTAL):
> >   https://github.com/mdroth/linux/commits/pfdv6-on-snpv6-upm1
> >
> 
> Thanks for the references here. This helps get a clear picture around
> the status of priv memfd integration with Sev-SNP VMs and this work
> will be the base of future SEV specific priv memfd selftest patches as
> things get more stable.
> 
> I see usage of pwrite to populate initial private memory contents.
> Does it make sense to have SEV_VM_LAUNCH_UPDATE_DATA handle the
> private fd population as well?
> I tried to prototype it via:
> https://github.com/vishals4gh/linux/commit/c85ee15c8bf9d5d43be9a34898176e8230a3b680#

Thanks for the pointer and for taking a stab at this approach (hadn't
realized you were looking into this so sorry for the overlap with your
code).

> as I got this suggestion from Erdem Aktas(erdemaktas@google) while
> discussing about executing guest code from private fd.

The way we have the host patches implemented currently is sort of based
around the idea that userspace handles all private/shared conversion via
allocations/deallocations from the private backing store, since I
thought that was one of the design goals. For SNP that means allocating a
page from backing store will trigger the additional hooks in the kernel needed
to do some additional bookkeeping like RMP updates and removing from directmap,
which I'm doing via a platform-specific callback I've added to the KVM memfile
notifier callback.

There was some talk of allowing a sort of pre-boot stage to the
MFD_INACCESSIBLE protections where writes would be allowed up until a
certain point. The kernel hack to allow pwrite() was sort of a holdover
for this support.

Handling pre-population as part of SNP_LAUNCH_UPDATE seems sort of
incompatible with this, since it reads from shared memory and writes
into private memory. So either:

a) userspace pre-allocates the private backing page before calling
   SNP_LAUNCH_UPDATE to fill it, in which case the kernel ends up mapping
   private memory that is already guest-owned into its address space,
   which will cause an RMP fault, or

b) userspace lets SNP_LAUNCH_UPDATE allocate the private page as part of
   copying over the data from the shared page, in which case we'd get the
   invalidation notifier callback going through the normal shmem
   allocation path, and we'd need to bypass this to ensure that the
   notifier triggers after the memory has been populated.

Maybe some other sort of notifier to handle the RMP/directmap changes
would avoid these issues, but that seems to move us closer to just
having a KVM ioctl to handle the conversions and manage this
book-keeping via KVM ioctls rather than memfd callbacks/notifiers.
There seems to be some discussion around doing something of this sort,
but we still need to get some clarity on this.

> Apart from the aspects I might not be aware of, this can have
> performance overhead depending on the initial Guest UEFI boot memory
> requirements. But this can allow the userspace VMM to keep most of the
> guest vm boot memory setup the same and
> avoid changing the host kernel to allow private memfd writes from userspace.

I think it would be good if we could reduce the complexity on the VMM
side. Having a KVM ioctl that instructs KVM to convert a range of GPAs
between private<->shared via the private memslot/FD would also achieve
that, while avoiding the need for special handling for this pre-launch
case vs. conversions during run-time. Probably worth discussing more in
Chao's thread, where there's some related discussion.
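
Purely as a strawman (nothing like this exists in either tree; the name,
number and layout are invented for illustration only), such a conversion
ioctl could take a shape like:

#include <linux/kvm.h>
#include <linux/types.h>

/* Hypothetical uAPI sketch -- invented for illustration only. */
struct kvm_convert_gpa_range {
        __u64 gpa;		/* start of the range, page-aligned */
        __u64 size;		/* length of the range in bytes */
        __u32 to_private;	/* 1: shared -> private, 0: private -> shared */
        __u32 pad;
};

#define KVM_CONVERT_GPA_RANGE	_IOW(KVMIO, 0xd0, struct kvm_convert_gpa_range)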

Thanks,

Mike

> 
> Regards,
> Vishal
Michael Roth June 13, 2022, 7:35 p.m. UTC | #4
On Mon, Jun 13, 2022 at 12:49:28PM -0500, Michael Roth wrote:
> On Fri, Jun 10, 2022 at 02:01:41PM -0700, Vishal Annapurve wrote:
> > ....
> > >
> > > I ended up adding a KVM_CAP_UNMAPPED_PRIVATE_MEM to distinguish between the
> > > 2 modes. With UPM-mode enabled it basically means KVM can/should enforce that
> > > all private guest pages are backed by private memslots, and enable a couple
> > > platform-specific hooks to handle MAP_GPA_RANGE, and queries from MMU on
> > > whether or not an NPT fault is for a private page or not. SEV uses these hooks
> > > to manage its encryption bitmap, and uses that bitmap as the authority on
> > > whether or not a page is encrypted. SNP uses GHCB page-state-change requests
> > > so MAP_GPA_RANGE is a no-op there, but uses the MMU hook to indicate whether a
> > > fault is private based on the page fault flags.
> > >
> > > When UPM-mode isn't enabled, MAP_GPA_RANGE just gets passed on to userspace
> > > as before, and platform-specific hooks above are no-ops. That's the mode
> > > your SEV self-tests ran in initially. I added a test that runs the
> > > PrivateMemoryPrivateAccess in UPM-mode, where the guest's OS memory is also
> > > backed by private memslot and the platform hooks are enabled, and things seem
> > > to still work okay there. I only added a UPM-mode test for the
> > > PrivateMemoryPrivateAccess one though so far. I suppose we'd want to make
> > > sure it works exactly as it did with UPM-mode disabled, but I don't see why
> > > it wouldn't.
> > 
> > Thanks Michael for the update. Yeah, using the bitmap to track
> > private/shared-ness of gfn ranges should be the better way to go as
> > compared to the limited approach I used to just track a single
> > contiguous pfn range.
> > I spent some time in getting the SEV/SEV-ES priv memfd selftests to
> > execute from private fd as well and ended up doing similar changes as
> > part of the github tree:
> > https://github.com/vishals4gh/linux/commits/sev_upm_selftests_rfc_v2.
> > 
> > >
> > > But probably worth having some discussion on how exactly we should define this
> > > mode, and whether that meshes with what TDX folks are planning.
> > >
> > > I've pushed my UPM-mode selftest additions here:
> > >   https://github.com/mdroth/linux/commits/sev_upm_selftests_rfc_v1_upmmode
> > >
> > > And the UPM SEV/SEV-SNP tree I'm running them against (DISCLAIMER: EXPERIMENTAL):
> > >   https://github.com/mdroth/linux/commits/pfdv6-on-snpv6-upm1
> > >
> > 
> > Thanks for the references here. This helps get a clear picture around
> > the status of priv memfd integration with Sev-SNP VMs and this work
> > will be the base of future SEV specific priv memfd selftest patches as
> > things get more stable.
> > 
> > I see usage of pwrite to populate initial private memory contents.
> > Does it make sense to have SEV_VM_LAUNCH_UPDATE_DATA handle the
> > private fd population as well?
> > I tried to prototype it via:
> > https://github.com/vishals4gh/linux/commit/c85ee15c8bf9d5d43be9a34898176e8230a3b680#
> 
> Thanks for the pointer and for taking a stab at this approach (hadn't
> realized you were looking into this so sorry for the overlap with your
> code).
> 
> > as I got this suggestion from Erdem Aktas(erdemaktas@google) while
> > discussing about executing guest code from private fd.
> 
> The way we have the host patches implemented currently is sort of based
> around the idea that userspace handles all private/shared conversion via
> allocations/deallocations from the private backing store, since I
> thought that was one of the design goals. For SNP that means allocating a
> page from backing store will trigger the additional hooks in the kernel needed
> to do some additional bookkeeping like RMP updates and removing from directmap,
> which I'm doing via a platform-specific callback I've added to the KVM memfile
> notifier callback.
> 
> There was some talk of allowing a sort of pre-boot stage to the
> MFD_INACCESSIBLE protections where writes would be allowed up until a
> certain point. The kernel hack to allow pwrite() was sort of a holdover
> for this support.
> 
> Handling pre-population as part of SNP_LAUNCH_UPDATE seems sort of
> incompatible with this, since it reads from shared memory and writes
> into private memory.

Well, no, it wouldn't be, since your code handles it the same way as
mine, where kvm_private_mem_get_pfn() allocates a page but doesn't
generate a notifier event, so you can defer things like RMP updates
until after the memory is populated. So this might be a reasonable
approach as well. But it's still worth exploring whether a more general
KVM ioctl is the better approach.